Deploy with Docker and Kubernetes

ChatCLI can be packaged as a Docker container and deployed on Kubernetes using the official Helm chart. This page covers the official images, local builds, Docker Compose, and Helm installation, including the chart's configuration values.

Official Images (GHCR)

Official Docker images are automatically published to the GitHub Container Registry with each release:

ChatCLI Server

Latest version: 1.139.0
ghcr.io/diillson/chatcli:1.139.0

Kubernetes Operator

Latest version: 1.139.0
ghcr.io/diillson/chatcli-operator:1.139.0

# Pull the server image (pinned version — recommended)
docker pull ghcr.io/diillson/chatcli:1.139.0

# Or the latest available
docker pull ghcr.io/diillson/chatcli:latest

# Pull the operator image
docker pull ghcr.io/diillson/chatcli-operator:1.139.0

The images are published for multiple architectures (linux/amd64 and linux/arm64).

Docker

Building the Image (Local)

# From the project root
docker build -t chatcli .

The Dockerfile uses a multi-stage build to produce a minimal image (~20MB):

Build stage: golang:1.25-alpine compiles the binary
Runtime stage: alpine:3.21 with non-root user and built-in health check

Building the Operator Image (Local)

# IMPORTANT: must be built from the repository root
# (the operator's go.mod uses a replace directive pointing to ../)
docker build -f operator/Dockerfile -t ghcr.io/diillson/chatcli-operator:latest .

The operator Dockerfile uses:

Build stage: golang:1.25 with multi-arch support (TARGETARCH)
Runtime stage: gcr.io/distroless/static:nonroot (maximum security, no shell)

Running with Docker

Basic

docker run -p 50051:50051 \
  -e LLM_PROVIDER=OPENAI \
  -e OPENAI_API_KEY=sk-xxx \
  chatcli

With Auth

docker run -p 50051:50051 \
  -e CHATCLI_SERVER_TOKEN=my-token \
  -e LLM_PROVIDER=CLAUDEAI \
  -e ANTHROPIC_API_KEY=sk-ant-xxx \
  chatcli

With Persistence

docker run -p 50051:50051 \
  -v chatcli-sessions:/home/chatcli/.chatcli/sessions \
  -e LLM_PROVIDER=OPENAI \
  -e OPENAI_API_KEY=sk-xxx \
  chatcli
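
Passing API keys with -e leaves them in shell history and in the process list. As a minimal sketch, the same variables can be kept in an env file and handed to Docker with --env-file. The file name chatcli.env is an assumption made for this example, and the docker command is printed rather than executed so the sketch runs without Docker installed.

```shell
#!/bin/sh
# Sketch: keep provider credentials in an env file instead of on the command line.
# The file name chatcli.env is an assumption made for this example.
ENV_FILE="${ENV_FILE:-chatcli.env}"

# Restrict permissions so the file is created readable by the owner only.
umask 077
cat > "$ENV_FILE" <<EOF
LLM_PROVIDER=OPENAI
OPENAI_API_KEY=${OPENAI_API_KEY:-sk-xxx}
EOF

# Printed rather than executed, so the sketch works without Docker.
echo "docker run -p 50051:50051 --env-file $ENV_FILE chatcli"
```

Keep the env file out of version control, for example by adding it to .gitignore.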

Docker Compose

The project includes a docker-compose.yml ready for development:

Set the variables

export LLM_PROVIDER=OPENAI
export OPENAI_API_KEY=sk-xxx

Start the container

docker compose up -d

Connect from your terminal

chatcli connect localhost:50051

Docker Compose configures:

Port 50051 exposed
Persistent volumes for sessions and plugins
Automatic restart (unless-stopped)
All LLM variables via environment
Security hardening: read-only filesystem, no-new-privileges, CPU/memory limits, tmpfs for /tmp

`docker-compose.yml` File

version: "3.9"

services:
  chatcli-server:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: chatcli-server
    ports:
      - "50051:50051"
    environment:
      CHATCLI_SERVER_PORT: "50051"
      CHATCLI_SERVER_TOKEN: "${CHATCLI_SERVER_TOKEN:-}"
      LLM_PROVIDER: "${LLM_PROVIDER:-}"
      OPENAI_API_KEY: "${OPENAI_API_KEY:-}"
      ANTHROPIC_API_KEY: "${ANTHROPIC_API_KEY:-}"
      GOOGLEAI_API_KEY: "${GOOGLEAI_API_KEY:-}"
      OPENROUTER_API_KEY: "${OPENROUTER_API_KEY:-}"
      OLLAMA_ENABLED: "${OLLAMA_ENABLED:-}"
      OLLAMA_BASE_URL: "${OLLAMA_BASE_URL:-}"
      GITHUB_COPILOT_TOKEN: "${GITHUB_COPILOT_TOKEN:-}"
      COPILOT_MODEL: "${COPILOT_MODEL:-}"
      LOG_LEVEL: "${LOG_LEVEL:-info}"
    volumes:
      - chatcli-sessions:/home/chatcli/.chatcli/sessions
      - chatcli-plugins:/home/chatcli/.chatcli/plugins
    restart: unless-stopped
    read_only: true
    tmpfs:
      - /tmp:size=100M
    security_opt:
      - no-new-privileges:true
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 1G

volumes:
  chatcli-sessions:
  chatcli-plugins:

The container runs with a read-only filesystem and no-new-privileges by default. The /tmp directory uses an in-memory tmpfs (limited to 100MB). The named volumes (chatcli-sessions, chatcli-plugins) are the only writable mount points. See the security documentation for details.

Kubernetes (Helm)

ChatCLI Helm charts are available as OCI artifacts on GHCR — no need to clone the repository.

Prerequisites

Kubernetes cluster (kind, minikube, EKS, GKE, AKS, etc.)
Helm 3.8+ installed (OCI support)
kubectl configured for the cluster

Basic Installation

OpenAI

helm install chatcli oci://ghcr.io/diillson/charts/chatcli \
  --set llm.provider=OPENAI \
  --set secrets.openaiApiKey=sk-xxx

Anthropic (with Auth)

helm install chatcli oci://ghcr.io/diillson/charts/chatcli \
  --set llm.provider=CLAUDEAI \
  --set secrets.anthropicApiKey=sk-ant-xxx \
  --set server.token=my-secret-token

Installation with Security (Helm)

For deployments with full security, including rate limiting, JWT authentication, and secure agent mode:

helm install chatcli oci://ghcr.io/diillson/charts/chatcli \
  --set security.rateLimitRps=20 \
  --set security.agentSecurityMode=strict \
  --set security.jwtSecretRef.name=chatcli-jwt \
  --set security.jwtSecretRef.key=secret
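
The command above refers to a Secret named chatcli-jwt with a key named secret, which has to exist in the release namespace. The sketch below shows one way to generate a value for it. It assumes that a 32-byte random value, hex-encoded, is acceptable as the JWT secret; this page does not state the required format. The kubectl command is printed with a placeholder rather than executed, so the generated value is never echoed to the terminal.

```shell
#!/bin/sh
# Sketch: generate a random value for the Secret referenced by security.jwtSecretRef.
# Assumption: a 32-byte random value, hex-encoded, is acceptable as the JWT secret.
gen_jwt_secret() {
  # Read 32 bytes from the kernel random source and print them as 64 hex characters.
  head -c 32 /dev/urandom | od -An -v -tx1 | tr -d ' \n'
}

JWT_SECRET="$(gen_jwt_secret)"

# Printed with a placeholder so the secret itself is not shown.
# The name (chatcli-jwt) and key (secret) match the --set flags above.
echo "kubectl create secret generic chatcli-jwt --from-literal=secret=<${#JWT_SECRET}-character value>"
```

To create the Secret for real, pass "$JWT_SECRET" as the --from-literal value and add --namespace if the release does not live in the default namespace.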

Installation with K8s Watcher (Single-Target)

helm install chatcli oci://ghcr.io/diillson/charts/chatcli \
  --set llm.provider=OPENAI \
  --set secrets.openaiApiKey=sk-xxx \
  --set watcher.enabled=true \
  --set watcher.deployment=myapp \
  --set watcher.namespace=production

Installation with Multi-Target + Prometheus

To monitor multiple deployments with Prometheus metrics, use a values file:

# values-multi.yaml
llm:
  provider: CLAUDEAI
secrets:
  anthropicApiKey: sk-ant-xxx
watcher:
  enabled: true
  interval: "15s"
  maxContextChars: 32000
  targets:
    - deployment: api-gateway
      namespace: production
      metricsPort: 9090
      metricsFilter: ["http_requests_*", "http_request_duration_*"]
    - deployment: auth-service
      namespace: production
      metricsPort: 9090
    - deployment: worker
      namespace: batch

helm install chatcli oci://ghcr.io/diillson/charts/chatcli -f values-multi.yaml

The chart automatically:

Creates a ServiceAccount with RBAC for the watcher to read pods, events, and logs
Auto-detects multi-namespace: if targets are in different namespaces, uses ClusterRole instead of Role
Generates a ConfigMap <name>-watch-config with the multi-target YAML
Mounts the config as a volume and passes --watch-config to the container
Properly passes --token, --model, and --mcp-config flags to the server
Uses native gRPC health probes (liveness, readiness, and startup) instead of pidof
Includes all 17 operator CRDs in the crds/ directory
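
The multi-namespace rule in the list above can be illustrated with a small shell function. This is only an illustration of the rule as described here, not the chart's actual template logic; the function name rbac_kind is hypothetical.

```shell
#!/bin/sh
# Illustration of the rule described above, not the chart's template code.
# rbac_kind prints "ClusterRole" when the given namespaces are not all the
# same, and "Role" otherwise.
rbac_kind() {
  # One namespace per line, de-duplicated, then counted.
  distinct=$(printf '%s\n' "$@" | sort -u | wc -l | tr -d ' ')
  if [ "$distinct" -gt 1 ]; then
    echo "ClusterRole"
  else
    echo "Role"
  fi
}

rbac_kind production production batch   # namespaces from values-multi.yaml
rbac_kind production production         # all targets in one namespace
```

The first call prints ClusterRole and the second prints Role.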

Helm Chart Values

Server

Value	Description	Default
`replicaCount`	Number of replicas	`1`
`image.repository`	Image repository	`ghcr.io/diillson/chatcli`
`image.tag`	Image tag	`latest`
`server.port`	gRPC port	`50051`
`server.metricsPort`	HTTP port for Prometheus metrics (0 = disabled)	`9090`
`server.token`	Authentication token	`""`
`server.grpcReflection`	Enable gRPC reflection (debugging)	`false`
`serviceMonitor.enabled`	Create ServiceMonitor (requires Prometheus Operator)	`false`
`serviceMonitor.interval`	Prometheus scrape interval	`30s`

TLS

Value	Description	Default
`tls.enabled`	Enable TLS	`false`
`tls.certFile`	Certificate path	`""`
`tls.keyFile`	Key path	`""`
`tls.existingSecret`	Existing Secret with certs	`""`

LLM

Value	Description	Default
`llm.provider`	Default provider	`""`
`llm.model`	Default model	`""`

Secrets (API Keys)

Value	Description
`secrets.existingSecret`	Existing Secret (instead of creating a new one)
`secrets.openaiApiKey`	OpenAI key
`secrets.anthropicApiKey`	Anthropic key
`secrets.googleaiApiKey`	Google AI key
`secrets.xaiApiKey`	xAI key
`secrets.stackspotClientId`	StackSpot Client ID
`secrets.stackspotClientKey`	StackSpot Client Key
`secrets.stackspotRealm`	StackSpot Realm
`secrets.stackspotAgentId`	StackSpot Agent ID
`secrets.openrouterApiKey`	OpenRouter API key
`secrets.githubCopilotToken`	GitHub Copilot OAuth token

GitHub Copilot

Value	Description	Default
`COPILOT_MODEL`	Default Copilot model (e.g., `gpt-4o`, `claude-sonnet-4`)	`gpt-4o`
`COPILOT_MAX_TOKENS`	Maximum tokens for response	`""`
`COPILOT_API_BASE_URL`	API base URL (for enterprise environments)	`https://api.githubcopilot.com`

For authentication, use secrets.githubCopilotToken with a token obtained via /auth login github-copilot, or set GITHUB_COPILOT_TOKEN as an environment variable.

Ollama

Value	Description	Default
`ollama.enabled`	Enable Ollama	`false`
`ollama.baseUrl`	Ollama base URL	`http://ollama:11434`
`ollama.model`	Ollama model	`""`

K8s Watcher

Value	Description	Default
`watcher.enabled`	Enable the watcher	`false`
`watcher.targets`	Multi-deployment target list (see below)	`[]`
`watcher.deployment`	Single deployment - legacy	`""`
`watcher.namespace`	Deployment namespace - legacy	`""`
`watcher.interval`	Collection interval	`30s`
`watcher.window`	Observation window	`2h`
`watcher.maxLogLines`	Log lines per pod	`100`
`watcher.maxContextChars`	LLM context budget	`32000`

Fields for each entry in watcher.targets:

Field	Description	Required
`deployment`	Deployment name	Yes
`namespace`	Namespace (default: `default`)	No
`metricsPort`	Prometheus port (0 = disabled)	No
`metricsPath`	HTTP metrics path (default: `/metrics`)	No
`metricsFilter`	Glob filters for metrics	No
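
To make the metricsFilter field more concrete, the sketch below shows how glob patterns such as http_requests_* select metric names. It assumes the patterns behave like shell globs; the watcher's exact matching rules may differ. The function name metric_matches and the sample metric names are hypothetical.

```shell
#!/bin/sh
# Sketch: how glob-style filters such as "http_requests_*" select metric names.
# Assumption: metricsFilter patterns behave like shell globs.
metric_matches() {
  name=$1
  shift
  for pattern in "$@"; do
    # An unquoted variable in a case arm is interpreted as a glob pattern.
    case "$name" in
      $pattern) return 0 ;;
    esac
  done
  return 1
}

for metric in http_requests_total http_request_duration_seconds go_goroutines; do
  if metric_matches "$metric" 'http_requests_*' 'http_request_duration_*'; then
    echo "keep $metric"
  else
    echo "drop $metric"
  fi
done
```

With these two filters, the two http_ metrics are kept and go_goroutines is dropped.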

Provider Fallback

Value	Description	Default
`fallback.enabled`	Enable automatic failover chain	`false`
`fallback.providers`	Ordered list of providers `[{name, model}]`	`[]`
`fallback.maxRetries`	Retries per provider before advancing	`2`
`fallback.cooldownBase`	Base cooldown after failure	`30s`
`fallback.cooldownMax`	Maximum cooldown (exponential backoff)	`5m`
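
The relationship between cooldownBase and cooldownMax can be sketched as a capped exponential backoff. The sketch assumes the cooldown doubles after each consecutive failure; the multiplier ChatCLI actually uses is not stated on this page. The function name cooldown_seconds is hypothetical.

```shell
#!/bin/sh
# Sketch of exponential backoff between cooldownBase (30s) and cooldownMax (5m).
# Assumption: the cooldown doubles after each consecutive failure.
cooldown_seconds() {
  failures=$1          # consecutive failures, starting at 1
  base=${2:-30}        # fallback.cooldownBase in seconds
  max=${3:-300}        # fallback.cooldownMax in seconds
  delay=$base
  i=1
  # Double once per additional failure, stopping early once the cap is reached.
  while [ "$i" -lt "$failures" ] && [ "$delay" -lt "$max" ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt "$max" ]; then
    delay=$max
  fi
  echo "$delay"
}

for n in 1 2 3 4 5; do
  echo "failure $n -> $(cooldown_seconds "$n")s"
done
```

Under that assumption the cooldowns are 30s, 60s, 120s, 240s, and then 300s for every later failure.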

MCP (Model Context Protocol)

Value	Description	Default
`mcp.enabled`	Enable MCP integration	`false`
`mcp.servers`	List of MCP servers `[{name, transport, command, args, url, enabled}]`	`[]`
`mcp.existingConfigMap`	Existing ConfigMap with `mcp_servers.json`	`""`

Bootstrap and Memory

Value	Description	Default
`bootstrap.enabled`	Load bootstrap files (SOUL.md, USER.md, etc.)	`false`
`bootstrap.definitions`	Inline bootstrap file definitions	`{}`
`bootstrap.existingConfigMap`	Existing ConfigMap with bootstrap files	`""`
`memory.enabled`	Enable persistent memory	`false`
`safety.enabled`	Enable configurable safety rules	`false`

Skill Registry

Value	Description	Default
`skillRegistry.enabled`	Enable environment variables for skill registry	`false`
`skillRegistry.registryUrls`	Additional registry URLs (comma-separated)	`""`
`skillRegistry.registryDisable`	Registry names to disable (comma-separated)	`""`
`skillRegistry.installDir`	Skill installation directory inside the container	`""`

When enabled, the values are passed as CHATCLI_REGISTRY_* environment variables in the ConfigMap. The ChatCLI container automatically creates ~/.chatcli/registries.yaml with the default registries (chatcli, clawhub). Use /skill search and /skill install to manage skills via registries.

Persistence

Value	Description	Default
`persistence.enabled`	Persist sessions in PVC	`true`
`persistence.storageClass`	Storage class	`""`
`persistence.size`	Volume size	`1Gi`

Security

Value	Description	Default
`podSecurityContext.runAsNonRoot`	Enforce non-root execution	`true`
`podSecurityContext.runAsUser`	Process UID	`1000`
`podSecurityContext.seccompProfile.type`	Seccomp profile	`RuntimeDefault`
`securityContext.allowPrivilegeEscalation`	Allow privilege escalation	`false`
`securityContext.readOnlyRootFilesystem`	Read-only filesystem	`true`
`securityContext.capabilities.drop`	Dropped capabilities	`ALL`
`rbac.clusterWide`	Use ClusterRole instead of namespace-scoped Role	`false`

When readOnlyRootFilesystem is true, the chart automatically mounts a tmpfs at /tmp and an emptyDir at /home/chatcli/.chatcli (200Mi) for runtime data, and sets HOME=/home/chatcli. To monitor multiple namespaces, set rbac.clusterWide: true. See the security documentation for details.

Note: The ConfigMap and Secret referenced via envFrom are marked optional: true, so you can create the Instance/Deployment before the resources it depends on. The operator watches Secrets automatically and triggers a rolling update when they are created or updated.

Autoscaling (HPA)

Value	Description	Default
`autoscaling.enabled`	Enable HorizontalPodAutoscaler	`false`
`autoscaling.minReplicas`	Minimum replicas	`1`
`autoscaling.maxReplicas`	Maximum replicas	`10`
`autoscaling.targetCPUUtilizationPercentage`	Target CPU utilization (%)	`80`
`autoscaling.targetMemoryUtilizationPercentage`	Target memory utilization (%)	`""`

When autoscaling.enabled is true, replicaCount is ignored and the HPA controls the number of replicas automatically.
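
For a feel of how the target utilization drives the replica count, the sketch below applies the standard HorizontalPodAutoscaler rule, desired = ceil(current replicas × current utilization / target utilization), clamped to the minimum and maximum. It uses the chart defaults shown above (target 80, min 1, max 10) and ignores the controller's tolerance band and stabilization windows. The function name hpa_desired_replicas is hypothetical.

```shell
#!/bin/sh
# Sketch of the standard HorizontalPodAutoscaler scaling rule:
#   desired = ceil(current_replicas * current_utilization / target_utilization)
# clamped to [minReplicas, maxReplicas]. Tolerance and stabilization are ignored.
hpa_desired_replicas() {
  current=$1           # current replica count
  cpu=$2               # observed average CPU utilization (%)
  target=${3:-80}      # autoscaling.targetCPUUtilizationPercentage
  min=${4:-1}          # autoscaling.minReplicas
  max=${5:-10}         # autoscaling.maxReplicas
  # Integer ceiling division: (a + b - 1) / b
  desired=$(( (current * cpu + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

hpa_desired_replicas 2 120   # 2 replicas at 120% utilization
hpa_desired_replicas 4 20    # 4 replicas at 20% utilization
hpa_desired_replicas 8 200   # result capped by maxReplicas
```

The three calls print 3, 1, and 10.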

Pod Disruption Budget

Value	Description	Default
`podDisruptionBudget.enabled`	Create PodDisruptionBudget	`false`
`podDisruptionBudget.minAvailable`	Minimum pods available during disruptions	`1`
`podDisruptionBudget.maxUnavailable`	Maximum unavailable pods (alternative to minAvailable)	`""`

The PDB ensures high availability during node upgrades, drains, and cluster maintenance.

Network Policy

Value	Description	Default
`networkPolicy.enabled`	Create NetworkPolicy	`false`
`networkPolicy.allowIngressFrom`	Allowed ingress rules	`[]`
`networkPolicy.allowEgressTo`	Allowed egress rules	`[]`

NetworkPolicy restricts network traffic at the pod level. Requires a CNI with NetworkPolicy support (Calico, Cilium, etc.).

Networking

Value	Description	Default
`service.type`	Service type	`ClusterIP`
`service.port`	Service port	`50051`
`service.headless`	Enable headless Service for gRPC client-side load balancing (recommended when `replicaCount > 1`)	`false`
`ingress.enabled`	Enable Ingress	`false`

gRPC and multiple replicas: gRPC uses persistent HTTP/2 connections that pin to a single pod. For replicaCount > 1, enable service.headless: true to activate round-robin load balancing via DNS. The client already has built-in keepalive and round-robin support.

Ingress gRPC: When Ingress is enabled with className: nginx, the chart automatically adds the nginx.ingress.kubernetes.io/backend-protocol: "GRPC" annotation to route gRPC traffic correctly.

Using an Existing Secret

If you already have a Secret with the API keys:

helm install chatcli oci://ghcr.io/diillson/charts/chatcli \
  --set llm.provider=OPENAI \
  --set secrets.existingSecret=my-llm-keys

The Secret must contain the expected keys:

apiVersion: v1
kind: Secret
metadata:
  name: my-llm-keys
type: Opaque
stringData:
  OPENAI_API_KEY: "sk-xxx"
  ANTHROPIC_API_KEY: "sk-ant-xxx"
  OPENROUTER_API_KEY: "sk-or-xxx"  # optional
  GITHUB_COPILOT_TOKEN: "ghu_xxx"  # optional
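
If you would rather not write API keys into a manifest file, a manifest like the one above can be rendered from environment variables and piped straight to kubectl. The following is a minimal sketch: the function name render_llm_secret is hypothetical, only variables that are set and non-empty are emitted, and values containing double quotes or backslashes would need extra escaping.

```shell
#!/bin/sh
# Sketch: render the Secret manifest from environment variables so API keys
# are not stored in a file. Only set, non-empty variables are included.
render_llm_secret() {
  name=${1:-my-llm-keys}
  printf 'apiVersion: v1\nkind: Secret\nmetadata:\n  name: %s\ntype: Opaque\nstringData:\n' "$name"
  for var in OPENAI_API_KEY ANTHROPIC_API_KEY OPENROUTER_API_KEY GITHUB_COPILOT_TOKEN; do
    # Indirect lookup of the variable named in $var (fixed names only).
    eval "value=\${$var:-}"
    if [ -n "$value" ]; then
      printf '  %s: "%s"\n' "$var" "$value"
    fi
  done
}

# Placeholder key so the sketch prints something when run as-is.
OPENAI_API_KEY="${OPENAI_API_KEY:-sk-xxx}"
render_llm_secret my-llm-keys
```

To apply the result to the cluster, pipe it to kubectl: render_llm_secret my-llm-keys | kubectl apply -f -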

Accessing the Server

Port Forward (Dev)

kubectl port-forward svc/chatcli 50051:50051
chatcli connect localhost:50051

NodePort

helm install chatcli oci://ghcr.io/diillson/charts/chatcli \
  --set service.type=NodePort
chatcli connect <node-ip>:<node-port>

LoadBalancer

helm install chatcli oci://ghcr.io/diillson/charts/chatcli \
  --set service.type=LoadBalancer

# Wait for the external IP
kubectl get svc chatcli -w
chatcli connect <external-ip>:50051

Ingress (with TLS)

# values-prod.yaml
ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: chatcli.mydomain.com
      paths:
        - path: /
          pathType: ImplementationSpecific
  tls:
    - secretName: chatcli-tls
      hosts:
        - chatcli.mydomain.com

helm install chatcli oci://ghcr.io/diillson/charts/chatcli -f values-prod.yaml

Upgrade and Rollback

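# Upgrade (--reuse-values keeps the values of the current release;
# without it, values not passed again are reset to the chart defaults)
helm upgrade chatcli oci://ghcr.io/diillson/charts/chatcli --reuse-values --set llm.model=gpt-4-turbo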

# Rollback
helm rollback chatcli 1

Security Configuration

The Helm chart supports advanced security configuration for production environments:

Value	Description	Default
`security.rateLimitRps`	Requests per second limit (rate limiting)	`0` (disabled)
`security.bindAddress`	Server bind address. Auto-detects `0.0.0.0` in Kubernetes via `KUBERNETES_SERVICE_HOST`.	`127.0.0.1` / `0.0.0.0` (K8s)
`security.agentSecurityMode`	Agent security mode (`strict` or `permissive`)	`strict`
`security.jwtSecretRef.name`	Name of the Kubernetes Secret containing the JWT secret	`""`
`security.jwtSecretRef.key`	Key within the Secret holding the JWT secret value	`""`
`security.auditLog`	Enable security audit logging	`false`
`security.sessionEncryption`	Enable session encryption at rest	`false`

# values-security.yaml
security:
  rateLimitRps: 20
  # bindAddress: "0.0.0.0"  # Optional — auto-detected in Kubernetes
  agentSecurityMode: strict
  auditLog: true
  sessionEncryption: true
  jwtSecretRef:
    name: chatcli-jwt
    key: secret

In Kubernetes, bindAddress is automatically detected as 0.0.0.0 via the KUBERNETES_SERVICE_HOST environment variable. No manual configuration is needed.

In production, always configure security.jwtSecretRef to enable JWT authentication. Without it, the server accepts unauthenticated connections.

Full Example: Production

Single-Target (Legacy)

helm install chatcli oci://ghcr.io/diillson/charts/chatcli \
  --namespace chatcli --create-namespace \
  --set llm.provider=CLAUDEAI \
  --set secrets.anthropicApiKey=sk-ant-xxx \
  --set server.token=super-secret-token \
  --set tls.enabled=true \
  --set tls.existingSecret=chatcli-tls-certs \
  --set watcher.enabled=true \
  --set watcher.deployment=production-app \
  --set watcher.namespace=production \
  --set persistence.enabled=true \
  --set persistence.size=5Gi \
  --set resources.requests.memory=256Mi \
  --set resources.limits.memory=1Gi

Multi-Target with Prometheus (Recommended)

# values-prod.yaml
llm:
  provider: CLAUDEAI
secrets:
  existingSecret: chatcli-llm-keys
server:
  token: super-secret-token
tls:
  enabled: true
  existingSecret: chatcli-tls-certs
watcher:
  enabled: true
  interval: "15s"
  maxContextChars: 10000
  targets:
    - deployment: api-gateway
      namespace: production
      metricsPort: 9090
      metricsFilter: ["http_requests_*", "http_request_duration_*"]
    - deployment: auth-service
      namespace: production
      metricsPort: 9090
    - deployment: payment-service
      namespace: production
      metricsPort: 9090
      metricsFilter: ["payment_*", "stripe_*"]
    - deployment: worker
      namespace: batch
persistence:
  enabled: true
  size: 5Gi
resources:
  requests:
    memory: 256Mi
  limits:
    memory: 1Gi

helm install chatcli oci://ghcr.io/diillson/charts/chatcli \
  --namespace chatcli --create-namespace \
  -f values-prod.yaml

When targets are in different namespaces (e.g., production and batch), the chart automatically creates a ClusterRole instead of a namespace-scoped Role.

Next Steps

Server

Configure the gRPC server

Remote Connection

Connect to the server

K8s Watcher

Monitor Kubernetes
