ChatCLI can be packaged as a Docker container and deployed on Kubernetes using the official Helm chart. This page covers the official images, local Docker and Docker Compose usage, and Helm-based Kubernetes deployments.

Official Images (GHCR)

Official Docker images are automatically published to the GitHub Container Registry with each release:

ChatCLI Server

Latest version: 1.139.0
ghcr.io/diillson/chatcli:1.139.0

Kubernetes Operator

Latest version: 1.139.0
ghcr.io/diillson/chatcli-operator:1.139.0
# Pull the server image (pinned version — recommended)
docker pull ghcr.io/diillson/chatcli:1.139.0

# Or the latest available
docker pull ghcr.io/diillson/chatcli:latest

# Pull the operator image
docker pull ghcr.io/diillson/chatcli-operator:1.139.0
The images are published for multiple architectures (linux/amd64 and linux/arm64).

Docker

Building the Image (Local)

# From the project root
docker build -t chatcli .
The Dockerfile uses a multi-stage build to produce a minimal image (~20MB):
  • Build stage: golang:1.25-alpine compiles the binary
  • Runtime stage: alpine:3.21 with non-root user and built-in health check

Building the Operator Image (Local)

# IMPORTANT: must be built from the repository root
# (the operator's go.mod uses a replace directive pointing to ../)
docker build -f operator/Dockerfile -t ghcr.io/diillson/chatcli-operator:latest .
The operator Dockerfile uses:
  • Build stage: golang:1.25 with multi-arch support (TARGETARCH)
  • Runtime stage: gcr.io/distroless/static:nonroot (maximum security, no shell)

Running with Docker

docker run -p 50051:50051 \
  -e LLM_PROVIDER=OPENAI \
  -e OPENAI_API_KEY=sk-xxx \
  chatcli

Docker Compose

The project includes a docker-compose.yml ready for development:
1. Set the variables

export LLM_PROVIDER=OPENAI
export OPENAI_API_KEY=sk-xxx

2. Start the container

docker compose up -d

3. Connect from your terminal

chatcli connect localhost:50051
Docker Compose configures:
  • Port 50051 exposed
  • Persistent volumes for sessions and plugins
  • Automatic restart (unless-stopped)
  • All LLM variables via environment
  • Security hardening: read-only filesystem, no-new-privileges, CPU/memory limits, tmpfs for /tmp

docker-compose.yml File

version: "3.9"

services:
  chatcli-server:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: chatcli-server
    ports:
      - "50051:50051"
    environment:
      CHATCLI_SERVER_PORT: "50051"
      CHATCLI_SERVER_TOKEN: "${CHATCLI_SERVER_TOKEN:-}"
      LLM_PROVIDER: "${LLM_PROVIDER:-}"
      OPENAI_API_KEY: "${OPENAI_API_KEY:-}"
      ANTHROPIC_API_KEY: "${ANTHROPIC_API_KEY:-}"
      GOOGLEAI_API_KEY: "${GOOGLEAI_API_KEY:-}"
      OPENROUTER_API_KEY: "${OPENROUTER_API_KEY:-}"
      OLLAMA_ENABLED: "${OLLAMA_ENABLED:-}"
      OLLAMA_BASE_URL: "${OLLAMA_BASE_URL:-}"
      GITHUB_COPILOT_TOKEN: "${GITHUB_COPILOT_TOKEN:-}"
      COPILOT_MODEL: "${COPILOT_MODEL:-}"
      LOG_LEVEL: "${LOG_LEVEL:-info}"
    volumes:
      - chatcli-sessions:/home/chatcli/.chatcli/sessions
      - chatcli-plugins:/home/chatcli/.chatcli/plugins
    restart: unless-stopped
    read_only: true
    tmpfs:
      - /tmp:size=100M
    security_opt:
      - no-new-privileges:true
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 1G

volumes:
  chatcli-sessions:
  chatcli-plugins:
The container runs with a read-only filesystem and no-new-privileges by default. The /tmp directory uses an in-memory tmpfs (limited to 100MB). The named volumes (chatcli-sessions, chatcli-plugins) are the only writable mount points. See the security documentation for details.

Kubernetes (Helm)

ChatCLI Helm charts are available as OCI artifacts on GHCR — no need to clone the repository.

Prerequisites

  • Kubernetes cluster (kind, minikube, EKS, GKE, AKS, etc.)
  • Helm 3.8+ installed (OCI support)
  • kubectl configured for the cluster

Basic Installation

helm install chatcli oci://ghcr.io/diillson/charts/chatcli \
  --set llm.provider=OPENAI \
  --set secrets.openaiApiKey=sk-xxx

Installation with Security (Helm)

For deployments with full security, including rate limiting, JWT authentication, and secure agent mode:
helm install chatcli oci://ghcr.io/diillson/charts/chatcli \
  --set security.rateLimitRps=20 \
  --set security.agentSecurityMode=strict \
  --set security.jwtSecretRef.name=chatcli-jwt \
  --set security.jwtSecretRef.key=secret

Installation with K8s Watcher (Single-Target)

helm install chatcli oci://ghcr.io/diillson/charts/chatcli \
  --set llm.provider=OPENAI \
  --set secrets.openaiApiKey=sk-xxx \
  --set watcher.enabled=true \
  --set watcher.deployment=myapp \
  --set watcher.namespace=production

Installation with Multi-Target + Prometheus

To monitor multiple deployments with Prometheus metrics, use a values file:
# values-multi.yaml
llm:
  provider: CLAUDEAI
secrets:
  anthropicApiKey: sk-ant-xxx
watcher:
  enabled: true
  interval: "15s"
  maxContextChars: 32000
  targets:
    - deployment: api-gateway
      namespace: production
      metricsPort: 9090
      metricsFilter: ["http_requests_*", "http_request_duration_*"]
    - deployment: auth-service
      namespace: production
      metricsPort: 9090
    - deployment: worker
      namespace: batch
helm install chatcli oci://ghcr.io/diillson/charts/chatcli -f values-multi.yaml
The chart automatically:
  • Creates a ServiceAccount with RBAC for the watcher to read pods, events, and logs
  • Auto-detects multi-namespace: if targets are in different namespaces, uses ClusterRole instead of Role
  • Generates a ConfigMap <name>-watch-config with the multi-target YAML
  • Mounts the config as a volume and passes --watch-config to the container
  • Properly passes --token, --model, and --mcp-config flags to the server
  • Uses native gRPC health probes (liveness, readiness, and startup) instead of pidof
  • Includes all 17 operator CRDs in the crds/ directory
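
The Role versus ClusterRole decision in the list above can be modelled in a few lines. This is only a Python sketch of the rule as described here; the chart implements it in its Helm templates, and the function name `rbac_kind` is hypothetical.

```python
def rbac_kind(targets, default_namespace="default"):
    """Model of the scope decision: more than one namespace means ClusterRole."""
    # A target without a namespace is assumed to fall back to "default",
    # which is the documented default for watcher.targets[].namespace.
    namespaces = {t.get("namespace") or default_namespace for t in targets}
    return "ClusterRole" if len(namespaces) > 1 else "Role"


targets = [
    {"deployment": "api-gateway", "namespace": "production"},
    {"deployment": "auth-service", "namespace": "production"},
    {"deployment": "worker", "namespace": "batch"},
]

print(rbac_kind(targets))      # targets span production and batch
print(rbac_kind(targets[:2]))  # targets are all in production
```

With the multi-target values shown earlier, the targets span two namespaces, so the model picks a ClusterRole; restricting the list to the two production targets yields a namespace-scoped Role.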

Helm Chart Values

Server

| Value | Description | Default |
| --- | --- | --- |
| replicaCount | Number of replicas | 1 |
| image.repository | Image repository | ghcr.io/diillson/chatcli |
| image.tag | Image tag | latest |
| server.port | gRPC port | 50051 |
| server.metricsPort | HTTP port for Prometheus metrics (0 = disabled) | 9090 |
| server.token | Authentication token | "" |
| server.grpcReflection | Enable gRPC reflection (debugging) | false |
| serviceMonitor.enabled | Create ServiceMonitor (requires Prometheus Operator) | false |
| serviceMonitor.interval | Prometheus scrape interval | 30s |

TLS

| Value | Description | Default |
| --- | --- | --- |
| tls.enabled | Enable TLS | false |
| tls.certFile | Certificate path | "" |
| tls.keyFile | Key path | "" |
| tls.existingSecret | Existing Secret with certs | "" |

LLM

| Value | Description | Default |
| --- | --- | --- |
| llm.provider | Default provider | "" |
| llm.model | Default model | "" |

Secrets (API Keys)

| Value | Description |
| --- | --- |
| secrets.existingSecret | Existing Secret (instead of creating a new one) |
| secrets.openaiApiKey | OpenAI key |
| secrets.anthropicApiKey | Anthropic key |
| secrets.googleaiApiKey | Google AI key |
| secrets.xaiApiKey | xAI key |
| secrets.stackspotClientId | StackSpot Client ID |
| secrets.stackspotClientKey | StackSpot Client Key |
| secrets.stackspotRealm | StackSpot Realm |
| secrets.stackspotAgentId | StackSpot Agent ID |
| secrets.openrouterApiKey | OpenRouter API key |
| secrets.githubCopilotToken | GitHub Copilot OAuth token |

GitHub Copilot

| Value | Description | Default |
| --- | --- | --- |
| COPILOT_MODEL | Default Copilot model (e.g., gpt-4o, claude-sonnet-4) | gpt-4o |
| COPILOT_MAX_TOKENS | Maximum tokens for response | "" |
| COPILOT_API_BASE_URL | API base URL (for enterprise environments) | https://api.githubcopilot.com |

For authentication, use secrets.githubCopilotToken with a token obtained via /auth login github-copilot, or set GITHUB_COPILOT_TOKEN as an environment variable.

Ollama

| Value | Description | Default |
| --- | --- | --- |
| ollama.enabled | Enable Ollama | false |
| ollama.baseUrl | Ollama base URL | http://ollama:11434 |
| ollama.model | Ollama model | "" |

K8s Watcher

| Value | Description | Default |
| --- | --- | --- |
| watcher.enabled | Enable the watcher | false |
| watcher.targets | Multi-deployment target list (see below) | [] |
| watcher.deployment | Single deployment (legacy) | "" |
| watcher.namespace | Deployment namespace (legacy) | "" |
| watcher.interval | Collection interval | 30s |
| watcher.window | Observation window | 2h |
| watcher.maxLogLines | Log lines per pod | 100 |
| watcher.maxContextChars | LLM context budget | 32000 |

Fields for each entry in watcher.targets:

| Field | Description | Required |
| --- | --- | --- |
| deployment | Deployment name | Yes |
| namespace | Namespace (default: default) | No |
| metricsPort | Prometheus port (0 = disabled) | No |
| metricsPath | HTTP metrics path | No (default: /metrics) |
| metricsFilter | Glob filters for metrics | No |
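
To illustrate how glob patterns such as those in metricsFilter select metric names, here is a minimal Python sketch using the standard library's fnmatch module. It assumes shell-style glob semantics and assumes that an empty filter keeps every metric; neither detail is confirmed by this page, and `filter_metrics` is a hypothetical helper, not part of ChatCLI.

```python
from fnmatch import fnmatchcase


def filter_metrics(names, patterns):
    """Keep metric names that match at least one glob pattern.

    Assumption: an empty pattern list means "no filtering".
    """
    if not patterns:
        return list(names)
    return [n for n in names if any(fnmatchcase(n, p) for p in patterns)]


scraped = [
    "http_requests_total",
    "http_request_duration_seconds_sum",
    "go_goroutines",
    "process_cpu_seconds_total",
]

# Patterns taken from the api-gateway target in the multi-target example.
print(filter_metrics(scraped, ["http_requests_*", "http_request_duration_*"]))
```

Under these assumptions only the two http_* series survive, which is the kind of narrowing that helps keep the scraped metrics inside the watcher.maxContextChars budget.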

Provider Fallback

| Value | Description | Default |
| --- | --- | --- |
| fallback.enabled | Enable automatic failover chain | false |
| fallback.providers | Ordered list of providers [{name, model}] | [] |
| fallback.maxRetries | Retries per provider before advancing | 2 |
| fallback.cooldownBase | Base cooldown after failure | 30s |
| fallback.cooldownMax | Maximum cooldown (exponential backoff) | 5m |
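
The interaction between fallback.cooldownBase and fallback.cooldownMax can be pictured with a small Python sketch. It assumes the cooldown doubles after each consecutive failure, which is a common form of exponential backoff; the page only states that the backoff is exponential, so the growth factor and the function name `cooldown_seconds` are assumptions.

```python
def cooldown_seconds(failures, base=30.0, maximum=300.0):
    """Cooldown after the Nth consecutive failure, capped at `maximum`.

    Defaults mirror fallback.cooldownBase (30s) and fallback.cooldownMax (5m).
    Assumption: the delay doubles on every additional failure.
    """
    if failures < 1:
        return 0.0
    return min(base * 2 ** (failures - 1), maximum)


for n in range(1, 7):
    print(n, cooldown_seconds(n))
```

With the default values and a doubling factor, the delay grows as 30s, 60s, 120s, 240s and then stays at the 300s ceiling.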

MCP (Model Context Protocol)

| Value | Description | Default |
| --- | --- | --- |
| mcp.enabled | Enable MCP integration | false |
| mcp.servers | List of MCP servers [{name, transport, command, args, url, enabled}] | [] |
| mcp.existingConfigMap | Existing ConfigMap with mcp_servers.json | "" |

Bootstrap and Memory

| Value | Description | Default |
| --- | --- | --- |
| bootstrap.enabled | Load bootstrap files (SOUL.md, USER.md, etc.) | false |
| bootstrap.definitions | Inline bootstrap file definitions | {} |
| bootstrap.existingConfigMap | Existing ConfigMap with bootstrap files | "" |
| memory.enabled | Enable persistent memory | false |
| safety.enabled | Enable configurable safety rules | false |

Skill Registry

| Value | Description | Default |
| --- | --- | --- |
| skillRegistry.enabled | Enable environment variables for skill registry | false |
| skillRegistry.registryUrls | Additional registry URLs (comma-separated) | "" |
| skillRegistry.registryDisable | Registry names to disable (comma-separated) | "" |
| skillRegistry.installDir | Skill installation directory inside the container | "" |

When enabled, the values are passed as CHATCLI_REGISTRY_* environment variables in the ConfigMap. The ChatCLI container automatically creates ~/.chatcli/registries.yaml with the default registries (chatcli, clawhub). Use /skill search and /skill install to manage skills via registries.

Persistence

| Value | Description | Default |
| --- | --- | --- |
| persistence.enabled | Persist sessions in PVC | true |
| persistence.storageClass | Storage class | "" |
| persistence.size | Volume size | 1Gi |

Security

| Value | Description | Default |
| --- | --- | --- |
| podSecurityContext.runAsNonRoot | Enforce non-root execution | true |
| podSecurityContext.runAsUser | Process UID | 1000 |
| podSecurityContext.seccompProfile.type | Seccomp profile | RuntimeDefault |
| securityContext.allowPrivilegeEscalation | Allow privilege escalation | false |
| securityContext.readOnlyRootFilesystem | Read-only filesystem | true |
| securityContext.capabilities.drop | Dropped capabilities | ALL |
| rbac.clusterWide | Use ClusterRole instead of namespace-scoped Role | false |

When readOnlyRootFilesystem is true, the chart automatically mounts a tmpfs at /tmp and an emptyDir at /home/chatcli/.chatcli (200Mi) for runtime data. The HOME=/home/chatcli variable is set automatically. To monitor multiple namespaces, enable rbac.clusterWide: true. See the security documentation for details.

Note: The ConfigMap and Secret referenced via envFrom are marked as optional: true, allowing you to create the Instance/Deployment before the dependent resources. The operator watches Secrets automatically and triggers rolling updates when they are created or updated.

Autoscaling (HPA)

| Value | Description | Default |
| --- | --- | --- |
| autoscaling.enabled | Enable HorizontalPodAutoscaler | false |
| autoscaling.minReplicas | Minimum replicas | 1 |
| autoscaling.maxReplicas | Maximum replicas | 10 |
| autoscaling.targetCPUUtilizationPercentage | Target CPU utilization (%) | 80 |
| autoscaling.targetMemoryUtilizationPercentage | Target memory utilization (%) | "" |

When autoscaling.enabled is true, replicaCount is ignored and the HPA controls the number of replicas automatically.
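
To get a feel for how the HPA arrives at a replica count, the sketch below applies the scaling formula from the Kubernetes documentation, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), and clamps the result to the chart's minReplicas and maxReplicas defaults. It is a simplification: the real controller also applies a tolerance band and stabilization windows, which are omitted here, and `desired_replicas` is a hypothetical name.

```python
import math


def desired_replicas(current, current_cpu_pct, target_cpu_pct=80,
                     min_replicas=1, max_replicas=10):
    """Simplified HPA calculation for a single CPU utilization metric.

    Defaults mirror autoscaling.targetCPUUtilizationPercentage,
    autoscaling.minReplicas and autoscaling.maxReplicas.
    """
    raw = math.ceil(current * current_cpu_pct / target_cpu_pct)
    # The HPA never scales outside the configured bounds.
    return max(min_replicas, min(raw, max_replicas))


print(desired_replicas(2, 120))  # above target: scale out
print(desired_replicas(4, 400))  # far above target: capped at max_replicas
print(desired_replicas(3, 20))   # well below target: scale in
```

For example, two replicas averaging 120% CPU against an 80% target give ceil(2 × 120 / 80) = 3 replicas.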

Pod Disruption Budget

| Value | Description | Default |
| --- | --- | --- |
| podDisruptionBudget.enabled | Create PodDisruptionBudget | false |
| podDisruptionBudget.minAvailable | Minimum pods available during disruptions | 1 |
| podDisruptionBudget.maxUnavailable | Maximum unavailable pods (alternative to minAvailable) | "" |

The PDB ensures high availability during node upgrades, drains, and cluster maintenance.
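
A quick way to reason about minAvailable is to compute how many voluntary evictions Kubernetes would permit at a given replica count: the number of healthy pods minus minAvailable, never below zero. The Python sketch below models only that arithmetic for an integer minAvailable; `allowed_disruptions` is a hypothetical helper.

```python
def allowed_disruptions(healthy_pods, min_available=1):
    """Voluntary evictions permitted by a PDB with an integer minAvailable."""
    return max(0, healthy_pods - min_available)


for replicas in (1, 2, 3):
    print(replicas, allowed_disruptions(replicas))
```

Note the single-replica case: with the default replicaCount of 1 and minAvailable of 1, the budget allows no evictions, so a node drain waits until the pod is removed some other way. Running at least two replicas avoids that situation.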

Network Policy

| Value | Description | Default |
| --- | --- | --- |
| networkPolicy.enabled | Create NetworkPolicy | false |
| networkPolicy.allowIngressFrom | Allowed ingress rules | [] |
| networkPolicy.allowEgressTo | Allowed egress rules | [] |

NetworkPolicy restricts network traffic at the pod level. Requires a CNI with NetworkPolicy support (Calico, Cilium, etc.).

Networking

| Value | Description | Default |
| --- | --- | --- |
| service.type | Service type | ClusterIP |
| service.port | Service port | 50051 |
| service.headless | Enable headless Service for gRPC client-side load balancing (recommended when replicaCount > 1) | false |
| ingress.enabled | Enable Ingress | false |

gRPC and multiple replicas: gRPC uses persistent HTTP/2 connections that pin to a single pod. For replicaCount > 1, enable service.headless: true to activate round-robin load balancing via DNS. The client already has built-in keepalive and round-robin support.

Ingress gRPC: When Ingress is enabled with className: nginx, the chart automatically adds the nginx.ingress.kubernetes.io/backend-protocol: "GRPC" annotation to route gRPC traffic correctly.
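
The effect of a headless Service on request distribution can be simulated without a cluster. The Python sketch below is a toy model, not ChatCLI's client code: it assumes a headless Service makes DNS return one address per pod and that the client rotates through them, while a regular ClusterIP connection behaves like a single pinned endpoint. The pod IPs and the `distribute` function are made up for illustration.

```python
from collections import Counter
from itertools import cycle


def distribute(request_count, endpoints):
    """Count how many requests each endpoint receives under round-robin."""
    picker = cycle(endpoints)
    return Counter(next(picker) for _ in range(request_count))


pod_ips = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]  # hypothetical pod addresses

print(distribute(9, pod_ips))      # headless: requests spread across all pods
print(distribute(9, pod_ips[:1]))  # pinned connection: one pod takes everything
```

In the pinned case, adding replicas does not relieve the busy pod, which is why service.headless is recommended when replicaCount is greater than 1.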

Using an Existing Secret

If you already have a Secret with the API keys:
helm install chatcli oci://ghcr.io/diillson/charts/chatcli \
  --set llm.provider=OPENAI \
  --set secrets.existingSecret=my-llm-keys
The Secret must contain the expected keys:
apiVersion: v1
kind: Secret
metadata:
  name: my-llm-keys
type: Opaque
stringData:
  OPENAI_API_KEY: "sk-xxx"
  ANTHROPIC_API_KEY: "sk-ant-xxx"
  OPENROUTER_API_KEY: "sk-or-xxx"  # optional
  GITHUB_COPILOT_TOKEN: "ghu_xxx"  # optional
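
The manifest above uses stringData, which accepts plain text; Kubernetes stores the values base64-encoded under data. If you write the data field directly instead, the values must already be encoded. The Python sketch below shows that conversion with the standard library; `to_secret_data` is a hypothetical helper and the key is the same placeholder used in the manifest.

```python
import base64


def to_secret_data(string_data):
    """Convert a stringData-style mapping into base64 values for the data field."""
    return {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in string_data.items()
    }


encoded = to_secret_data({"OPENAI_API_KEY": "sk-xxx"})
print(encoded)

# Decoding restores the original value, which is what the container receives.
print(base64.b64decode(encoded["OPENAI_API_KEY"]).decode("utf-8"))
```

Base64 is an encoding, not encryption, so the Secret manifest should be protected like any other credential.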

Accessing the Server

kubectl port-forward svc/chatcli 50051:50051
chatcli connect localhost:50051

Ingress (with TLS)

# values-prod.yaml
ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: chatcli.mydomain.com
      paths:
        - path: /
          pathType: ImplementationSpecific
  tls:
    - secretName: chatcli-tls
      hosts:
        - chatcli.mydomain.com
helm install chatcli oci://ghcr.io/diillson/charts/chatcli -f values-prod.yaml

Upgrade and Rollback

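# Upgrade (--reuse-values keeps the values from the previous release;
# without it, values not passed again fall back to the chart defaults)
helm upgrade chatcli oci://ghcr.io/diillson/charts/chatcli \
  --reuse-values \
  --set llm.model=gpt-4-turbo

# Roll back to revision 1 (list revisions with: helm history chatcli)
helm rollback chatcli 1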

Security Configuration

The Helm chart supports advanced security configuration for production environments:

| Value | Description | Default |
| --- | --- | --- |
| security.rateLimitRps | Requests per second limit (rate limiting) | 0 (disabled) |
| security.bindAddress | Server bind address. Auto-detects 0.0.0.0 in Kubernetes via KUBERNETES_SERVICE_HOST. | 127.0.0.1 / 0.0.0.0 (K8s) |
| security.agentSecurityMode | Agent security mode (strict or permissive) | strict |
| security.jwtSecretRef.name | Name of the Kubernetes Secret containing the JWT secret | "" |
| security.jwtSecretRef.key | Key within the Secret holding the JWT secret value | "" |
| security.auditLog | Enable security audit logging | false |
| security.sessionEncryption | Enable session encryption at rest | false |

# values-security.yaml
security:
  rateLimitRps: 20
  # bindAddress: "0.0.0.0"  # Optional — auto-detected in Kubernetes
  agentSecurityMode: strict
  auditLog: true
  sessionEncryption: true
  jwtSecretRef:
    name: chatcli-jwt
    key: secret
In Kubernetes, bindAddress is automatically detected as 0.0.0.0 via the KUBERNETES_SERVICE_HOST environment variable. No manual configuration is needed.
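
The bind address auto-detection can be summarized as a small decision function. The Python sketch below only models the behaviour described here (127.0.0.1 by default, 0.0.0.0 when KUBERNETES_SERVICE_HOST is set); it assumes an explicitly configured bindAddress takes precedence over detection, and `resolve_bind_address` is a hypothetical name rather than a ChatCLI function.

```python
def resolve_bind_address(env, configured=""):
    """Pick the server bind address from configuration and environment."""
    # Assumption: an explicit security.bindAddress always wins.
    if configured:
        return configured
    # Kubernetes injects KUBERNETES_SERVICE_HOST into every pod.
    if env.get("KUBERNETES_SERVICE_HOST"):
        return "0.0.0.0"
    return "127.0.0.1"


print(resolve_bind_address({}))
print(resolve_bind_address({"KUBERNETES_SERVICE_HOST": "10.96.0.1"}))
print(resolve_bind_address({}, configured="0.0.0.0"))
```

Binding to 127.0.0.1 outside Kubernetes keeps a locally started server off the network unless you opt in, while pods need 0.0.0.0 so the Service can reach them.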
In production, always configure security.jwtSecretRef to enable JWT authentication. Without it, the server accepts unauthenticated connections.

Full Example: Production

Single-Target (Legacy)

helm install chatcli oci://ghcr.io/diillson/charts/chatcli \
  --namespace chatcli --create-namespace \
  --set llm.provider=CLAUDEAI \
  --set secrets.anthropicApiKey=sk-ant-xxx \
  --set server.token=super-secret-token \
  --set tls.enabled=true \
  --set tls.existingSecret=chatcli-tls-certs \
  --set watcher.enabled=true \
  --set watcher.deployment=production-app \
  --set watcher.namespace=production \
  --set persistence.enabled=true \
  --set persistence.size=5Gi \
  --set resources.requests.memory=256Mi \
  --set resources.limits.memory=1Gi

Multi-Target

# values-prod.yaml
llm:
  provider: CLAUDEAI
secrets:
  existingSecret: chatcli-llm-keys
server:
  token: super-secret-token
tls:
  enabled: true
  existingSecret: chatcli-tls-certs
watcher:
  enabled: true
  interval: "15s"
  maxContextChars: 10000
  targets:
    - deployment: api-gateway
      namespace: production
      metricsPort: 9090
      metricsFilter: ["http_requests_*", "http_request_duration_*"]
    - deployment: auth-service
      namespace: production
      metricsPort: 9090
    - deployment: payment-service
      namespace: production
      metricsPort: 9090
      metricsFilter: ["payment_*", "stripe_*"]
    - deployment: worker
      namespace: batch
persistence:
  enabled: true
  size: 5Gi
resources:
  requests:
    memory: 256Mi
  limits:
    memory: 1Gi
helm install chatcli oci://ghcr.io/diillson/charts/chatcli \
  --namespace chatcli --create-namespace \
  -f values-prod.yaml
When targets are in different namespaces (e.g., production and batch), the chart automatically creates a ClusterRole instead of a namespace-scoped Role.

Next Steps

  • Server: Configure the gRPC server
  • Remote Connection: Connect to the server
  • K8s Watcher: Monitor Kubernetes