ChatCLI can be packaged as a Docker container and deployed on Kubernetes using the official Helm chart. This page covers all deployment scenarios.
Official Images (GHCR)
Official Docker images are automatically published to the GitHub Container Registry with each release:
ChatCLI Server (latest version: 1.139.0): ghcr.io/diillson/chatcli:1.139.0
Kubernetes Operator (latest version: 1.139.0): ghcr.io/diillson/chatcli-operator:1.139.0
# Pull the server image (pinned version — recommended)
docker pull ghcr.io/diillson/chatcli:1.139.0
# Or the latest available
docker pull ghcr.io/diillson/chatcli:latest
# Pull the operator image
docker pull ghcr.io/diillson/chatcli-operator:1.139.0
The images support multi-arch (linux/amd64 and linux/arm64).
Building the Image (Local)
# From the project root
docker build -t chatcli .
The Dockerfile uses a multi-stage build to produce a minimal image (~20MB):
Build stage: golang:1.25-alpine compiles the binary
Runtime stage: alpine:3.21 with a non-root user and a built-in health check
Building the Operator Image (Local)
# IMPORTANT: must be built from the repository root
# (the operator's go.mod uses a replace directive pointing to ../)
docker build -f operator/Dockerfile -t ghcr.io/diillson/chatcli-operator:latest .
The operator Dockerfile uses:
Build stage: golang:1.25 with multi-arch support (TARGETARCH)
Runtime stage: gcr.io/distroless/static:nonroot (minimal attack surface, no shell)
Running with Docker
Basic
docker run -p 50051:50051 \
  -e LLM_PROVIDER=OPENAI \
  -e OPENAI_API_KEY=sk-xxx \
  chatcli
With Auth
docker run -p 50051:50051 \
  -e CHATCLI_SERVER_TOKEN=my-token \
  -e LLM_PROVIDER=CLAUDEAI \
  -e ANTHROPIC_API_KEY=sk-ant-xxx \
  chatcli
With Persistence
docker run -p 50051:50051 \
  -v chatcli-sessions:/home/chatcli/.chatcli/sessions \
  -e LLM_PROVIDER=OPENAI \
  -e OPENAI_API_KEY=sk-xxx \
  chatcli
Docker Compose
The project includes a docker-compose.yml ready for development:
Set the variables
export LLM_PROVIDER=OPENAI
export OPENAI_API_KEY=sk-xxx
Connect from your terminal
chatcli connect localhost:50051
Docker Compose configures:
Port 50051 exposed
Persistent volumes for sessions and plugins
Automatic restart (unless-stopped)
All LLM variables via environment
Security hardening: read-only filesystem, no-new-privileges, CPU/memory limits, tmpfs for /tmp
docker-compose.yml File
version: "3.9"
services:
  chatcli-server:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: chatcli-server
    ports:
      - "50051:50051"
    environment:
      CHATCLI_SERVER_PORT: "50051"
      CHATCLI_SERVER_TOKEN: "${CHATCLI_SERVER_TOKEN:-}"
      LLM_PROVIDER: "${LLM_PROVIDER:-}"
      OPENAI_API_KEY: "${OPENAI_API_KEY:-}"
      ANTHROPIC_API_KEY: "${ANTHROPIC_API_KEY:-}"
      GOOGLEAI_API_KEY: "${GOOGLEAI_API_KEY:-}"
      OPENROUTER_API_KEY: "${OPENROUTER_API_KEY:-}"
      OLLAMA_ENABLED: "${OLLAMA_ENABLED:-}"
      OLLAMA_BASE_URL: "${OLLAMA_BASE_URL:-}"
      GITHUB_COPILOT_TOKEN: "${GITHUB_COPILOT_TOKEN:-}"
      COPILOT_MODEL: "${COPILOT_MODEL:-}"
      LOG_LEVEL: "${LOG_LEVEL:-info}"
    volumes:
      - chatcli-sessions:/home/chatcli/.chatcli/sessions
      - chatcli-plugins:/home/chatcli/.chatcli/plugins
    restart: unless-stopped
    read_only: true
    tmpfs:
      - /tmp:size=100M
    security_opt:
      - no-new-privileges:true
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 1G
volumes:
  chatcli-sessions:
  chatcli-plugins:
The container runs with a read-only filesystem and no-new-privileges by default. The /tmp directory uses an in-memory tmpfs (limited to 100MB). The named volumes (chatcli-sessions, chatcli-plugins) are the only writable mount points. See the security documentation for details.
Kubernetes (Helm)
ChatCLI Helm charts are available as OCI artifacts on GHCR — no need to clone the repository.
Prerequisites
Kubernetes cluster (kind, minikube, EKS, GKE, AKS, etc.)
Helm 3.8+ installed (OCI support)
kubectl configured for the cluster
Basic Installation
OpenAI
helm install chatcli oci://ghcr.io/diillson/charts/chatcli \
  --set llm.provider=OPENAI \
  --set secrets.openaiApiKey=sk-xxx
Anthropic (with Auth)
helm install chatcli oci://ghcr.io/diillson/charts/chatcli \
  --set llm.provider=CLAUDEAI \
  --set secrets.anthropicApiKey=sk-ant-xxx \
  --set server.token=my-secret-token
Installation with Security (Helm)
For deployments with full security, including rate limiting, JWT authentication, and secure agent mode:
helm install chatcli oci://ghcr.io/diillson/charts/chatcli \
  --set security.rateLimitRps=20 \
  --set security.agentSecurityMode=strict \
  --set security.jwtSecretRef.name=chatcli-jwt \
  --set security.jwtSecretRef.key=secret
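The command above points the chart at a Secret named chatcli-jwt with a key named secret, so that Secret has to exist in the release namespace. A minimal sketch of such a manifest follows. The value is a placeholder, and the assumption that the chart consumes the raw string stored under that key is illustrative rather than confirmed by the chart.

```yaml
# Hypothetical Secret matching security.jwtSecretRef.name and .key above.
apiVersion: v1
kind: Secret
metadata:
  name: chatcli-jwt
type: Opaque
stringData:
  # Placeholder only: substitute a long random value generated outside version control.
  secret: "replace-with-a-long-random-value"
```

Applying it with kubectl apply -f in the target namespace before running helm install keeps the signing secret out of the Helm values and the shell history.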
Installation with K8s Watcher (Single-Target)
helm install chatcli oci://ghcr.io/diillson/charts/chatcli \
  --set llm.provider=OPENAI \
  --set secrets.openaiApiKey=sk-xxx \
  --set watcher.enabled=true \
  --set watcher.deployment=myapp \
  --set watcher.namespace=production
Installation with Multi-Target + Prometheus
To monitor multiple deployments with Prometheus metrics, use a values.yaml:
# values-multi.yaml
llm:
  provider: CLAUDEAI
secrets:
  anthropicApiKey: sk-ant-xxx
watcher:
  enabled: true
  interval: "15s"
  maxContextChars: 32000
  targets:
    - deployment: api-gateway
      namespace: production
      metricsPort: 9090
      metricsFilter: ["http_requests_*", "http_request_duration_*"]
    - deployment: auth-service
      namespace: production
      metricsPort: 9090
    - deployment: worker
      namespace: batch
helm install chatcli oci://ghcr.io/diillson/charts/chatcli -f values-multi.yaml
The chart automatically:
Creates a ServiceAccount with RBAC for the watcher to read pods, events, and logs
Auto-detects multi-namespace setups: if targets are in different namespaces, it uses a ClusterRole instead of a Role
Generates a ConfigMap <name>-watch-config with the multi-target YAML
Mounts the config as a volume and passes --watch-config to the container
Properly passes --token, --model, and --mcp-config flags to the server
Uses native gRPC health probes (liveness, readiness, and startup) instead of pidof
Includes all 17 operator CRDs in the crds/ directory
Helm Chart Values
| Value | Description | Default |
|---|---|---|
| replicaCount | Number of replicas | 1 |
| image.repository | Image repository | ghcr.io/diillson/chatcli |
| image.tag | Image tag | latest |
| server.port | gRPC port | 50051 |
| server.metricsPort | HTTP port for Prometheus metrics (0 = disabled) | 9090 |
| server.token | Authentication token | "" |
| server.grpcReflection | Enable gRPC reflection (debugging) | false |
| serviceMonitor.enabled | Create ServiceMonitor (requires Prometheus Operator) | false |
| serviceMonitor.interval | Prometheus scrape interval | 30s |
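The metrics-related values in the table above can be combined to expose Prometheus metrics and have them scraped. The fragment below is a sketch built only from the keys and defaults listed in the table; it assumes the Prometheus Operator CRDs are already installed in the cluster, since a ServiceMonitor cannot be created without them.

```yaml
# values-metrics.yaml (hypothetical file name)
server:
  metricsPort: 9090      # 0 would disable the HTTP metrics endpoint
serviceMonitor:
  enabled: true          # requires Prometheus Operator
  interval: 30s          # scrape interval
```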

| Value | Description | Default |
|---|---|---|
| tls.enabled | Enable TLS | false |
| tls.certFile | Certificate path | "" |
| tls.keyFile | Key path | "" |
| tls.existingSecret | Existing Secret with certs | "" |

| Value | Description | Default |
|---|---|---|
| llm.provider | Default provider | "" |
| llm.model | Default model | "" |
Secrets (API Keys)
| Value | Description |
|---|---|
| secrets.existingSecret | Existing Secret (instead of creating a new one) |
| secrets.openaiApiKey | OpenAI key |
| secrets.anthropicApiKey | Anthropic key |
| secrets.googleaiApiKey | Google AI key |
| secrets.xaiApiKey | xAI key |
| secrets.stackspotClientId | StackSpot Client ID |
| secrets.stackspotClientKey | StackSpot Client Key |
| secrets.stackspotRealm | StackSpot Realm |
| secrets.stackspotAgentId | StackSpot Agent ID |
| secrets.openrouterApiKey | OpenRouter API key |
| secrets.githubCopilotToken | GitHub Copilot OAuth token |
GitHub Copilot
| Value | Description | Default |
|---|---|---|
| COPILOT_MODEL | Default Copilot model (e.g., gpt-4o, claude-sonnet-4) | gpt-4o |
| COPILOT_MAX_TOKENS | Maximum tokens for response | "" |
| COPILOT_API_BASE_URL | API base URL (for enterprise environments) | https://api.githubcopilot.com |
For authentication, use secrets.githubCopilotToken with a token obtained via /auth login github-copilot, or set GITHUB_COPILOT_TOKEN as an environment variable.
| Value | Description | Default |
|---|---|---|
| ollama.enabled | Enable Ollama | false |
| ollama.baseUrl | Ollama base URL | http://ollama:11434 |
| ollama.model | Ollama model | "" |
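As a sketch of how the Ollama values above fit together, the fragment below enables Ollama against an in-cluster Service. The base URL is the chart default from the table; the model name is a hypothetical placeholder and should be replaced with a model actually pulled on your Ollama instance.

```yaml
# values-ollama.yaml (hypothetical file name)
ollama:
  enabled: true
  baseUrl: http://ollama:11434   # chart default; assumes a Service named "ollama" in the same namespace
  model: my-local-model          # placeholder, not a real model name
```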
K8s Watcher
| Value | Description | Default |
|---|---|---|
| watcher.enabled | Enable the watcher | false |
| watcher.targets | Multi-deployment target list (see below) | [] |
| watcher.deployment | Single deployment (legacy) | "" |
| watcher.namespace | Deployment namespace (legacy) | "" |
| watcher.interval | Collection interval | 30s |
| watcher.window | Observation window | 2h |
| watcher.maxLogLines | Log lines per pod | 100 |
| watcher.maxContextChars | LLM context budget | 32000 |

Fields for each target (watcher.targets[]):

| Field | Description | Required |
|---|---|---|
| deployment | Deployment name | Yes |
| namespace | Namespace (default: default) | No |
| metricsPort | Prometheus port (0 = disabled) | No |
| metricsPath | HTTP metrics path | No (/metrics) |
| metricsFilter | Glob filters for metrics | No |
Provider Fallback
| Value | Description | Default |
|---|---|---|
| fallback.enabled | Enable automatic failover chain | false |
| fallback.providers | Ordered list of providers [{name, model}] | [] |
| fallback.maxRetries | Retries per provider before advancing | 2 |
| fallback.cooldownBase | Base cooldown after failure | 30s |
| fallback.cooldownMax | Maximum cooldown (exponential backoff) | 5m |
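The fallback values above describe an ordered chain, which is easier to read as a values fragment. The sketch below assumes that provider names use the same identifiers as llm.provider (CLAUDEAI, OPENAI); the model names are placeholders taken from examples elsewhere on this page, not recommendations.

```yaml
# values-fallback.yaml (hypothetical file name)
fallback:
  enabled: true
  providers:                 # tried in order
    - name: CLAUDEAI         # assumed to match llm.provider identifiers
      model: claude-sonnet-4 # placeholder model name
    - name: OPENAI
      model: gpt-4-turbo     # placeholder model name
  maxRetries: 2              # retries per provider before advancing
  cooldownBase: 30s          # first cooldown after a failure
  cooldownMax: 5m            # upper bound for the exponential backoff
```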
MCP (Model Context Protocol)
| Value | Description | Default |
|---|---|---|
| mcp.enabled | Enable MCP integration | false |
| mcp.servers | List of MCP servers [{name, transport, command, args, url, enabled}] | [] |
| mcp.existingConfigMap | Existing ConfigMap with mcp_servers.json | "" |
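The mcp.servers entry lists six fields per server. A hedged sketch of what two entries might look like follows. The field names come from the table; the transport values (stdio, sse), the binary path, and the URL are assumptions for illustration and should be checked against the chart's values.yaml.

```yaml
# values-mcp.yaml (hypothetical file name)
mcp:
  enabled: true
  servers:
    - name: local-tools                       # hypothetical server name
      transport: stdio                        # assumed transport value
      command: /usr/local/bin/my-mcp-server   # hypothetical binary inside the container
      args: ["--root", "/tmp"]                # hypothetical arguments
      enabled: true
    - name: remote-tools                      # hypothetical server name
      transport: sse                          # assumed transport value
      url: https://mcp.example.com/sse        # placeholder URL
      enabled: true
```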
Bootstrap and Memory
| Value | Description | Default |
|---|---|---|
| bootstrap.enabled | Load bootstrap files (SOUL.md, USER.md, etc.) | false |
| bootstrap.definitions | Inline bootstrap file definitions | {} |
| bootstrap.existingConfigMap | Existing ConfigMap with bootstrap files | "" |
| memory.enabled | Enable persistent memory | false |
| safety.enabled | Enable configurable safety rules | false |
Skill Registry
| Value | Description | Default |
|---|---|---|
| skillRegistry.enabled | Enable environment variables for skill registry | false |
| skillRegistry.registryUrls | Additional registry URLs (comma-separated) | "" |
| skillRegistry.registryDisable | Registry names to disable (comma-separated) | "" |
| skillRegistry.installDir | Skill installation directory inside the container | "" |
When enabled, the values are passed as CHATCLI_REGISTRY_* environment variables in the ConfigMap. The ChatCLI container automatically creates ~/.chatcli/registries.yaml with the default registries (chatcli, clawhub). Use /skill search and /skill install to manage skills via registries.
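A sketch of the skill registry values described above is shown below. The registry URL and the install directory are hypothetical placeholders; clawhub is one of the default registry names mentioned in the text, disabled here only to illustrate the comma-separated format.

```yaml
# values-skills.yaml (hypothetical file name)
skillRegistry:
  enabled: true
  registryUrls: "https://skills.example.com/index"   # placeholder URL; comma-separate multiple entries
  registryDisable: "clawhub"                         # registry names to disable, comma-separated
  installDir: "/home/chatcli/.chatcli/skills"        # assumed writable path inside the container
```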
Persistence
| Value | Description | Default |
|---|---|---|
| persistence.enabled | Persist sessions in PVC | true |
| persistence.storageClass | Storage class | "" |
| persistence.size | Volume size | 1Gi |
Security
| Value | Description | Default |
|---|---|---|
| podSecurityContext.runAsNonRoot | Enforce non-root execution | true |
| podSecurityContext.runAsUser | Process UID | 1000 |
| podSecurityContext.seccompProfile.type | Seccomp profile | RuntimeDefault |
| securityContext.allowPrivilegeEscalation | Allow privilege escalation | false |
| securityContext.readOnlyRootFilesystem | Read-only filesystem | true |
| securityContext.capabilities.drop | Dropped capabilities | ALL |
| rbac.clusterWide | Use ClusterRole instead of namespace-scoped Role | false |
When readOnlyRootFilesystem is true, the chart automatically mounts a tmpfs at /tmp and an emptyDir at /home/chatcli/.chatcli (200Mi) for runtime data. The HOME=/home/chatcli variable is set automatically. To monitor multiple namespaces, enable rbac.clusterWide: true. See the security documentation for details.
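Written out as a values fragment, the hardened defaults from the table plus cluster-wide RBAC look roughly like the sketch below. Only rbac.clusterWide differs from the documented defaults; the list form of capabilities.drop is an assumption based on the standard Kubernetes securityContext schema.

```yaml
# values-hardening.yaml (hypothetical file name)
podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
  seccompProfile:
    type: RuntimeDefault
securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true   # chart then mounts /tmp and /home/chatcli/.chatcli as writable volumes
  capabilities:
    drop: ["ALL"]                # list form assumed from the Kubernetes API
rbac:
  clusterWide: true              # ClusterRole, needed to watch multiple namespaces
```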
Note: The ConfigMap and Secret referenced via envFrom are marked as optional: true, allowing you to create the Instance/Deployment before the dependent resources. The operator watches Secrets automatically and triggers rolling updates when they are created or updated.
Autoscaling (HPA)
| Value | Description | Default |
|---|---|---|
| autoscaling.enabled | Enable HorizontalPodAutoscaler | false |
| autoscaling.minReplicas | Minimum replicas | 1 |
| autoscaling.maxReplicas | Maximum replicas | 10 |
| autoscaling.targetCPUUtilizationPercentage | Target CPU utilization (%) | 80 |
| autoscaling.targetMemoryUtilizationPercentage | Target memory utilization (%) | "" |
When autoscaling.enabled is true, replicaCount is ignored and the HPA controls the number of replicas automatically.
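For illustration, a values fragment that turns on the HPA with both CPU and memory targets might look like the sketch below. The replica bounds and the memory target are arbitrary example numbers, not recommendations.

```yaml
# values-hpa.yaml (hypothetical file name)
autoscaling:
  enabled: true                            # replicaCount is ignored once this is true
  minReplicas: 2                           # example value
  maxReplicas: 6                           # example value
  targetCPUUtilizationPercentage: 80
  targetMemoryUtilizationPercentage: 75    # example value; unset by default
```

CPU and memory targets only work when the pods declare resource requests, because the HPA computes utilization as a percentage of the request.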
Pod Disruption Budget
| Value | Description | Default |
|---|---|---|
| podDisruptionBudget.enabled | Create PodDisruptionBudget | false |
| podDisruptionBudget.minAvailable | Minimum pods available during disruptions | 1 |
| podDisruptionBudget.maxUnavailable | Maximum unavailable pods (alternative to minAvailable) | "" |
The PDB ensures high availability during node upgrades, drains, and cluster maintenance.
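A PDB only helps when there is more than one replica to keep available. The sketch below pairs three replicas with a budget of two; the numbers are examples.

```yaml
# values-pdb.yaml (hypothetical file name)
replicaCount: 3
podDisruptionBudget:
  enabled: true
  minAvailable: 2   # at most one pod may be evicted at a time during a drain
```

With a single replica and minAvailable: 1, voluntary evictions are blocked entirely, which can stall node drains.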
Network Policy
| Value | Description | Default |
|---|---|---|
| networkPolicy.enabled | Create NetworkPolicy | false |
| networkPolicy.allowIngressFrom | Allowed ingress rules | [] |
| networkPolicy.allowEgressTo | Allowed egress rules | [] |
NetworkPolicy restricts network traffic at the pod level. Requires a CNI with NetworkPolicy support (Calico, Cilium, etc.).
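The exact shape the chart expects for allowIngressFrom and allowEgressTo is not documented here, so the sketch below shows a hand-written Kubernetes NetworkPolicy with the same intent instead: allow gRPC traffic to the ChatCLI pods only from one client namespace. The pod label, the policy name, and both namespace names are assumptions.

```yaml
# Hand-written equivalent for illustration; not generated by the chart.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: chatcli-allow-grpc                  # hypothetical name
  namespace: chatcli                        # assumed release namespace
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: chatcli       # assumed pod label; check your release
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: tools   # hypothetical client namespace
      ports:
        - protocol: TCP
          port: 50051                       # gRPC port from server.port
```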
Networking
| Value | Description | Default |
|---|---|---|
| service.type | Service type | ClusterIP |
| service.port | Service port | 50051 |
| service.headless | Enable headless Service for gRPC client-side load balancing (recommended when replicaCount > 1) | false |
| ingress.enabled | Enable Ingress | false |

gRPC and multiple replicas: gRPC uses persistent HTTP/2 connections that pin to a single pod. For replicaCount > 1, enable service.headless: true to activate round-robin load balancing via DNS. The client already has built-in keepalive and round-robin support.
Ingress gRPC: When Ingress is enabled with className: nginx, the chart automatically adds the nginx.ingress.kubernetes.io/backend-protocol: "GRPC" annotation to route gRPC traffic correctly.
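A minimal values sketch for the multi-replica case described above follows. The replica count is an example.

```yaml
# values-headless.yaml (hypothetical file name)
replicaCount: 3
service:
  type: ClusterIP
  headless: true   # DNS returns every pod IP so the client can round-robin across them
```

A headless Service has no cluster IP, so DNS lookups for the Service name return the individual pod addresses rather than a single virtual IP; that is what lets the client spread connections across replicas.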
Using an Existing Secret
If you already have a Secret with the API keys:
helm install chatcli oci://ghcr.io/diillson/charts/chatcli \
--set llm.provider=OPENAI \
--set secrets.existingSecret=my-llm-keys
The Secret must contain the expected keys:
apiVersion: v1
kind: Secret
metadata:
  name: my-llm-keys
type: Opaque
stringData:
  OPENAI_API_KEY: "sk-xxx"
  ANTHROPIC_API_KEY: "sk-ant-xxx"
  OPENROUTER_API_KEY: "sk-or-xxx"    # optional
  GITHUB_COPILOT_TOKEN: "ghu_xxx"    # optional
Accessing the Server
Port Forward (Dev)
kubectl port-forward svc/chatcli 50051:50051
chatcli connect localhost:50051
NodePort
helm install chatcli oci://ghcr.io/diillson/charts/chatcli \
  --set service.type=NodePort
chatcli connect <node-ip>:<node-port>
LoadBalancer
helm install chatcli oci://ghcr.io/diillson/charts/chatcli \
  --set service.type=LoadBalancer
# Wait for the external IP
kubectl get svc chatcli -w
chatcli connect <external-ip>:50051
Ingress (with TLS)
# values-prod.yaml
ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: chatcli.mydomain.com
      paths:
        - path: /
          pathType: ImplementationSpecific
  tls:
    - secretName: chatcli-tls
      hosts:
        - chatcli.mydomain.com
helm install chatcli oci://ghcr.io/diillson/charts/chatcli -f values-prod.yaml
Upgrade and Rollback
# Upgrade
helm upgrade chatcli oci://ghcr.io/diillson/charts/chatcli --set llm.model=gpt-4-turbo
# Rollback
helm rollback chatcli 1
Security Configuration
The Helm chart supports advanced security configuration for production environments:
| Value | Description | Default |
|---|---|---|
| security.rateLimitRps | Requests per second limit (rate limiting) | 0 (disabled) |
| security.bindAddress | Server bind address. Auto-detects 0.0.0.0 in Kubernetes via KUBERNETES_SERVICE_HOST. | 127.0.0.1 / 0.0.0.0 (K8s) |
| security.agentSecurityMode | Agent security mode (strict or permissive) | strict |
| security.jwtSecretRef.name | Name of the Kubernetes Secret containing the JWT secret | "" |
| security.jwtSecretRef.key | Key within the Secret holding the JWT secret value | "" |
| security.auditLog | Enable security audit logging | false |
| security.sessionEncryption | Enable session encryption at rest | false |
# values-security.yaml
security:
  rateLimitRps: 20
  # bindAddress: "0.0.0.0"  # Optional; auto-detected in Kubernetes
  agentSecurityMode: strict
  auditLog: true
  sessionEncryption: true
  jwtSecretRef:
    name: chatcli-jwt
    key: secret
In Kubernetes, bindAddress is automatically detected as 0.0.0.0 via the KUBERNETES_SERVICE_HOST environment variable. No manual configuration is needed.
In production, always configure security.jwtSecretRef to enable JWT authentication. Without it, the server accepts unauthenticated connections.
Full Example: Production
Single-Target (Legacy)
helm install chatcli oci://ghcr.io/diillson/charts/chatcli \
  --namespace chatcli --create-namespace \
  --set llm.provider=CLAUDEAI \
  --set secrets.anthropicApiKey=sk-ant-xxx \
  --set server.token=super-secret-token \
  --set tls.enabled=true \
  --set tls.existingSecret=chatcli-tls-certs \
  --set watcher.enabled=true \
  --set watcher.deployment=production-app \
  --set watcher.namespace=production \
  --set persistence.enabled=true \
  --set persistence.size=5Gi \
  --set resources.requests.memory=256Mi \
  --set resources.limits.memory=1Gi
Multi-Target with Prometheus (Recommended)
# values-prod.yaml
llm:
  provider: CLAUDEAI
secrets:
  existingSecret: chatcli-llm-keys
server:
  token: super-secret-token
tls:
  enabled: true
  existingSecret: chatcli-tls-certs
watcher:
  enabled: true
  interval: "15s"
  maxContextChars: 10000
  targets:
    - deployment: api-gateway
      namespace: production
      metricsPort: 9090
      metricsFilter: ["http_requests_*", "http_request_duration_*"]
    - deployment: auth-service
      namespace: production
      metricsPort: 9090
    - deployment: payment-service
      namespace: production
      metricsPort: 9090
      metricsFilter: ["payment_*", "stripe_*"]
    - deployment: worker
      namespace: batch
persistence:
  enabled: true
  size: 5Gi
resources:
  requests:
    memory: 256Mi
  limits:
    memory: 1Gi
helm install chatcli oci://ghcr.io/diillson/charts/chatcli \
--namespace chatcli --create-namespace \
-f values-prod.yaml
When targets are in different namespaces (e.g., production and batch), the chart automatically creates a ClusterRole instead of a namespace-scoped Role.
Next Steps
Server: Configure the gRPC server
Remote Connection: Connect to the server
K8s Watcher: Monitor Kubernetes