Server Mode transforms ChatCLI into a high-performance gRPC service that can be accessed remotely from any terminal. This allows centralizing AI access on a server (bare-metal, VM, Docker, or Kubernetes) and connecting from anywhere.

Why Use Server Mode?

Centralization: a single server with configured API keys serves multiple clients.

Security: API keys stay on the server, never exposed on client terminals.

Flexibility: clients can use their own credentials (API key or OAuth) if desired.

Performance: communication via gRPC with TLS support and progressive streaming.
Server mode offers native integration with the K8s Watcher for Kubernetes deployment monitoring.

Starting the Server

1. Simplest mode, on the default port (50051):

chatcli server

2. With custom port and authentication:

chatcli server --port 8080 --token my-secret-token

3. With TLS enabled:

chatcli server --tls-cert cert.pem --tls-key key.pem

4. With integrated K8s Watcher (optional):

# Single-target (legacy)
chatcli server --watch-deployment myapp --watch-namespace production

# Multi-target + Prometheus metrics
chatcli server --watch-config targets.yaml

5. With provider fallback (optional):

chatcli server --fallback-providers OPENAI,CLAUDEAI,GOOGLEAI,COPILOT

6. With MCP (optional):

chatcli server --mcp-config ~/.chatcli/mcp_servers.json
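For local testing of the TLS flags in step 3, a self-signed certificate can be generated with openssl (a development-only sketch; use a CA-issued certificate in production):

```shell
# Generate a self-signed certificate/key pair valid for 365 days (testing only)
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout key.pem -out cert.pem -days 365 \
  -subj "/CN=localhost"

# Then start the server with TLS:
# chatcli server --tls-cert cert.pem --tls-key key.pem
```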

Available Flags

| Flag | Description | Default | Env Var |
| --- | --- | --- | --- |
| --port | gRPC server port | 50051 | CHATCLI_SERVER_PORT |
| --token | Authentication token (empty = no auth) | "" | CHATCLI_SERVER_TOKEN |
| --tls-cert | TLS certificate file | "" | CHATCLI_SERVER_TLS_CERT |
| --tls-key | TLS key file | "" | CHATCLI_SERVER_TLS_KEY |
| --provider | Default LLM provider | Auto-detected | LLM_PROVIDER |
| --model | Default LLM model | Auto-detected | |
| --metrics-port | HTTP port for Prometheus metrics (0 = disable) | 9090 | CHATCLI_METRICS_PORT |

Fallback Flags (optional)

| Flag | Description | Default | Env Var |
| --- | --- | --- | --- |
| --fallback-providers | Comma-separated list of providers for failover | "" | CHATCLI_FALLBACK_PROVIDERS |
| --fallback-max-retries | Attempts per provider before advancing | 2 | CHATCLI_FALLBACK_MAX_RETRIES |
| --fallback-cooldown-base | Base cooldown after failure | 30s | CHATCLI_FALLBACK_COOLDOWN_BASE |
| --fallback-cooldown-max | Maximum cooldown (exponential backoff) | 5m | CHATCLI_FALLBACK_COOLDOWN_MAX |
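The cooldown flags imply an exponential backoff per failing provider. As a sketch, assuming the cooldown doubles on each consecutive failure (the exact growth factor is an implementation detail), the schedule with the defaults would be:

```shell
# min(base * 2^n, max) with the default base (30s) and max (5m = 300s)
# Prints cooldowns of 30s, 60s, 120s, 240s, then capped at 300s
base=30
max=300
for n in 0 1 2 3 4; do
  c=$(( base * (1 << n) ))
  if [ "$c" -gt "$max" ]; then c=$max; fi
  echo "failure $((n + 1)): cooldown ${c}s"
done
```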

MCP Flag (optional)

| Flag | Description | Default | Env Var |
| --- | --- | --- | --- |
| --mcp-config | MCP configuration JSON file | "" | CHATCLI_MCP_CONFIG |

Prometheus Metrics

The server exposes Prometheus metrics at http://localhost:9090/metrics by default. Metrics include:
  • gRPC: chatcli_grpc_requests_total, chatcli_grpc_request_duration_seconds, chatcli_grpc_in_flight_requests
  • LLM: chatcli_llm_requests_total, chatcli_llm_request_duration_seconds, chatcli_llm_errors_total
  • Watcher: chatcli_watcher_collection_duration_seconds, chatcli_watcher_alerts_total, chatcli_watcher_pods_ready
  • Session: chatcli_session_active_total, chatcli_session_operations_total
  • Server: chatcli_server_uptime_seconds, chatcli_server_info
  • Go runtime: goroutines, memory, GC (via GoCollector/ProcessCollector)
To disable, use --metrics-port 0.
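A minimal Prometheus scrape configuration for the default endpoint could look like this (the job name and target are illustrative):

```yaml
scrape_configs:
  - job_name: chatcli-server          # illustrative job name
    static_configs:
      - targets: ["localhost:9090"]   # --metrics-port (default 9090)
```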

Security Variables

| Env Var | Description | Default |
| --- | --- | --- |
| CHATCLI_GRPC_REFLECTION | Enables gRPC reflection for debugging. Keep disabled in production. | false |
| CHATCLI_DISABLE_VERSION_CHECK | Disables automatic version check on startup. | false |
gRPC reflection is disabled by default to avoid exposing the service schema in production. Enable only for local debugging. See the security documentation for all hardening measures.

K8s Watcher Flags (optional)

| Flag | Description | Default | Env Var |
| --- | --- | --- | --- |
| --watch-config | Multi-target YAML file | "" | CHATCLI_WATCH_CONFIG |
| --watch-deployment | Single deployment (legacy) | "" | CHATCLI_WATCH_DEPLOYMENT |
| --watch-namespace | Deployment namespace | "default" | CHATCLI_WATCH_NAMESPACE |
| --watch-interval | Collection interval | 30s | CHATCLI_WATCH_INTERVAL |
| --watch-window | Observation window | 2h | CHATCLI_WATCH_WINDOW |
| --watch-max-log-lines | Max log lines per pod | 100 | CHATCLI_WATCH_MAX_LOG_LINES |
| --watch-kubeconfig | Kubeconfig path | Auto-detected | CHATCLI_KUBECONFIG |
Use --watch-config to monitor multiple deployments simultaneously with Prometheus metrics. See K8s Watcher for the YAML file format.

Server Authentication

By default, the server does not require authentication, so any client can connect:
chatcli server  # no --token = open access
To require a shared token from clients, pass --token (or set CHATCLI_SERVER_TOKEN).

Credential Modes

The server supports multiple LLM credential modes, providing full flexibility:

Server credentials (default): the server uses its own API keys configured via environment variables:
export OPENAI_API_KEY=sk-xxx
export LLM_PROVIDER=OPENAI
chatcli server
No additional client configuration is needed.

Client API key: the client can send its own API key, which the server uses instead of its own:
chatcli connect server:50051 --llm-key sk-my-key --provider OPENAI

Local OAuth: the client can use OAuth tokens from the local auth store (~/.chatcli/auth-profiles.json):
# First, log in with OAuth locally
/auth login anthropic

# Then, connect using local credentials
chatcli connect server:50051 --use-local-auth

StackSpot: for the StackSpot provider, send the complete credentials:
chatcli connect server:50051 --provider STACKSPOT \
  --client-id <id> --client-key <key> --realm <realm> --agent-id <agent>

GitHub Copilot: to use GitHub Copilot, log in via Device Flow and connect with --use-local-auth:
# First, log in to GitHub Copilot
/auth login github-copilot

# Connect using local credentials
chatcli connect server:50051 --use-local-auth --provider COPILOT

Ollama: for local models via Ollama, just provide the URL:
chatcli connect server:50051 --provider OLLAMA --ollama-url http://gpu-server:11434

gRPC Architecture

The server implements a gRPC service with the following RPCs:
| RPC | Description |
| --- | --- |
| SendPrompt | Sends a prompt and receives the complete response |
| StreamPrompt | Sends a prompt and receives the response in progressive chunks |
| InteractiveSession | Bidirectional streaming for interactive sessions |
| ListSessions | Lists sessions saved on the server |
| LoadSession | Loads a saved session |
| SaveSession | Saves the current session |
| Health | Server health check |
| GetServerInfo | Server information (version, provider, model, watcher) |
| GetWatcherStatus | K8s Watcher status (if active) |
| ListRemotePlugins | Lists plugins available on the server |
| ListRemoteAgents | Lists agents available on the server |
| ListRemoteSkills | Lists skills available on the server |
| GetAgentDefinition | Returns the complete content of an agent (markdown + frontmatter) |
| GetSkillContent | Returns the complete content of a skill |
| ExecuteRemotePlugin | Executes a plugin on the server and returns the result |
| DownloadPlugin | Streaming download of a plugin binary |
| GetAlerts | Returns active alerts from the K8s Watcher (used by the Operator) |
| AnalyzeIssue | Sends Issue context to the LLM and returns analysis + suggested actions |

gRPC with Multiple Replicas

gRPC uses persistent HTTP/2 connections, and kube-proxy balances only at connection time, so by default all of a client's requests stay pinned to a single pod. For scenarios with multiple replicas in Kubernetes:
  • 1 replica: Standard ClusterIP Service — no extra configuration needed
  • Multiple replicas: Use a headless Service (ClusterIP: None) so that DNS returns individual pod IPs, enabling client-side round-robin load balancing via gRPC dns:/// resolver
  • The ChatCLI client already has built-in keepalive (ping every 10s) and round-robin support
  • In the Helm chart, enable service.headless: true when replicaCount > 1
  • In the Operator, headless is activated automatically when spec.replicas > 1
For more details, see the K8s Operator documentation and Helm deployment.
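A minimal headless Service for the multi-replica case might look like this (the name, labels, and selector are illustrative; the Helm chart and Operator generate the real manifests):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: chatcli-server          # illustrative name
spec:
  clusterIP: None               # headless: DNS returns individual pod IPs
  selector:
    app: chatcli-server         # illustrative label
  ports:
    - name: grpc
      port: 50051               # default gRPC port
```

Clients then dial the Service through the gRPC dns:/// resolver, which resolves each pod IP and enables round-robin balancing.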

Progressive Streaming

The StreamPrompt RPC splits the response into ~200 character chunks at natural boundaries (paragraphs, lines, sentences), providing a progressive response experience on the client.
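As a toy illustration of boundary-aware chunking (not the server's actual implementation), splitting at blank-line paragraph boundaries can be sketched as:

```shell
# Split text into chunks at paragraph (blank-line) boundaries.
# The real server also splits on line and sentence boundaries
# and caps chunks at ~200 characters.
chunks=$(printf 'First paragraph.\n\nSecond paragraph.\n' \
  | awk 'BEGIN{RS=""} {print "chunk " NR ": " $0}')
echo "$chunks"
```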

Resource Discovery RPCs

The ListRemotePlugins, ListRemoteAgents, ListRemoteSkills, GetAgentDefinition, GetSkillContent, ExecuteRemotePlugin, and DownloadPlugin RPCs allow connected clients to discover and use resources installed on the server.
  • Plugins: Executed on the server via ExecuteRemotePlugin or downloaded via DownloadPlugin (binary streaming)
  • Agents/Skills: Markdown content transferred to the client via GetAgentDefinition/GetSkillContent for local prompt composition

AIOps Platform RPCs

The GetAlerts and AnalyzeIssue RPCs are used by the AIOps Operator to feed the autonomous remediation pipeline.

GetAlerts

Returns active alerts detected by the K8s Watcher:
rpc GetAlerts(GetAlertsRequest) returns (GetAlertsResponse);

message GetAlertsRequest {
  string namespace = 1;     // Filter by namespace (empty = all)
  string deployment = 2;    // Filter by deployment (empty = all)
}

message AlertInfo {
  string alert_type = 1;    // HighRestartCount, OOMKilled, PodNotReady, DeploymentFailing
  string deployment = 2;
  string namespace = 3;
  string message = 4;
  string severity = 5;      // critical, warning
  int64 timestamp = 6;
}

AnalyzeIssue

Sends Issue context to the LLM and returns structured analysis with suggested actions:
rpc AnalyzeIssue(AnalyzeIssueRequest) returns (AnalyzeIssueResponse);

message AnalyzeIssueRequest {
  string issue_name = 1;
  string namespace = 2;
  string resource_kind = 3;
  string resource_name = 4;
  string signal_type = 5;
  string severity = 6;
  string description = 7;
  int32 risk_score = 8;
  string provider = 9;
  string model = 10;
}

message SuggestedAction {
  string name = 1;
  string action = 2;
  string description = 3;
  map<string, string> params = 4;
}

message AnalyzeIssueResponse {
  string analysis = 1;
  float confidence = 2;     // 0.0-1.0
  repeated string recommendations = 3;
  string provider = 4;
  string model = 5;
  repeated SuggestedAction suggested_actions = 6;
}

K8s Watcher Integration

When the server is started with --watch-config or --watch-deployment, the K8s Watcher continuously monitors deployments and automatically injects the Kubernetes context into all prompts from remote clients.
chatcli server --watch-deployment myapp --watch-namespace production
Any connected user can ask questions about the deployments without additional configuration:
Connected to ChatCLI server (version: 1.0.0, provider: OPENAI, model: gpt-4o)
K8s watcher active: 5 targets (interval: 30s)

> Which deployments need attention?
> Analyze the HTTP metrics of api-gateway

Environment Variables

All environment variables used by local ChatCLI also work on the server:
# Server
CHATCLI_SERVER_PORT=50051
CHATCLI_SERVER_TOKEN=my-token
CHATCLI_SERVER_TLS_CERT=/path/to/cert.pem
CHATCLI_SERVER_TLS_KEY=/path/to/key.pem

# Security
CHATCLI_GRPC_REFLECTION=false
CHATCLI_DISABLE_VERSION_CHECK=false

# LLM
LLM_PROVIDER=CLAUDEAI
ANTHROPIC_API_KEY=sk-ant-xxx
ANTHROPIC_MODEL=claude-sonnet-4-5

# K8s Watcher (optional)
CHATCLI_WATCH_DEPLOYMENT=myapp
CHATCLI_WATCH_NAMESPACE=production
CHATCLI_WATCH_INTERVAL=30s
CHATCLI_WATCH_WINDOW=2h
CHATCLI_WATCH_MAX_LOG_LINES=100

Next Steps