Architecture
- Adapter: a platform integration. Receives messages and sends replies using the platform’s native HTTP API (no third-party SDKs in go.mod).
- Runner: routes messages per conversation (platform:chat), with bounded concurrency and graceful shutdown. Installs a progress emitter per message.
- Agent loop: each message becomes an /agent task (auto-execute). The narration the agent already prints is captured and streamed to the chat.
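
The per-conversation routing key can be pictured with a small sketch. This is only an illustration, not the gateway's code: the `router` type, the `route` method and the use of string chat IDs are assumptions; only the `platform:chat` key format comes from the text above.

```go
package main

import "fmt"

// conversationKey builds the "platform:chat" key the runner routes on.
// The function name and string chat IDs are assumptions for illustration.
func conversationKey(platform, chatID string) string {
	return platform + ":" + chatID
}

// router (hypothetical) groups incoming messages by conversation so that
// each chat keeps its own ordered queue.
type router struct {
	queues map[string][]string
}

func newRouter() *router {
	return &router{queues: make(map[string][]string)}
}

// route appends the message to its conversation queue and returns the key.
func (r *router) route(platform, chatID, text string) string {
	key := conversationKey(platform, chatID)
	r.queues[key] = append(r.queues[key], text)
	return key
}

func main() {
	r := newRouter()
	r.route("telegram", "111111111", "first")
	r.route("telegram", "111111111", "second")
	r.route("slack", "C42", "hello")
	fmt.Println(len(r.queues), r.queues["telegram:111111111"])
}
```

Keying on both platform and chat keeps two chats with the same numeric ID on different platforms from sharing context.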
“Talking as it works” streaming and throttling
While the agent works, its substantive output lines are coalesced and sent as periodic progress messages (at most one every few seconds, to respect the platforms’ per-chat rate limits). Purely decorative lines (box-drawing, spinners) are filtered out. When it finishes, the gateway sends a completion notice.
“The assistant is working” indicator
As soon as a message arrives, the gateway signals that it was received and is being worked on, so on slower turns you are never left wondering whether it landed:
- Channels with a native indicator (Telegram): show the native “typing…” (sendChatAction), auto-refreshed before it expires while the agent works, with no message clutter.
- Other channels: send one short notice (“🤔 …”, localized via gateway.thinking) after a small delay, and only if the reply isn’t out yet; fast replies send nothing.
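
The "notice only if the reply isn't out yet" rule for channels without a native indicator can be sketched with a cancellable timer. This is a minimal sketch under stated assumptions: `chatLog` and `simulateTurn` are hypothetical names, the delays are arbitrary, and a sleep stands in for the agent run.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// chatLog collects outgoing messages; it stands in for an adapter's send call.
type chatLog struct {
	mu   sync.Mutex
	sent []string
}

func (c *chatLog) send(msg string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.sent = append(c.sent, msg)
}

func (c *chatLog) messages() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	return append([]string(nil), c.sent...)
}

// simulateTurn models one turn: the notice is scheduled after noticeDelayMs,
// the agent "works" for workMs, and the pending notice is cancelled as soon
// as the reply is ready.
func simulateTurn(noticeDelayMs, workMs int) []string {
	chat := &chatLog{}
	timer := time.AfterFunc(time.Duration(noticeDelayMs)*time.Millisecond, func() {
		chat.send("🤔 …")
	})
	time.Sleep(time.Duration(workMs) * time.Millisecond) // stand-in for the agent run
	timer.Stop()                                         // no-op if the notice already fired
	chat.send("reply")
	return chat.messages()
}

func main() {
	fmt.Println(simulateTurn(200, 5))  // fast turn: the notice never fires
	fmt.Println(simulateTurn(5, 150))  // slow turn: notice first, then the reply
}
```

A real implementation would also need to handle the narrow window where the timer fires at the same moment the reply is sent; the sketch ignores that case.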
Concurrency
Because the agent loop mutates shared ChatCLI state and os.Stdout is process-global, agent runs are serialized: messages arrive in parallel (up to 4 buffered) but run one at a time. Each conversation keeps a light context (the user’s recent requests) for continuity across turns; the real durable state lives in the workspace files the agent edits.
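
One common way to get "arrive in parallel, run one at a time" in Go is a buffered channel drained by a single worker. The sketch below assumes that shape; `runSerialized` and `bufferSize` are illustrative names, and appending to a slice stands in for an agent run.

```go
package main

import (
	"fmt"
	"sync"
)

const bufferSize = 4 // "up to 4 buffered", per the text above

// runSerialized accepts messages from parallel producers and processes them
// with a single worker. It returns the processed messages and the highest
// number of runs that were ever in flight at once.
func runSerialized(msgs []string) (processed []string, maxInFlight int) {
	queue := make(chan string, bufferSize)
	done := make(chan struct{})
	inFlight := 0

	// The single worker: only this goroutine touches the counters and slice.
	go func() {
		defer close(done)
		for m := range queue {
			inFlight++
			if inFlight > maxInFlight {
				maxInFlight = inFlight
			}
			processed = append(processed, m) // stand-in for one agent run
			inFlight--
		}
	}()

	// Producers run concurrently; each blocks while the buffer is full.
	var producers sync.WaitGroup
	for _, m := range msgs {
		producers.Add(1)
		go func(m string) {
			defer producers.Done()
			queue <- m
		}(m)
	}
	producers.Wait()
	close(queue)
	<-done
	return processed, maxInFlight
}

func main() {
	got, peak := runSerialized([]string{"a", "b", "c", "d", "e", "f"})
	fmt.Println(len(got), "processed, peak concurrency", peak)
}
```

Because producers race each other, the processing order across conversations is not fixed in this sketch; only the one-at-a-time guarantee is.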
Runtime model
The gateway mirrors the model (and provider) your interactive session is using, not the .env default. Switching model or provider in the REPL with /switch, /model, or /max-tokens propagates to the daemon: it re-reads the choice before every message, so a conversation already running on Telegram starts answering with the new model without restarting the gateway.
Because the daemon runs as a separate process, the sync goes through a small state file at ~/.chatcli/runtime_model.json that the interactive session writes and the daemon reads. This covers both cases: starting the gateway after a model switch, and switching while the gateway is already running.
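
The writer/reader split over a state file can be sketched as follows. The JSON field names (`provider`, `model`) and all function names are assumptions; the actual schema of runtime_model.json is not documented here. The demo writes into a throwaway directory rather than ~/.chatcli.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// runtimeModel is a hypothetical shape for runtime_model.json.
type runtimeModel struct {
	Provider string `json:"provider"`
	Model    string `json:"model"`
}

// writeRuntimeModel is what the interactive session would do on a switch.
// Writing to a temp file and renaming keeps readers from seeing a partial file.
func writeRuntimeModel(path string, rm runtimeModel) error {
	data, err := json.Marshal(rm)
	if err != nil {
		return err
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, data, 0o600); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}

// readRuntimeModel is what the daemon would call before every message.
func readRuntimeModel(path string) (runtimeModel, error) {
	var rm runtimeModel
	data, err := os.ReadFile(path)
	if err != nil {
		return rm, err
	}
	err = json.Unmarshal(data, &rm)
	return rm, err
}

// roundTrip writes a choice into a temporary directory and reads it back.
func roundTrip(provider, model string) (runtimeModel, error) {
	dir, err := os.MkdirTemp("", "chatcli-demo")
	if err != nil {
		return runtimeModel{}, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "runtime_model.json")
	if err := writeRuntimeModel(path, runtimeModel{Provider: provider, Model: model}); err != nil {
		return runtimeModel{}, err
	}
	return readRuntimeModel(path)
}

func main() {
	rm, err := roundTrip("provider-a", "model-b")
	fmt.Println(rm, err)
}
```

Re-reading a small file per message is cheap, and it is what lets a switch take effect without any signalling between the two processes.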
When you switch provider, the daemon adopts the new provider with its correct model, as long as that provider’s credentials are present in the environment the daemon inherited (usually your .env). Adjustments that live only in memory, such as StackSpot’s /switch --realm / --agent-id, do not propagate through this file; set them via environment variable or restart the gateway.
Conversational replies (not “coder tone”)
The gateway runs the same engine as /coder, with all the tools (read/edit files, shell, web, MCP) still available, but with its own conversational voice. The final reply is the message the person reads in chat, not a technical commit summary: direct, natural text, with no tables, banners, ASCII art, or long code blocks (unless code is asked for). Under the hood this is a dedicated gateway system prompt used in place of the coder prompt, preserving the tool-use mechanics.
Dynamic language (follows the sender)
Replies come out in the language of the user’s message, detected every turn — not pinned to the daemon locale. Portuguese → answers in Portuguese; Spanish → Spanish; and so on. The dynamic language directive is applied on every gateway path (including with an active persona), so the reply is never statically stuck in one language. In the interactive CLI, the fixed per-locale directive (CHATCLI_LANG) still applies — only the gateway changes.
Activation
The gateway runs as a detached daemon: /gateway start re-execs the binary as a background process (chatcli gateway, with its own stdout and log) and hands the REPL back immediately, so you keep using the chat as usual. The daemon is tracked by a pidfile at ~/.chatcli/gateway.pid and stays alive until you stop it (there’s no Ctrl+C; use /gateway stop).
The daemon’s log is written to ~/.chatcli/gateway.log (the path is printed on start). Each adapter only starts when its required variables are present, so set only the channels you want to use. Full reference in Environment Variables → Chat Gateway.
- Telegram
- Slack
- Discord
- WhatsApp
- Webhook
Telegram: long-polling via getUpdates (no exposed HTTP server).
Usage example
111111111 sends:
“list the Go files changed in the last commit and summarize the diff”
The bot replies with progress messages as the agent runs git, reads files and reasons, then closes with ✅ Task completed.
Anyone not on the allow-list is ignored. Slack and the generic webhook verify the signature/secret before processing.
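
The two checks mentioned above can be sketched with the standard library. The Slack part assumes Slack's documented v0 scheme (HMAC-SHA256 over `v0:<timestamp>:<body>`, hex-encoded and prefixed with `v0=`); whether the gateway implements it exactly this way is an assumption, as are all function names. How the generic webhook transports its secret is not stated here, so the sketch only shows the constant-time comparison.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// slackSignature computes the value Slack sends in X-Slack-Signature.
func slackSignature(signingSecret, timestamp, body string) string {
	mac := hmac.New(sha256.New, []byte(signingSecret))
	mac.Write([]byte("v0:" + timestamp + ":" + body))
	return "v0=" + hex.EncodeToString(mac.Sum(nil))
}

// verifySlack recomputes the signature and compares it in constant time.
func verifySlack(signingSecret, timestamp, body, header string) bool {
	want := slackSignature(signingSecret, timestamp, body)
	return hmac.Equal([]byte(want), []byte(header))
}

// secretMatches compares a shared secret without leaking, through timing,
// how many leading bytes matched.
func secretMatches(want, got string) bool {
	return subtle.ConstantTimeCompare([]byte(want), []byte(got)) == 1
}

func main() {
	body := `{"type":"event_callback"}`
	sig := slackSignature("signing-secret", "1700000000", body)
	fmt.Println(verifySlack("signing-secret", "1700000000", body, sig))
	fmt.Println(secretMatches("s3cret", "guess"))
}
```

Slack's documentation also recommends rejecting requests whose timestamp is more than a few minutes old, which guards against replayed requests; the sketch leaves that out.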
Supported platforms
| Platform | Transport | Required | Auth |
|---|---|---|---|
| Telegram | long-polling | CHATCLI_TELEGRAM_BOT_TOKEN | user-ID allow-list |
| Slack | Events API (HTTP) | ..._BOT_TOKEN, ..._ADDR | signing secret (HMAC) |
| Discord | Gateway WebSocket v10 | CHATCLI_DISCORD_BOT_TOKEN | bot token |
| WhatsApp | Cloud API (webhook) | ..._ACCESS_TOKEN, ..._PHONE_ID, ..._ADDR | verify token |
| Webhook | generic HTTP | CHATCLI_WEBHOOK_ADDR | secret (constant-time) |
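
The "each adapter only starts when its required variables are present" rule can be sketched as a lookup over the table. Only the three variable names spelled out in full above are used; Slack and WhatsApp are left out because their names are abbreviated in the table. `adapterSpec` and `enabledAdapters` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
)

// adapterSpec pairs an adapter with the variables it needs before starting.
type adapterSpec struct {
	name     string
	required []string
}

// specs covers only the rows whose variable names appear in full in the table.
var specs = []adapterSpec{
	{"telegram", []string{"CHATCLI_TELEGRAM_BOT_TOKEN"}},
	{"discord", []string{"CHATCLI_DISCORD_BOT_TOKEN"}},
	{"webhook", []string{"CHATCLI_WEBHOOK_ADDR"}},
}

// enabledAdapters returns the adapters whose required variables are all set.
// getenv is injected so the function can be exercised without touching the
// real environment.
func enabledAdapters(getenv func(string) string) []string {
	var out []string
	for _, s := range specs {
		ready := true
		for _, key := range s.required {
			if getenv(key) == "" {
				ready = false
				break
			}
		}
		if ready {
			out = append(out, s.name)
		}
	}
	return out
}

func main() {
	fmt.Println("adapters to start:", enabledAdapters(os.Getenv))
}
```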
Voice messages (transcription)
The gateway accepts audio / voice notes on every channel. The message is transcribed to text before the pipeline, so it works with any of the 14 chat providers (they only ever see text; no multimodal model or message redesign needed). The adapter downloads the media, transcribes it, and the agent treats it as a normal text request; the transcript is even recorded in the Conversation Hub.

| Channel | Voice source |
|---|---|
| Telegram | voice note / audio (getFile → download) |
| WhatsApp | audio message (Graph media API lookup) |
| Discord | audio/* attachment (CDN) |
| Slack | audio/* file (url_private, bearer token) |
| Webhook | audio_b64 (inline base64) or audio_url |
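
For the generic webhook row, a payload handler might look like the sketch below. Only the `audio_b64` and `audio_url` field names come from the table; the rest of the payload, the use of standard (padded) base64, and the `extractAudio` name are assumptions. The size limit is passed in as a parameter in the spirit of CHATCLI_GATEWAY_MAX_AUDIO_BYTES.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
)

// webhookAudio holds only the two audio fields named in the table above;
// other payload fields are omitted because they are not documented here.
type webhookAudio struct {
	AudioB64 string `json:"audio_b64"`
	AudioURL string `json:"audio_url"`
}

// extractAudio returns the inline audio bytes, or the URL to download from.
// Inline audio larger than maxBytes is rejected.
func extractAudio(payload []byte, maxBytes int) ([]byte, string, error) {
	var in webhookAudio
	if err := json.Unmarshal(payload, &in); err != nil {
		return nil, "", err
	}
	if in.AudioB64 != "" {
		data, err := base64.StdEncoding.DecodeString(in.AudioB64)
		if err != nil {
			return nil, "", err
		}
		if len(data) > maxBytes {
			return nil, "", errors.New("audio exceeds size cap")
		}
		return data, "", nil
	}
	if in.AudioURL != "" {
		return nil, in.AudioURL, nil
	}
	return nil, "", errors.New("no audio in payload")
}

func main() {
	data, url, err := extractAudio([]byte(`{"audio_b64":"aGVsbG8="}`), 20<<20)
	fmt.Println(len(data), url, err)
}
```

A URL result would still need the same size cap applied while downloading, for example by limiting the number of bytes read from the response body.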
Transcription backend (zero-config, local-first, keyless)
Selection is local/keyless first, and since v1.135 it has an embedded floor: with nothing configured, the gateway uses the embedded Whisper (multilingual, via sherpa-onnx, the same engine as the Kokoro TTS), with no API key and no cgo. The daemon pre-downloads engine + model at startup, so the first voice note arrives with everything ready.
- CHATCLI_TRANSCRIPTION_CMD: your own local STT command (any wrapper). Reads the transcript from stdout, or from the .txt it writes into {output_dir}.
- CHATCLI_TRANSCRIPTION_URL: a self-hosted OpenAI-compatible endpoint (whisper.cpp whisper-server, faster-whisper, Speaches). Keyless unless CHATCLI_TRANSCRIPTION_KEY is set.
- Embedded Whisper already provisioned: when the cache (~/.cache/chatcli/stt/) holds engine + model, it wins over any cloud key.
- A whisper CLI on PATH: if whisper (openai-whisper) or whisper-cli (whisper.cpp) is installed, it’s used automatically, zero config. The ggml model is downloaded once to the cache (~/.cache/chatcli/whisper/), like faster-whisper does.
- GROQ_API_KEY → Groq Whisper (free tier).
- OPENAI_API_KEY → OpenAI Whisper.
- Nothing configured → embedded Whisper: one-time download (engine ~25MB + base model ~200MB) when the daemon starts. Only platforms without a prebuilt engine (outside Linux/macOS/Windows x64/arm64) fall back to the configuration hint.
CHATCLI_TRANSCRIPTION_PROVIDER pins a backend (embedded|command|url|groq|openai) — =embedded forces the embedded engine even with whisper/keys present. CHATCLI_TRANSCRIPTION_MODEL picks the embedded model size (tiny|base|small|medium|large-v3, default base) or the cloud model; _LANG pins the language (default: auto-detect the spoken language); CHATCLI_TRANSCRIPTION_CACHE_DIR relocates the cache (absolute path — useful for air-gapped pre-seeding); CHATCLI_GATEWAY_MAX_AUDIO_BYTES caps the download size (default 20MB). The active backend is shown in /config integrations.
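
Read as a priority order, the selection rules can be sketched as a pure function. This assumes the list order above is the order of precedence and that a pinned provider always wins; the `sttEnv` struct, the `selectBackend` name and the `whisper-cli` label (the text gives no pin name for a whisper binary found on PATH) are hypothetical.

```go
package main

import "fmt"

// sttEnv is a hypothetical snapshot of what the daemon can see at startup.
type sttEnv struct {
	Provider       string // CHATCLI_TRANSCRIPTION_PROVIDER, empty when unset
	Cmd            string // CHATCLI_TRANSCRIPTION_CMD
	URL            string // CHATCLI_TRANSCRIPTION_URL
	EmbeddedCached bool   // engine + model already present in the cache
	WhisperOnPath  bool   // whisper or whisper-cli found on PATH
	GroqKey        string // GROQ_API_KEY
	OpenAIKey      string // OPENAI_API_KEY
}

// selectBackend walks the rules top to bottom and returns the first match.
func selectBackend(e sttEnv) string {
	if e.Provider != "" {
		return e.Provider // an explicit pin overrides detection
	}
	switch {
	case e.Cmd != "":
		return "command"
	case e.URL != "":
		return "url"
	case e.EmbeddedCached:
		return "embedded"
	case e.WhisperOnPath:
		return "whisper-cli" // illustrative label, not a documented value
	case e.GroqKey != "":
		return "groq"
	case e.OpenAIKey != "":
		return "openai"
	default:
		return "embedded" // the embedded floor: nothing configured
	}
}

func main() {
	fmt.Println(selectBackend(sttEnv{}))
	fmt.Println(selectBackend(sttEnv{GroqKey: "k", EmbeddedCached: true}))
	fmt.Println(selectBackend(sttEnv{Provider: "embedded", OpenAIKey: "k"}))
}
```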
Quick setup
Zero-config (embedded Whisper, recommended): restart the gateway (/gateway stop && /gateway start) and send a voice message.
Voice replies
The way back speaks too: by default (CHATCLI_GATEWAY_VOICE_REPLY=auto) the gateway answers voice with voice — an audio message gets a voice note, text gets text — with any TTS backend, including the embedded Kokoro engine (offline, no API key). Each conversation toggles it by asking in natural language (“answer me in audio” / “stop sending audio”) via the @voice tool, with the preference persisted per session. Spoken replies are written for the ear (no emojis, lists or markdown in the audio) and wav/aiff output is transcoded to OGG/Opus when ffmpeg is present, becoming a native Telegram voice note. Modes, details and troubleshooting in Voice Replies.
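
The auto behaviour described above reduces to a small decision. The sketch assumes that an explicit per-conversation preference overrides the "mirror the inbound format" default; the `voicePref` type and `replyAsVoice` name are hypothetical, and other modes are not modelled because they are described elsewhere.

```go
package main

import "fmt"

// voicePref is the per-conversation preference set through natural language.
type voicePref int

const (
	prefUnset voicePref = iota // the conversation never asked for anything
	prefVoice                  // e.g. "answer me in audio"
	prefText                   // e.g. "stop sending audio"
)

// replyAsVoice decides the reply format in auto mode: an explicit preference
// wins, otherwise the reply mirrors the inbound message.
func replyAsVoice(inboundIsVoice bool, pref voicePref) bool {
	switch pref {
	case prefVoice:
		return true
	case prefText:
		return false
	default:
		return inboundIsVoice
	}
}

func main() {
	fmt.Println(replyAsVoice(true, prefUnset))  // voice in, voice out
	fmt.Println(replyAsVoice(false, prefUnset)) // text in, text out
	fmt.Println(replyAsVoice(true, prefText))   // preference overrides
}
```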
The gateway also treats the user’s memory index as real knowledge: personal questions (“what do you know about me?”) consult persistent memory through @memory recall before any “I don’t know”.
Cross-channel continuity
When the Conversation Hub is active (the default), the gateway shares the conversation with the chatcli on your notebook: a topic started on Telegram continues in the terminal and vice-versa. Each incoming message resolves the sender’s principal, reads recent context, and records the turn in the hub, so what you said on the notebook shows up as context on Telegram, with zero configuration (single-user mode). For real-time push to a connected CLI, run the gateway inside the server with CHATCLI_GATEWAY_IN_SERVER=true. Multi-user bots use CHATCLI_HUB_ISOLATE=true + bindings. Details in Conversation Hub.
See also
- Conversation Hub — conversation continuity across channels and the notebook
- Agent Mode — the loop that runs each message
- Security — hardening auto-execution
- Environment Variables