Chat Gateway — ChatCLI on Telegram, Slack, Discord, WhatsApp and Webhook

The Chat Gateway exposes ChatCLI as a bot/service on messaging platforms. You talk to it over Telegram, Slack, Discord, WhatsApp or a generic webhook, and every inbound message runs through the real agent loop — the agent uses its tools (read/search, shell, file edits, web), performs the task and streams progress back to the conversation as it works, closing with a completion notice.

The agent runs with auto-execution (tools and shell). Treat the gateway as a privileged remote surface: control who can reach the bot (Telegram allow-list, Slack signing secret, webhook secret) and harden the agent with CHATCLI_AGENT_SECURITY_MODE=strict when exposing it. See Coder/Agent Security.

Architecture

Adapter (per platform)  --inbound-->  Runner  --agent loop (streaming)-->  reply
        ^                                |
        └──────── Send(progress/reply) ──┘

Adapter — a platform integration. Receives messages and sends replies using the platform’s native HTTP API (no third-party SDKs in go.mod).
Runner — routes messages per conversation (platform:chat), with bounded concurrency and graceful shutdown. Installs a progress emitter per message.
Agent loop — each message becomes an /agent task (auto-execute). The narration the agent already prints is captured and streamed to the chat.

“Talking as it works” streaming and throttling

While the agent works, its substantive output lines are coalesced and sent as periodic progress messages (at most one every few seconds, to respect the platforms’ per-chat rate limits). Purely decorative lines (box-drawing, spinners) are filtered out. When it finishes, the gateway sends a completion notice.

“The assistant is working” indicator

As soon as a message arrives, the gateway signals that it was received and is being worked on — so you’re never left wondering whether it landed on slower turns:

Channels with a native indicator (Telegram): show the native “typing…” (sendChatAction), auto-refreshed before it expires while the agent works — no message clutter.
Other channels: send one short notice (“🤔 …”, localized via gateway.thinking) after a small delay, and only if the reply isn’t out yet; fast replies send nothing.

The signal stops the moment the agent replies.

Concurrency

Because the agent loop mutates shared ChatCLI state and os.Stdout is process-global, agent runs are serialized: messages arrive in parallel (up to 4 buffered) but run one at a time. Each conversation keeps a light context (the user’s recent requests) for continuity across turns; the real durable state lives in the workspace files the agent edits.

Runtime model

The gateway mirrors the model (and provider) your interactive session is using — not the .env default. Switching model or provider in the REPL with /switch, /model, or /max-tokens propagates to the daemon: it re-reads the choice before every message, so a conversation already running on Telegram starts answering with the new model without restarting the gateway. Because the daemon runs as a separate process, the sync goes through a small state file at ~/.chatcli/runtime_model.json that the interactive session writes and the daemon reads. This covers both cases: starting the gateway after a model switch, and switching while the gateway is already running.

When you switch provider, the daemon adopts the new provider with its correct model — as long as that provider’s credentials are present in the environment the daemon inherited (usually your .env). Adjustments that live only in memory, such as StackSpot’s /switch --realm / --agent-id, do not propagate through this file; set them via environment variable or restart the gateway.

Conversational replies (not “coder tone”)

The gateway runs the same engine as /coder — all the tools (read/edit files, shell, web, MCP) are still available — but with its own conversational voice. The final reply is the message the person reads in chat, not a technical commit summary: direct, natural text, with no tables, banners, ASCII art, or long code blocks (unless code is asked for). Under the hood this is a dedicated gateway system prompt used in place of the coder prompt, preserving the tool-use mechanics.

Dynamic language (follows the sender)

Replies come out in the language of the user’s message, detected every turn — not pinned to the daemon locale. Portuguese → answers in Portuguese; Spanish → Spanish; and so on. The dynamic language directive is applied on every gateway path (including with an active persona), so the reply is never statically stuck in one language. In the interactive CLI, the fixed per-locale directive (CHATCLI_LANG) still applies — only the gateway changes.

Activation

The gateway runs as a detached daemon: /gateway start re-execs the binary as a background process (chatcli gateway, with its own stdout and log) and hands the REPL back immediately — you keep using the chat as usual. The daemon is tracked by a pidfile at ~/.chatcli/gateway.pid and stays alive until you stop it (there’s no Ctrl+C; use /gateway stop).

/gateway start     # start the configured adapters in the background (REPL stays free)
/gateway status    # show the daemon pid + registered/ready platforms
/gateway stop      # terminate the daemon

The daemon’s activity goes to ~/.chatcli/gateway.log (the path is printed on start). Each adapter only starts when its required variables are present — set only the channels you want to use. Full reference in Environment Variables → Chat Gateway.

Telegram

Long-polling via getUpdates (no exposed HTTP server).

export CHATCLI_TELEGRAM_BOT_TOKEN="123456:ABC-DEF..."         # required (BotFather)
export CHATCLI_TELEGRAM_ALLOWED_USERS="111111111,222222222"  # optional; empty = all

Slack

Events API with HMAC signature verification.

export CHATCLI_SLACK_BOT_TOKEN="xoxb-..."        # required
export CHATCLI_SLACK_ADDR=":8081"                # required (events server bind)
export CHATCLI_SLACK_SIGNING_SECRET="..."        # recommended (verifies events)
export CHATCLI_SLACK_PATH="/slack/events"        # optional (default)

Discord

Gateway WebSocket v10.

export CHATCLI_DISCORD_BOT_TOKEN="..."           # required

WhatsApp

WhatsApp Cloud API (webhook + Meta verify handshake).

export CHATCLI_WHATSAPP_ACCESS_TOKEN="..."        # required
export CHATCLI_WHATSAPP_PHONE_ID="123456"         # required
export CHATCLI_WHATSAPP_ADDR=":8082"              # required (webhook bind)
export CHATCLI_WHATSAPP_VERIFY_TOKEN="my-verify"  # GET handshake
export CHATCLI_WHATSAPP_PATH="/whatsapp/webhook"  # optional (default)

Webhook

Generic HTTP endpoint: integrate any system that can POST.

export CHATCLI_WEBHOOK_ADDR=":8083"               # required (bind)
export CHATCLI_WEBHOOK_PATH="/inbound"            # optional (default)
export CHATCLI_WEBHOOK_SECRET="supersecret"       # optional; constant-time validated
export CHATCLI_WEBHOOK_CALLBACK_URL="https://app/cb" # optional; empty = synchronous reply

Usage example

# Server terminal
export LLM_PROVIDER=CLAUDEAI
export ANTHROPIC_API_KEY=sk-ant-...
export CHATCLI_TELEGRAM_BOT_TOKEN=123456:ABC...
export CHATCLI_TELEGRAM_ALLOWED_USERS=111111111
chatcli
> /gateway start
  OK Gateway started (pid=4821) on: telegram. Logs: ~/.chatcli/gateway.log

On Telegram, user 111111111 sends:

“list the Go files changed in the last commit and summarize the diff”

The bot replies with progress messages as the agent runs git, reads files and reasons, then closes with ✅ Task completed.

Anyone not on the allow-list is ignored. Slack and the generic webhook verify the signature/secret before processing.

Supported platforms

Platform	Transport	Required	Auth
Telegram	long-polling	`CHATCLI_TELEGRAM_BOT_TOKEN`	user-ID allow-list
Slack	Events API (HTTP)	`..._BOT_TOKEN`, `..._ADDR`	signing secret (HMAC)
Discord	Gateway WebSocket v10	`CHATCLI_DISCORD_BOT_TOKEN`	bot token
WhatsApp	Cloud API (webhook)	`..._ACCESS_TOKEN`, `..._PHONE_ID`, `..._ADDR`	verify token
Webhook	generic HTTP	`CHATCLI_WEBHOOK_ADDR`	secret (constant-time)

Voice messages (transcription)

The gateway accepts audio / voice notes on every channel. The message is transcribed to text before the pipeline — so it works with any of the 14 chat providers (they only ever see text; no multimodal model or message redesign needed). The adapter downloads the media, transcribes it, and the agent treats it as a normal text request — the transcript is even recorded in the Conversation Hub.

Channel	Voice source
Telegram	voice note / audio (`getFile` → download)
WhatsApp	audio message (Graph media API lookup)
Discord	`audio/*` attachment (CDN)
Slack	`audio/*` file (`url_private`, bearer token)
Webhook	`audio_b64` (inline base64) or `audio_url`

Transcription backend (zero-config, local-first, keyless)

Selection is local/keyless first — and since v1.135 it has an embedded floor: with nothing configured, the gateway uses the embedded Whisper (multilingual, via sherpa-onnx — the same engine as the Kokoro TTS), no API key and no cgo. The daemon pre-downloads engine + model at startup, so the first voice note arrives with everything ready.

1. CHATCLI_TRANSCRIPTION_CMD: your own local STT command (any wrapper). The gateway reads the transcript from stdout, or from the .txt file the command writes into {output_dir}.
2. CHATCLI_TRANSCRIPTION_URL: a self-hosted OpenAI-compatible endpoint (whisper.cpp whisper-server, faster-whisper, Speaches). Keyless unless CHATCLI_TRANSCRIPTION_KEY is set.
3. Embedded Whisper, already provisioned: when the cache (~/.cache/chatcli/stt/) holds engine + model, it wins over any cloud key.
4. A whisper CLI on PATH: if whisper (openai-whisper) or whisper-cli (whisper.cpp) is installed, it’s used automatically, zero config. The ggml model is downloaded once to the cache (~/.cache/chatcli/whisper/), like faster-whisper does.
5. GROQ_API_KEY → Groq Whisper (free tier).
6. OPENAI_API_KEY → OpenAI Whisper.
7. Nothing configured → embedded Whisper: one-time download (engine ~25MB + base model ~200MB) when the daemon starts. Only platforms without a prebuilt engine (outside Linux/macOS/Windows x64/arm64) fall back to the configuration hint.

CHATCLI_TRANSCRIPTION_PROVIDER pins a backend (embedded|command|url|groq|openai) — =embedded forces the embedded engine even with whisper/keys present. CHATCLI_TRANSCRIPTION_MODEL picks the embedded model size (tiny|base|small|medium|large-v3, default base) or the cloud model; _LANG pins the language (default: auto-detect the spoken language); CHATCLI_TRANSCRIPTION_CACHE_DIR relocates the cache (absolute path — useful for air-gapped pre-seeding); CHATCLI_GATEWAY_MAX_AUDIO_BYTES caps the download size (default 20MB). The active backend is shown in /config integrations.
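Read as a priority chain, the selection rules above reduce to a short function. This is a sketch under assumptions: the `Probe` struct and its field names are invented for the example, the label "whisper-cli" for the PATH-detected CLI is not one of the documented pin values, and the fallback for platforms without a prebuilt engine is left out.

```go
package main

import "fmt"

// Probe describes what the selector can observe (illustrative fields).
type Probe struct {
	Pinned         string // CHATCLI_TRANSCRIPTION_PROVIDER, "" when unset
	Cmd            string // CHATCLI_TRANSCRIPTION_CMD
	URL            string // CHATCLI_TRANSCRIPTION_URL
	EmbeddedCached bool   // engine + model already in the cache
	WhisperOnPath  bool   // whisper or whisper-cli found on PATH
	GroqKey        string // GROQ_API_KEY
	OpenAIKey      string // OPENAI_API_KEY
}

// selectBackend applies the pin first, then the local-first order.
func selectBackend(p Probe) string {
	if p.Pinned != "" {
		return p.Pinned
	}
	switch {
	case p.Cmd != "":
		return "command"
	case p.URL != "":
		return "url"
	case p.EmbeddedCached:
		return "embedded"
	case p.WhisperOnPath:
		return "whisper-cli" // illustrative label, not a documented pin value
	case p.GroqKey != "":
		return "groq"
	case p.OpenAIKey != "":
		return "openai"
	default:
		return "embedded" // nothing configured: embedded floor
	}
}

func main() {
	fmt.Println(selectBackend(Probe{}))
	fmt.Println(selectBackend(Probe{EmbeddedCached: true, OpenAIKey: "sk-example"}))
	fmt.Println(selectBackend(Probe{Pinned: "groq", Cmd: "my-stt"}))
}
```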

Opus needs a decoder. Telegram/WhatsApp/Discord voice notes are OGG/Opus, and neither whisper.cpp nor the embedded engine decodes Opus natively. With ffmpeg installed, the gateway transcodes to 16 kHz WAV automatically and everything works; without ffmpeg, install it or use a cloud/self-hosted backend (which decodes server-side). The language is auto-detected, so the transcript comes out in the spoken language — and so does the reply.

Quick setup

Zero-config (embedded Whisper — recommended):

brew install ffmpeg        # just the Opus decoder; the rest is automatic
/gateway start             # the daemon downloads engine + model on first boot

100% local with whisper.cpp (if you prefer the ggml engine):

brew install whisper-cpp ffmpeg          # macOS  (Linux: apt/dnf; Windows: scoop/winget)
# nothing else: chatcli detects whisper-cli, downloads the model on first use, and uses ffmpeg for Opus

Self-hosted (a whisper server that decodes Opus):

export CHATCLI_TRANSCRIPTION_URL="http://localhost:8080/v1"   # keyless

Cloud (decodes Opus server-side, nothing installed locally):

export CHATCLI_TRANSCRIPTION_PROVIDER=openai   # uses OPENAI_API_KEY (or =groq with GROQ_API_KEY)

After configuring, restart the daemon (/gateway stop, then /gateway start) and send a voice message.

Voice replies

The way back speaks too: by default (CHATCLI_GATEWAY_VOICE_REPLY=auto) the gateway answers voice with voice (an audio message gets a voice note, text gets text) with any TTS backend, including the embedded Kokoro engine (offline, no API key). Each conversation toggles it by asking in natural language (“answer me in audio” / “stop sending audio”) via the @voice tool, with the preference persisted per session. Spoken replies are written for the ear (no emojis, lists or markdown in the audio), and wav/aiff output is transcoded to OGG/Opus when ffmpeg is present, becoming a native Telegram voice note. Modes, details and troubleshooting are in Voice Replies.

The gateway also treats the user’s memory index as real knowledge: personal questions (“what do you know about me?”) consult persistent memory through @memory recall before any “I don’t know”.

Cross-channel continuity

When the Conversation Hub is active (the default), the gateway shares the conversation with the chatcli on your laptop: a topic started on Telegram continues in the terminal and vice versa. Each incoming message resolves the sender’s principal, reads recent context, and records the turn in the hub, so what you said on the laptop shows up as context on Telegram, with zero configuration (single-user mode). For real-time push to a connected CLI, run the gateway inside the server with CHATCLI_GATEWAY_IN_SERVER=true. Multi-user bots use CHATCLI_HUB_ISOLATE=true + bindings. Details in Conversation Hub.

