ChatCLI supports real-time streaming of LLM responses, displaying text character-by-character as it is generated by the API. This significantly improves the user experience by eliminating the wait for complete responses.

StreamingClient Interface

Streaming is implemented as an optional interface that providers can adopt:
type StreamingClient interface {
    LLMClient

    SendPromptStream(ctx context.Context, prompt string,
        history []models.Message, maxTokens int) (<-chan StreamChunk, error)

    SupportsStreaming() bool
}
Detection is automatic via type assertion — providers that implement StreamingClient receive streaming automatically:
if sc, ok := client.AsStreamingClient(c); ok {
    chunks, err := sc.SendPromptStream(ctx, prompt, history, maxTokens)
    // process chunks in real time
}
Providers that do not implement StreamingClient continue to work normally. ChatCLI falls back to SendPrompt (complete response) automatically.

StreamChunk

Each streaming chunk carries:
| Field | Type | Description |
|---|---|---|
| `Text` | `string` | Incremental text in this chunk (may be empty) |
| `Done` | `bool` | `true` on the final chunk |
| `Usage` | `*UsageInfo` | Token usage data (only on the final chunk) |
| `StopReason` | `string` | Stop reason: `end_turn`, `max_tokens`, `tool_use` |
| `Error` | `error` | Error during streaming (terminates the stream) |

Streaming Contract

  • The channel returns zero or more text chunks
  • The final chunk has Done=true and may include Usage and StopReason
  • If an error occurs, a chunk with Error is sent and the channel closes
  • The channel closes after the final chunk or error
  • The caller can cancel via context

Supported Providers

| Provider | Streaming | Notes |
|---|---|---|
| Anthropic (API Key) | Yes | Native streaming via Messages API |
| Anthropic (OAuth) | Yes | Streaming via OAuth token |
| OpenAI | Yes | Streaming via Chat Completions |
| ZAI (Zhipu AI) | Yes | OpenAI-compatible streaming |
| MiniMax | Yes | OpenAI-compatible streaming |
| OpenRouter | Yes | Streaming via OpenAI-compatible API |
| Google (Gemini) | No | Fallback to complete response |
| xAI (Grok) | No | Fallback to complete response |
| GitHub Models | No | Fallback to complete response |
| Ollama | No | Fallback to complete response |

Stream Watchdog

The Stream Watchdog monitors the stream to detect stalls (interruptions without data) and prevent ChatCLI from hanging indefinitely:
| Timer | Duration | Action |
|---|---|---|
| Warning | 45 seconds | Logs a stall warning |
| Idle Timeout | 90 seconds | Aborts the stream and returns partial content |
Both timers are reset on each received chunk. If the provider stops sending data for 90 seconds, the watchdog interrupts the stream and returns the accumulated text.

Watchdog Configuration

| Environment Variable | Description | Default |
|---|---|---|
| `CHATCLI_STREAM_IDLE_TIMEOUT_SECONDS` | Idle timeout in seconds | 90 |
On slow networks or with providers that have high latency between chunks, increase the timeout to avoid premature interruptions. The default of 90 seconds is sufficient for most scenarios.

Fallback to Non-Streaming

When streaming is not available (provider does not support it or connection error), ChatCLI falls back automatically:
1. Try SendPromptStream() -> real-time streaming
2. If not supported -> fall back to SendPrompt()
3. Complete response displayed at once
The DrainStream function allows converting a stream into a complete response when needed:
text, usage, stopReason, err := client.DrainStream(chunks)

TUI Integration

In interactive mode (Bubble Tea), streaming integrates directly with the renderer:
  • Each chunk is emitted as an event via TUIEmitter
  • The Bubble Tea model updates the view incrementally
  • Markdown is rendered progressively via Glamour
  • The status bar shows the streaming state in real time
In one-shot mode (-p), streaming is disabled and DrainStream is used to collect the complete response before printing.

Next Steps

Context Recovery

What happens when max_tokens is reached during streaming.

Provider Fallback

Fallback chain between providers with and without streaming.

Native Tool Use

Streaming with native tool calls.

Progress UI

Visual indicators during agent streaming.