ChatCLI supports real-time streaming of LLM responses, displaying text character-by-character as it is generated by the API. This significantly improves the user experience by eliminating the wait for complete responses.

StreamingClient Interface

Streaming is implemented as an optional interface that providers can adopt:
type StreamingClient interface {
    LLMClient

    SendPromptStream(ctx context.Context, prompt string,
        history []models.Message, maxTokens int) (<-chan StreamChunk, error)

    SupportsStreaming() bool
}
Detection is automatic via type assertion — providers that implement StreamingClient receive streaming automatically:
if sc, ok := client.AsStreamingClient(c); ok {
    chunks, err := sc.SendPromptStream(ctx, prompt, history, maxTokens)
    // process chunks in real time
}
Providers that do not implement StreamingClient continue to work normally. ChatCLI falls back to SendPrompt (complete response) automatically.

StreamChunk

Each streaming chunk carries:
| Field | Type | Description |
|---|---|---|
| `Text` | `string` | Incremental text in this chunk (may be empty) |
| `Done` | `bool` | `true` on the final chunk |
| `Usage` | `*UsageInfo` | Token usage data (only on the final chunk) |
| `StopReason` | `string` | Stop reason: `end_turn`, `max_tokens`, `tool_use` |
| `Error` | `error` | Error during streaming (terminates the stream) |

Streaming Contract

  • The channel returns zero or more text chunks
  • The final chunk has Done=true and may include Usage and StopReason
  • If an error occurs, a chunk with Error is sent and the channel closes
  • The channel closes after the final chunk or error
  • The caller can cancel via context

Supported Providers

| Provider | Streaming | Notes |
|---|---|---|
| Anthropic (API Key) | Yes | Native streaming via Messages API |
| Anthropic (OAuth) | Yes | Streaming via OAuth token |
| OpenAI | Yes | Streaming via Chat Completions |
| ZAI (Zhipu AI) | Yes | OpenAI-compatible streaming |
| MiniMax | Yes | OpenAI-compatible streaming |
| OpenRouter | Yes | Streaming via OpenAI-compatible API |
| Google (Gemini) | No | Fallback to complete response |
| xAI (Grok) | No | Fallback to complete response |
| GitHub Models | No | Fallback to complete response |
| Ollama | No | Fallback to complete response |

Stream Watchdog

The Stream Watchdog monitors the stream to detect stalls (interruptions without data) and prevent ChatCLI from hanging indefinitely:
| Timer | Duration | Action |
|---|---|---|
| Warning | 45 seconds | Logs a stall warning |
| Idle Timeout | 90 seconds | Aborts the stream and returns partial content |
Both timers are reset on each received chunk. If the provider stops sending data for 90 seconds, the watchdog interrupts the stream and returns the accumulated text.

Watchdog Configuration

| Environment Variable | Description | Default |
|---|---|---|
| `CHATCLI_STREAM_IDLE_TIMEOUT_SECONDS` | Idle timeout in seconds | 90 |
On slow networks or with providers that have high latency between chunks, increase the timeout to avoid premature interruptions. The default of 90 seconds is sufficient for most scenarios.

Fallback to Non-Streaming

When streaming is not available (provider does not support it or connection error), ChatCLI falls back automatically:
1. Try SendPromptStream() -> real-time streaming
2. If not supported -> fall back to SendPrompt()
3. Complete response displayed at once
The DrainStream function allows converting a stream into a complete response when needed:
text, usage, stopReason, err := client.DrainStream(chunks)

TUI Integration

In interactive mode (Bubble Tea), streaming integrates directly with the renderer:
  • Each chunk is emitted as an event via TUIEmitter
  • The Bubble Tea model updates the view incrementally
  • Markdown is rendered progressively via Glamour
  • The status bar shows the streaming state in real time
In one-shot mode (-p), streaming is disabled and DrainStream is used to collect the complete response before printing.

Next Steps

Context Recovery

What happens when max_tokens is reached during streaming.

Provider Fallback

Fallback chain between providers with and without streaming.

Native Tool Use

Streaming with native tool calls.

Progress UI

Visual indicators during agent streaming.