Context Overflow Recovery
When the API returns a "context too long" error, ChatCLI applies up to 3 recovery levels before giving up:
- Level 1: Aggressive Budget
- Level 2: Emergency Truncation
- Level 3: Nuclear Truncation
First attempt: halves the budget limits and cleans up misalignments.
Actions:
- Repairs tool result pairing (removes orphans, injects synthetics)
- Reduces DefaultTurnBudgetChars and DefaultPerResultMaxChars to 50% of their original values
- Applies budget enforcement with reduced limits
- Truncates long assistant messages to 5,000 chars
The original limits are restored after application. Only the current history is affected by the reduction.
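The pairing repair in the first action can be sketched as follows. This is a minimal illustration, not ChatCLI's implementation: the `Message` type, the `repairToolPairing` name, and the placeholder text of the synthetic result are all assumptions made for the example.

```go
package main

// Message is a hypothetical, simplified chat message for this sketch.
type Message struct {
	Role       string // "user", "assistant", or "tool"
	Content    string
	ToolCallID string // assistant: ID of the call it issues; tool: ID it answers
}

// repairToolPairing drops tool results whose call is missing (orphans) and
// injects a synthetic result right after any tool call that has no result.
func repairToolPairing(history []Message) []Message {
	calls := map[string]bool{}
	results := map[string]bool{}
	for _, m := range history {
		if m.ToolCallID == "" {
			continue
		}
		switch m.Role {
		case "assistant":
			calls[m.ToolCallID] = true
		case "tool":
			results[m.ToolCallID] = true
		}
	}

	repaired := make([]Message, 0, len(history))
	for _, m := range history {
		if m.Role == "tool" && !calls[m.ToolCallID] {
			continue // orphan result: no matching call in the history
		}
		repaired = append(repaired, m)
		if m.Role == "assistant" && m.ToolCallID != "" && !results[m.ToolCallID] {
			// Placeholder text is an assumption made for this sketch.
			repaired = append(repaired, Message{
				Role:       "tool",
				Content:    "[result unavailable]",
				ToolCallID: m.ToolCallID,
			})
		}
	}
	return repaired
}
```

Keeping every call paired with exactly one result matters because truncation can otherwise leave the history in a shape that providers reject outright.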
Error Detection: model overflow
The system recognizes multiple forms of overflow errors:
| Error Message | Provider |
|---|---|
| context length exceeded | Anthropic |
| prompt is too long | OpenAI |
| request too large | Various |
| max_tokens exceed | Various |
| input too long | |
| token limit | Generic |
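Detection of this kind usually comes down to case-insensitive substring matching. The sketch below assumes exactly that; the `isContextOverflow` name and the matching strategy are illustrative, and only the message fragments come from the table.

```go
package main

import "strings"

// overflowMarkers lists the message fragments from the table above.
var overflowMarkers = []string{
	"context length exceeded",
	"prompt is too long",
	"request too large",
	"max_tokens exceed",
	"input too long",
	"token limit",
}

// isContextOverflow reports whether errMsg contains any known marker,
// ignoring case. The function name is hypothetical.
func isContextOverflow(errMsg string) bool {
	msg := strings.ToLower(errMsg)
	for _, marker := range overflowMarkers {
		if strings.Contains(msg, marker) {
			return true
		}
	}
	return false
}
```

Matching on fragments rather than full messages keeps the check tolerant of providers that append token counts or request IDs to the error text.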
Corporate proxy / gateway recovery
Enterprise environments often sit behind a proxy or gateway that enforces a POST body size cap, typically 1-5 MB, completely independent of the model's context window. You can be well within Anthropic's 200K-token window (~800 KB) and still take a mysterious rejection from the proxy. Worse, many proxies don't return a clean 413: some send a WAF 403 (Cloudflare, Akamai, mod_security), a 431 (header too large), or simply drop the TCP connection mid-POST, surfacing as EOF / connection reset on the client.
ChatCLI detects all three patterns and funnels them through the same recovery flow as context overflow.
Error Detection: proxy/gateway
| Pattern detected | Example | Function |
|---|---|---|
| HTTP 413 + variants | 413 Payload Too Large, request entity too large, body too large, maximum request size, 431 Request Header Fields Too Large | IsPayloadTooLargeError |
| 403 with WAF/firewall signals | 403 with firewall, waf, security policy, blocked by, cloudflare, cf-ray, mod_security, akamai, proxy denied, policy violation | IsProxyWAFRejection |
| 403 with HTML body (Bedrock/AWS SDK) | StatusCode: 403 ... deserialization failed ... invalid character '<' looking for beginning of value (SDK trying to decode an HTML proxy block page as JSON) | IsProxyWAFRejection |
| EOF / reset with large history | unexpected eof, connection reset, broken pipe, stream error and history > 500 KB | IsLikelyPayloadProblem (heuristic) |
WAF detection is conservative: a 403 without firewall signals continues to be treated as an auth error (OAuth refresh + retry). Only when a 403 carries specific proxy/WAF signals is it reclassified as a recoverable payload failure. This prevents invalidating valid OAuth credentials when the real problem is on the network layer.
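The classification rules for 403 and EOF errors can be sketched as a single decision function. This is an illustration only and does not reproduce IsProxyWAFRejection or IsLikelyPayloadProblem: the `classify` function, the `failureKind` type, and reading 500 KB as 500 * 1024 bytes are assumptions; the signal strings are taken from the table.

```go
package main

import "strings"

type failureKind int

const (
	kindOther     failureKind = iota // not covered by this sketch
	kindAuth                         // plain 403: OAuth refresh + retry
	kindPayload                      // recoverable: run the recovery ladder
	kindTransient                    // small request hit EOF: normal retry
)

var wafSignals = []string{
	"firewall", "waf", "security policy", "blocked by", "cloudflare",
	"cf-ray", "mod_security", "akamai", "proxy denied", "policy violation",
}

var dropSignals = []string{
	"unexpected eof", "connection reset", "broken pipe", "stream error",
}

// largeHistoryBytes assumes "500 KB" means 500 * 1024 bytes.
const largeHistoryBytes = 500 * 1024

func containsAny(s string, subs []string) bool {
	for _, sub := range subs {
		if strings.Contains(s, sub) {
			return true
		}
	}
	return false
}

// classify is a hypothetical helper that applies the documented rules.
func classify(errMsg string, historyBytes int) failureKind {
	msg := strings.ToLower(errMsg)
	if strings.Contains(msg, "403") {
		// An HTML block page makes the JSON decoder choke on '<'.
		htmlBody := strings.Contains(msg, "invalid character '<'")
		if htmlBody || containsAny(msg, wafSignals) {
			return kindPayload
		}
		return kindAuth
	}
	if containsAny(msg, dropSignals) {
		if historyBytes > largeHistoryBytes {
			return kindPayload
		}
		return kindTransient
	}
	return kindOther
}
```

The ordering reflects the conservative stance: a 403 has to prove it came from a middlebox before it stops being treated as an auth problem.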
The corporate Bedrock case: when the proxy/WAF intercepts the POST to Bedrock Runtime and returns an HTML block page with status 403, the AWS SDK tries to parse the body as JSON and fails with "invalid character '<' looking for beginning of value" and an empty RequestID. That pattern is an unambiguous middlebox fingerprint (a real AWS 403 returns well-formed JSON), so ChatCLI reclassifies it as a recoverable payload failure and triggers the same recovery ladder.
EOF / connection-reset detection applies a history-size threshold (500 KB) before suspecting payload. Small requests that hit EOF keep being treated as transient network failures (normal retry). Only when the history is already suspiciously large is EOF reclassified as a probable body cap.
Pre-flight check
Every agent turn, history is measured before the request goes out. Two paths:
With CHATCLI_MAX_PAYLOAD set:
If history crosses 85% of the cap, BudgetRatio is forced to 0.40 up front (aggressive preventive compaction). The user sees:
Reactive auto-cap after 413
If a 413/WAF/EOF fires and the user hasn't configured CHATCLI_MAX_PAYLOAD, ChatCLI automatically assumes 4 MB for the rest of the session, since the retry will very likely pass through the same proxy:
System notice injected in history
After a payload-limit-triggered recovery, ChatCLI injects a user message before the retry instructing the model to prefer smaller reads going forward. This breaks the model's loop of trying to re-read the same huge file that caused the 413 in the first place:
Max Output Token Escalation
When the model stops generating because it hit the max_tokens limit, ChatCLI can automatically escalate:
| Attempt | Action |
|---|---|
| 1st | Doubles the current max_tokens (up to the provider's cap) |
| 2nd | Doubles again (up to the provider's cap) |
| 3rd+ | Stops escalating, returns partial content |
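The escalation table reduces to a small pure function. The sketch below is an assumption about how such a step could look, not ChatCLI's code: the `nextMaxTokens` name and its parameters are hypothetical, and it also declines to escalate once the provider's cap has been reached.

```go
package main

// nextMaxTokens returns the max_tokens value for the next attempt and whether
// an escalation happened. escalationsDone counts earlier escalations;
// maxEscalations corresponds to a limit such as the documented default of 2.
func nextMaxTokens(current, providerCap, escalationsDone, maxEscalations int) (int, bool) {
	if escalationsDone >= maxEscalations || current >= providerCap {
		return current, false // stop escalating: caller returns partial content
	}
	next := current * 2
	if next > providerCap {
		next = providerCap // never exceed the provider's cap
	}
	return next, true
}
```

For example, starting from 4,096 with a cap of 16,384, the first two calls yield 8,192 and 16,384, and a third call reports that no further escalation is possible.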
Continuation Message
When the model is interrupted by a token limit, ChatCLI injects a continuation message:
Configuration
| Environment Variable | Description | Default |
|---|---|---|
| CHATCLI_CONTEXT_WINDOW | Global context-window override (in tokens), for any provider/model. Takes precedence over the catalog. Use it when your gateway/agent's real window differs from what ChatCLI assumes, since the compaction budget derives from this value. | (auto from catalog) |
| CHATCLI_MAX_RECOVERY_ATTEMPTS | Maximum context recovery attempts | 3 |
| CHATCLI_MAX_TOKEN_ESCALATIONS | Maximum max_tokens escalations | 2 |
| CHATCLI_EMERGENCY_KEEP_MESSAGES | Messages kept in emergency truncation | 10 |
| CHATCLI_MAX_PAYLOAD | Human-friendly ceiling for POST body size (e.g. 5MB, 512KB, 2.5MB; a bare 5 means 5MB). When set, the compactor respects this ceiling as an extra cap on top of the model's context window, and pre-flight forces compaction on crossing 85% of it. | (unset: no cap) |
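To make the CHATCLI_MAX_PAYLOAD format and the 85% pre-flight threshold concrete, here is a rough sketch of parsing such a value and checking a history size against it. Both function names are hypothetical, and treating 1 KB as 1024 bytes is an assumption; ChatCLI's actual parser may accept other units or round differently.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// parsePayloadCap converts values such as "5MB", "512KB", "2.5MB", or a bare
// "5" (read as megabytes) into bytes. 1 KB = 1024 bytes is an assumption.
func parsePayloadCap(value string) (int64, error) {
	s := strings.ToUpper(strings.TrimSpace(value))
	multiplier := float64(1 << 20) // bare numbers default to MB
	switch {
	case strings.HasSuffix(s, "MB"):
		s = strings.TrimSuffix(s, "MB")
	case strings.HasSuffix(s, "KB"):
		s = strings.TrimSuffix(s, "KB")
		multiplier = 1 << 10
	}
	n, err := strconv.ParseFloat(strings.TrimSpace(s), 64)
	if err != nil || math.IsNaN(n) || math.IsInf(n, 0) || n <= 0 {
		return 0, fmt.Errorf("invalid payload cap %q", value)
	}
	return int64(n * multiplier), nil
}

// needsPreflightCompaction reports whether history has crossed 85% of the cap.
// A cap of zero or less means "unset", so no pre-flight compaction is forced.
func needsPreflightCompaction(historyBytes, capBytes int64) bool {
	return capBytes > 0 && float64(historyBytes) > 0.85*float64(capBytes)
}
```

With a 5MB cap (5,242,880 bytes under these assumptions), a 4.6 MB history would trip the pre-flight check while a 4 MB history would not.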
Live feedback during compaction
Since this release, the terminal no longer "freezes" during a long compaction. HistoryCompactor emits status at each pipeline phase via SetStatusCallback:
Ctrl+C / ESC propagates correctly and aborts compaction without corrupting history (returns ctx.Err() instead of blindly falling through to emergency truncation).
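The combination of per-phase status and clean cancellation can be sketched like this. The `compactor`, `phase`, and `demo` names are hypothetical stand-ins, and the callback signature is an assumption; only the SetStatusCallback name and the ctx.Err() behavior come from the text above.

```go
package main

import "context"

// StatusCallback receives a short label for the phase that is starting.
type StatusCallback func(phase string)

// compactor is a hypothetical stand-in for this sketch, not HistoryCompactor.
type compactor struct {
	onStatus StatusCallback
}

func (c *compactor) SetStatusCallback(cb StatusCallback) { c.onStatus = cb }

// phase is one named pipeline step that transforms the history.
type phase struct {
	name string
	run  func(history []string) []string
}

// Compact runs phases in order. If ctx is cancelled it returns the original
// history untouched together with ctx.Err(), instead of pressing on.
func (c *compactor) Compact(ctx context.Context, history []string, phases []phase) ([]string, error) {
	working := append([]string(nil), history...) // copy: never mutate the input
	for _, p := range phases {
		if err := ctx.Err(); err != nil {
			return history, err
		}
		if c.onStatus != nil {
			c.onStatus(p.name)
		}
		working = p.run(working)
	}
	return working, nil
}

// demo wires everything together; cancelFirst simulates Ctrl+C before start.
func demo(history []string, phases []phase, cb StatusCallback, cancelFirst bool) ([]string, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if cancelFirst {
		cancel()
	}
	c := &compactor{}
	c.SetStatusCallback(cb)
	return c.Compact(ctx, history, phases)
}
```

Working on a copy is what makes the abort safe in this sketch: a cancelled run hands back the caller's slice exactly as it was.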
Microcompact (pre-budget)
Before NeedsCompaction checks whether history exceeds budget, the agent loop applies ApplyMicrocompact, a pure-Go, no-LLM, no-network pass that progressively truncates/summarizes old tool results (2+ turns old → head+tail preview; 4+ turns old → one-line summary). In most cases this keeps history inside budget without triggering the (expensive) Level 2.
| Environment Variable | Description | Default |
|---|---|---|
| CHATCLI_MICROCOMPACT_TRUNCATE_TURNS | Age (in turns) at which tool results start getting truncated | 2 |
| CHATCLI_MICROCOMPACT_SUMMARIZE_TURNS | Age (in turns) at which tool results are replaced by a one-line summary | 4 |
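The age-based policy can be illustrated with a single function applied to each tool result. This is a sketch under assumptions, not ApplyMicrocompact itself: the `microcompactResult` name, the 200-character preview size, the 80-character summary line, and the placeholder formats are all invented for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// previewRunes is a hypothetical head/tail size for the truncated preview.
const previewRunes = 200

// microcompactResult shrinks one tool result based on its age in turns.
// Results younger than truncateAt, or already short, are returned unchanged.
func microcompactResult(content string, ageTurns, truncateAt, summarizeAt int) string {
	runes := []rune(content)
	if ageTurns < truncateAt || len(runes) <= 2*previewRunes {
		return content
	}
	if ageTurns >= summarizeAt {
		// One-line summary: size plus the start of the first line.
		firstLine, _, _ := strings.Cut(content, "\n")
		if r := []rune(firstLine); len(r) > 80 {
			firstLine = string(r[:80])
		}
		return fmt.Sprintf("[tool result, %d chars] %s", len(runes), firstLine)
	}
	// Head+tail preview: keep both ends, drop the middle.
	omitted := len(runes) - 2*previewRunes
	head := string(runes[:previewRunes])
	tail := string(runes[len(runes)-previewRunes:])
	return fmt.Sprintf("%s\n[... %d chars omitted ...]\n%s", head, omitted, tail)
}
```

Slicing by runes rather than bytes avoids cutting a multi-byte character in half, which would otherwise leave invalid UTF-8 in the history.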
Aggressive Budget Ratio
At level 1, the tool result budget limits are multiplied by 0.5 (50%). This means:
| Parameter | Normal | Level 1 Recovery |
|---|---|---|
| Budget per turn | 200,000 chars | 100,000 chars |
| Max per result | 20,000 chars | 10,000 chars |
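The arithmetic in the table, together with the rule that the original limits are restored afterwards, can be sketched with a scoped helper. The `Budget` struct and `withScaledBudget` function are hypothetical; only the default values and the 0.5 ratio come from the document.

```go
package main

// Budget mirrors the two limits from the table; the struct is hypothetical.
type Budget struct {
	TurnBudgetChars   int
	PerResultMaxChars int
}

// withScaledBudget multiplies both limits by ratio while fn runs, then
// restores the originals, so only the current pass sees the reduction.
func withScaledBudget(b *Budget, ratio float64, fn func()) {
	original := *b
	defer func() { *b = original }() // restore even if fn panics
	b.TurnBudgetChars = int(float64(b.TurnBudgetChars) * ratio)
	b.PerResultMaxChars = int(float64(b.PerResultMaxChars) * ratio)
	fn()
}
```

Called with the defaults of 200,000 and 20,000 and a ratio of 0.5, the callback sees 100,000 and 10,000, and the struct is back to its defaults once the call returns.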
Recovery Flow
Interaction with Other Systems
Context recovery works in conjunction with:
Tool Result Budget
The result budget is the first line of defense. Recovery activates when the budget was not sufficient.
Microcompaction
Progressive compaction reduces context growth over time.
Conversation Control
The /compact command is the proactive way to prevent overflow.
Cost Tracking
Monitor context usage to anticipate when /compact will be needed.