ChatCLI implements an automatic context recovery system that handles two common failure types in long sessions: context window overflow (“prompt too long”) and output token limits. When the API rejects a request due to excessive size, the system applies progressively more aggressive strategies to recover the session without losing the conversation.

Context Overflow Recovery

When the API returns a “context too long” error, ChatCLI applies up to 3 recovery levels before giving up:
First attempt: halves the budget limits and cleans up misalignments.

Actions:
  • Repairs tool result pairing (removes orphans, injects synthetics)
  • Reduces DefaultTurnBudgetChars and DefaultPerResultMaxChars to 50% of their original values
  • Applies budget enforcement with reduced limits
  • Truncates long assistant messages to 5,000 chars
The original limits are restored after application. Only the current history is affected by the reduction.
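The level-1 actions above can be sketched as follows. This is an illustrative Go sketch, not ChatCLI's actual code: the type `Message`, the function `applyLevel1Recovery`, and the constant names are assumptions. The halved limits are locals, so the originals are naturally "restored" after the call.

```go
package main

import "fmt"

// Message is a hypothetical conversation entry; field names are assumptions.
type Message struct {
	Role    string
	Content string
}

const (
	defaultTurnBudgetChars   = 200_000 // normal per-turn budget
	defaultPerResultMaxChars = 20_000  // normal per-result cap
	assistantTruncateAt      = 5_000   // level-1 assistant message cap
)

// applyLevel1Recovery halves the budget limits for this request only and
// truncates long assistant messages. It works on a copy of the history,
// so only the current request is affected.
func applyLevel1Recovery(history []Message) (turnBudget, perResultMax int, out []Message) {
	turnBudget = defaultTurnBudgetChars / 2     // 200,000 -> 100,000
	perResultMax = defaultPerResultMaxChars / 2 // 20,000 -> 10,000

	out = make([]Message, len(history))
	copy(out, history)
	for i := range out {
		if out[i].Role == "assistant" && len(out[i].Content) > assistantTruncateAt {
			out[i].Content = out[i].Content[:assistantTruncateAt] + "\n[truncated]"
		}
	}
	return turnBudget, perResultMax, out
}

func main() {
	long := Message{Role: "assistant", Content: string(make([]byte, 10_000))}
	budget, perResult, trimmed := applyLevel1Recovery([]Message{long})
	fmt.Println(budget, perResult, len(trimmed[0].Content))
}
```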

Error Detection

The system recognizes multiple forms of overflow errors:
Error Message              Provider
context length exceeded    Anthropic
prompt is too long         OpenAI
request too large          Various
max_tokens exceed          Various
input too long             Google
token limit                Generic
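Detection along these lines is typically a case-insensitive substring match against the error message. A minimal sketch (the function name `isContextOverflow` is an assumption, not ChatCLI's actual symbol):

```go
package main

import (
	"fmt"
	"strings"
)

// overflowMarkers mirrors the table above; matching is case-insensitive.
var overflowMarkers = []string{
	"context length exceeded",
	"prompt is too long",
	"request too large",
	"max_tokens exceed",
	"input too long",
	"token limit",
}

// isContextOverflow reports whether an API error message indicates a
// context/size overflow recognized by the recovery system.
func isContextOverflow(apiErr string) bool {
	msg := strings.ToLower(apiErr)
	for _, marker := range overflowMarkers {
		if strings.Contains(msg, marker) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isContextOverflow("400: prompt is too long: 210000 tokens"))
	fmt.Println(isContextOverflow("429: rate limited, retry later"))
}
```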

Max Output Token Escalation

When the model stops generating because it hit the max_tokens limit, ChatCLI can automatically escalate:
Attempt    Action
1st        Doubles the current max_tokens (up to the provider's cap)
2nd        Doubles again (up to the provider's cap)
3rd+       Stops escalating, returns partial content
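The escalation rule above amounts to doubling with a ceiling and an attempt cutoff. A sketch, assuming an illustrative `escalateMaxTokens` helper and a hypothetical provider cap of 16,384 tokens:

```go
package main

import "fmt"

// escalateMaxTokens doubles max_tokens up to the provider cap. It returns
// the new limit and whether another retry should be made; after
// maxEscalations attempts it stops and the caller keeps the partial content.
func escalateMaxTokens(current, providerCap, attempt, maxEscalations int) (int, bool) {
	if attempt > maxEscalations {
		return current, false // 3rd+ attempt: stop escalating
	}
	doubled := current * 2
	if doubled > providerCap {
		doubled = providerCap
	}
	return doubled, true
}

func main() {
	tokens := 4096
	providerCap := 16384 // assumed cap for illustration

	tokens, _ = escalateMaxTokens(tokens, providerCap, 1, 2) // 1st: 8192
	tokens, _ = escalateMaxTokens(tokens, providerCap, 2, 2) // 2nd: 16384
	_, retry := escalateMaxTokens(tokens, providerCap, 3, 2) // 3rd: give up
	fmt.Println(tokens, retry)
}
```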

Continuation Message

When the model is interrupted by a token limit, ChatCLI injects a continuation message:
Your response was cut off at the token limit.
Resume DIRECTLY from where you stopped -- do not repeat any content.
Continue the implementation or explanation from the exact point of interruption.
The message instructs the model to continue from where it left off, avoiding repetition of already generated content.
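Mechanically, this means appending the partial assistant output and then the continuation instruction before resending. A sketch under assumed names (`Message`, `injectContinuation` are illustrative, not ChatCLI's API):

```go
package main

import "fmt"

// continuationPrompt is the instruction quoted above.
const continuationPrompt = `Your response was cut off at the token limit.
Resume DIRECTLY from where you stopped -- do not repeat any content.
Continue the implementation or explanation from the exact point of interruption.`

// Message is a hypothetical conversation entry.
type Message struct {
	Role    string
	Content string
}

// injectContinuation records the partial assistant output, then adds the
// continuation instruction so the next completion resumes where it stopped.
func injectContinuation(history []Message, partial string) []Message {
	history = append(history, Message{Role: "assistant", Content: partial})
	history = append(history, Message{Role: "user", Content: continuationPrompt})
	return history
}

func main() {
	h := injectContinuation(nil, "func main() { // ...cut off mid-block")
	fmt.Println(len(h), h[0].Role, h[1].Role)
}
```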

Configuration

Environment Variable               Description                              Default
CHATCLI_MAX_RECOVERY_ATTEMPTS      Maximum context recovery attempts        3
CHATCLI_MAX_TOKEN_ESCALATIONS      Maximum max_tokens escalations           2
CHATCLI_EMERGENCY_KEEP_MESSAGES    Messages kept in emergency truncation    10

Aggressive Budget Ratio

At level 1, the tool result budget limits are multiplied by 0.5 (50%). This means:
Parameter          Normal           Level 1 Recovery
Budget per turn    200,000 chars    100,000 chars
Max per result     20,000 chars     10,000 chars

Recovery Flow

API returns "context too long" error
  |
  +- Attempt 1: Aggressive budget (50%) + pairing cleanup
  |   +- Resend to API
  |       +- Success -> continues normally
  |       +- Failure -> next attempt
  |
  +- Attempt 2: Emergency truncate (system + last 10 msgs)
  |   +- Resend to API
  |       +- Success -> continues with reduced history
  |       +- Failure -> next attempt
  |
  +- Attempt 3: Nuclear truncate (system + last 4 msgs)
      +- Resend to API
          +- Success -> continues with minimal history
          +- Failure -> error reported to user
After nuclear truncation (level 3), the model loses all context from the previous conversation. Only the last 2 exchanges are kept. Use /compact proactively to avoid reaching this point.
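The flow above is essentially a retry loop that tightens the history at each level. A simplified Go sketch, with an assumed `send` function standing in for the API call and message-size thresholds invented for the demo (the real trigger is the API's own overflow error, and the real code also preserves the system prompt):

```go
package main

import (
	"errors"
	"fmt"
)

// Message is a hypothetical conversation entry.
type Message struct{ Role, Content string }

// send is a stand-in for the API call: it fails with an overflow error
// while the history exceeds an imaginary 20-char limit.
func send(history []Message) error {
	total := 0
	for _, m := range history {
		total += len(m.Content)
	}
	if total > 20 {
		return errors.New("context too long")
	}
	return nil
}

// recoverSession walks the three levels described above: aggressive budget,
// emergency truncate (last 10 messages), nuclear truncate (last 4 messages).
func recoverSession(history []Message) ([]Message, error) {
	for attempt := 1; attempt <= 3; attempt++ {
		switch attempt {
		case 1:
			// Level 1: budgets halved elsewhere; history itself unchanged.
		case 2:
			if len(history) > 10 {
				history = history[len(history)-10:]
			}
		case 3:
			if len(history) > 4 {
				history = history[len(history)-4:]
			}
		}
		if err := send(history); err == nil {
			return history, nil
		}
	}
	return nil, errors.New("context recovery failed after 3 attempts")
}

func main() {
	var h []Message
	for i := 0; i < 20; i++ {
		h = append(h, Message{"user", "hello"}) // 5 chars each, 100 total
	}
	kept, err := recoverSession(h)
	fmt.Println(len(kept), err) // succeeds only after nuclear truncation
}
```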

Interaction with Other Systems

Context recovery works in conjunction with:

Tool Result Budget

The result budget is the first line of defense. Recovery activates only when the budget alone was not enough to keep the request within limits.

Microcompaction

Progressive compaction reduces context growth over time.

Conversation Control

The /compact command is the proactive way to prevent overflow.

Cost Tracking

Monitor context usage to anticipate when /compact will be needed.