Anthropic exposes a thinking budget in tokens (`thinking_budget`); the OpenAI o-series exposes reasoning effort as a low/medium/high enum. The pipeline abstracts this into `SkillEffort` and auto-attaches it to `ctx` before each LLM call from reasoning-heavy agents.
The cross-provider abstraction is pre-existing in `llm/client/skill_hints.go`. What the pipeline adds is: an auto-attach policy (`auto` for listed agents), a session override via `/thinking`, and exposure in `/config quality`.

The existing abstraction (pre-pipeline)

`llm/client/skill_hints.go` defines the `SkillEffort` tiers and the context helpers. Provider clients read `client.EffortFromContext(ctx)` inside their `SendPrompt`, translate the hint to the native field, and send.
Auto-attach: what the pipeline adds
`applyAutoReasoning(ctx context.Context, cfg ReasoningConfig, agent WorkerAgent) context.Context` maps `cfg.Budget` (tokens) to the nearest `SkillEffort` tier:
| Budget (tokens) | Resulting tier |
|---|---|
| ≥ 16384 | EffortMax |
| ≥ 8192 | EffortHigh |
| ≥ 4096 | EffortMedium |
| < 4096 or 0 | EffortHigh (sane default) |
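The table maps directly to a switch. A sketch, assuming the helper name from the section above and string-valued tier constants:

```go
package main

import "fmt"

type SkillEffort string

const (
	EffortMedium SkillEffort = "medium"
	EffortHigh   SkillEffort = "high"
	EffortMax    SkillEffort = "max"
)

// budgetToEffort maps a token budget to the nearest SkillEffort tier,
// mirroring the table row by row (helper name is illustrative).
func budgetToEffort(budget int) SkillEffort {
	switch {
	case budget >= 16384:
		return EffortMax
	case budget >= 8192:
		return EffortHigh
	case budget >= 4096:
		return EffortMedium
	default:
		return EffortHigh // < 4096 or 0: sane default
	}
}

func main() {
	for _, b := range []int{0, 4096, 8000, 16384} {
		fmt.Printf("%d -> %s\n", b, budgetToEffort(b))
	}
}
```

Note that the default budget of 8000 lands in the EffortMedium band (8000 < 8192) under this table.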
Three modes
- `auto` (default) — attach the hint only to agents listed in `AutoAgents`. Mechanical agents (formatter, shell) don't pay for pricier thinking.
- `on` — attach the hint to every agent.
- `off` — never attach a hint.

Resolution priority
For an LLM call inside a worker, the effort hint resolves in this order (later wins):

1. Skill frontmatter: if the turn activated a skill with `effort: high`, that hint is already on `ctx` before the dispatcher.
2. `CHATCLI_QUALITY_REASONING_*`: `applyAutoReasoning` only attaches if (1) the mode isn't `off` and (2) `ctx` doesn't already have an effort hint.
3. `/thinking` session override: `/thinking off` can force zero thinking even if the agent has a high default. Useful for turns where speed matters more than quality.
/thinking — the slash command

The session override is stored in `cli.thinkingOverride`.
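A sketch of how such a session flag might behave, using the field name from the text; the accepted values are inferred from the `/thinking on`, `/thinking auto`, `/thinking off`, and `/thinking max` forms mentioned elsewhere in this page, and the parsing logic is an assumption:

```go
package main

import "fmt"

// cli holds the session override: empty means "no override"; each
// /thinking replaces the previous value (field name from the text).
type cli struct {
	thinkingOverride string
}

func (c *cli) handleThinking(arg string) {
	switch arg {
	case "off", "auto", "on", "max":
		c.thinkingOverride = arg
	default:
		fmt.Println("usage: /thinking off|auto|on|max")
	}
}

func main() {
	c := &cli{}
	c.handleThinking("max")
	c.handleThinking("off") // replaces the previous override
	fmt.Println(c.thinkingOverride)
}
```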
Providers that support it
| Provider | Native field | Notes |
|---|---|---|
| Anthropic Claude | `thinking: {type: enabled, budget_tokens: N}` | Beta header `interleaved-thinking-2025-05-14` |
| OpenAI o1 / o3 / o4 | `reasoning: {effort: "low\|medium\|high"}` | Via `/v1/responses` endpoint |
| Anthropic via Bedrock | Same shape, but via AWS API | Supports thinking |
| Other providers | Silently ignore | Fall-through without error |
Environment variables
| Env var | Default | Values | Effect |
|---|---|---|---|
| `CHATCLI_QUALITY_REASONING_MODE` | `auto` | `off\|auto\|on` | Policy |
| `CHATCLI_QUALITY_REASONING_BUDGET` | `8000` | int | Thinking tokens (Anthropic); mapped to tier on OpenAI |
| `CHATCLI_QUALITY_REASONING_AUTO_AGENTS` | `planner,refiner,verifier,reflexion` | CSV | List for mode=auto |
Per-agent override
Each agent also has its own default via `BuiltinAgentMeta`:
`agent.Effort()` → if non-empty, it attaches via `WithEffortHint`. This wins over `applyAutoReasoning` (see priority step 2).
Interaction with skill effort hints
Skills can declare effort in frontmatter. When such a skill is activated (/skill), `skillEffortHint` is set and propagated, and it resolves through the same priority order described above.
Skills and the reasoning backbone are orthogonal and composable. A skill says "the whole task needs high effort"; the quality pipeline says "these specific agents always think"; the user can override either with /thinking.

Observability
`/config quality` shows the current state.
Cost
Recommended budget strategy:

| Scenario | Recommendation |
|---|---|
| Casual chat | CHATCLI_QUALITY_REASONING_MODE=off |
| Daily dev | mode=auto, budget=8000 (default) |
| Critical workflows (large refactors, debugging) | /thinking max on the specific turn |
| Batch without user | mode=on, budget=16384 |
Troubleshooting
Provider seems not to use thinking
- Check `/config quality` — confirm `CHATCLI_QUALITY_REASONING_MODE != off`
- Check `CHATCLI_QUALITY_REASONING_AUTO_AGENTS` includes the running agent
- Check provider logs — `thinking_budget` should appear in the request body
- For Anthropic via OAuth: needs beta header `interleaved-thinking-2025-05-14` (already on in `claude_client.go:46`)
/thinking doesn't persist across turns
This is expected: /thinking applies starting with the next turn, and the flag stays until cleared with /thinking auto or /thinking off. Each /thinking replaces the previous one.
Cost exploded after enabling reasoning
The default budget (8000) is calibrated for Sonnet. For Opus or GPT-5, consider lowering it: `CHATCLI_QUALITY_REASONING_BUDGET=4000`. Or use mode=off and trigger manually with /thinking only when it makes a difference.

See also
Multi-Agent Orchestration
How effort hints flow from the dispatcher into parallel workers.
Skills and Registry
How skills declare `effort:` in frontmatter.
OpenAI Responses API
Official docs for reasoning.effort.
Anthropic Extended Thinking
Official docs for thinking_budget.