The Reasoning Backbone unifies how ChatCLI asks providers to “think harder”. Anthropic exposes extended thinking with a token budget (budget_tokens); the OpenAI o-series exposes reasoning effort as a low/medium/high enum. The pipeline abstracts both into SkillEffort and auto-attaches it to ctx before each LLM call from reasoning-heavy agents.
The cross-provider abstraction is pre-existing in llm/client/skill_hints.go. What the pipeline adds is: auto-attach policy (auto for listed agents), session override via /thinking, and exposure in /config quality.

The existing abstraction (pre-pipeline)

llm/client/skill_hints.go:
type SkillEffort string

const (
    EffortUnset  SkillEffort = ""
    EffortLow    SkillEffort = "low"
    EffortMedium SkillEffort = "medium"
    EffortHigh   SkillEffort = "high"
    EffortMax    SkillEffort = "max"
)

// Maps to Anthropic thinking budget_tokens
func ThinkingBudgetForEffort(e SkillEffort) int {
    switch e {
    case EffortMedium: return 4096
    case EffortHigh:   return 16384
    case EffortMax:    return 32768
    }
    return 0 // unset/low = no thinking
}

// Maps to OpenAI reasoning.effort
func ReasoningEffortForOpenAI(e SkillEffort) string {
    switch e {
    case EffortLow:               return "low"
    case EffortMedium:            return "medium"
    case EffortHigh, EffortMax:   return "high"
    }
    return ""
}
Providers read from ctx via client.EffortFromContext(ctx) inside their SendPrompt, translate to the native field, and send.

Auto-attach: what the pipeline adds

applyAutoReasoning(ctx, cfg ReasoningConfig, agent WorkerAgent) context.Context:
// cli/agent/quality/reasoning.go
// cli/agent/quality/reasoning.go
func applyAutoReasoning(ctx context.Context, cfg ReasoningConfig, agent WorkerAgent) context.Context {
    if cfg.Mode == "off" {
        return ctx
    }
    if client.EffortFromContext(ctx) != client.EffortUnset {
        return ctx // respect an existing hint (skill frontmatter, agent.Effort)
    }
    if cfg.Mode != "on" && !inAutoAgents(agent.Type(), cfg.AutoAgents) {
        return ctx // auto: only for listed agents
    }
    return client.WithEffortHint(ctx, EffortForBudget(cfg.Budget))
}
EffortForBudget translates cfg.Budget (tokens) to the nearest SkillEffort tier:

| Budget (tokens) | Resulting tier |
|---|---|
| ≥ 16384 | EffortMax |
| ≥ 8192 | EffortHigh |
| ≥ 4096 | EffortMedium |
| < 4096 or 0 | EffortHigh (sane default) |

Default: CHATCLI_QUALITY_REASONING_BUDGET=8000 → EffortHigh (8000 thinking tokens on Claude, reasoning.effort=high on OpenAI).

Three modes

CHATCLI_QUALITY_REASONING_MODE=off   # never attach an effort hint
CHATCLI_QUALITY_REASONING_MODE=on    # attach for every agent (unless a hint already exists)
CHATCLI_QUALITY_REASONING_MODE=auto  # attach only for listed agents (default)
CHATCLI_QUALITY_REASONING_AUTO_AGENTS=planner,refiner,verifier,reflexion
In auto mode the effort hint is attached only for agents in AutoAgents. Mechanical agents (formatter, shell) don’t pay for pricier thinking.

Resolution priority

For an LLM call inside a worker, the effort hint resolves through four layers:

1. Skill frontmatter: if the turn activated a skill with effort: high, that hint is already on ctx before the dispatcher runs.
2. Agent default: PlannerAgent has an embedded effort="high"; the dispatcher attaches it via WithEffortHint.
3. CHATCLI_QUALITY_REASONING_*: applyAutoReasoning only attaches a hint if (1) the mode isn’t off and (2) ctx doesn’t already carry one, so it never overrides layers 1 and 2.
4. /thinking session override: in chat (cli_llm.go) and the orchestrator turn (agent_mode.go), cli.applyThinkingOverride(skillEffort) wins over everything above for that turn.

This means /thinking off can force zero thinking even if the agent has a high default. Useful for turns where speed matters more than quality.

/thinking: the slash command

/thinking on
# alias for /thinking high
The override lives in cli.thinkingOverride:
type thinkingOverrideState struct {
    set    bool               // distinguishes "no override" from "override set"
    effort client.SkillEffort // EffortUnset with set=true means "off"
}

Providers that support it

| Provider | Native field | Notes |
|---|---|---|
| Anthropic Claude | thinking: {type: enabled, budget_tokens: N} | Beta header interleaved-thinking-2025-05-14 |
| OpenAI o1 / o3 / o4 | reasoning: {effort: "low\|medium\|high"} | Via /v1/responses endpoint |
| Anthropic via Bedrock | Same shape, via the AWS API | Supports thinking |
| Other providers | Silently ignore the hint | Fall-through without error |
CHATCLI_QUALITY_REASONING_MODE=on with a provider that doesn’t support it is a no-op — ctx has the hint, provider doesn’t use it, zero failures. You only pay for real capability.

Environment variables

| Env var | Default | Values | Effect |
|---|---|---|---|
| CHATCLI_QUALITY_REASONING_MODE | auto | off\|auto\|on | Policy |
| CHATCLI_QUALITY_REASONING_BUDGET | 8000 | int | Thinking tokens (Anthropic); mapped to a tier on OpenAI |
| CHATCLI_QUALITY_REASONING_AUTO_AGENTS | planner,refiner,verifier,reflexion | CSV | Agent list for mode=auto |

Per-agent override

Each agent also has its own default via BuiltinAgentMeta:
# Force Planner to max thinking
export CHATCLI_AGENT_PLANNER_EFFORT=max

# Lower Formatter (it's low by default, but explicit)
export CHATCLI_AGENT_FORMATTER_EFFORT=low
The flow: dispatcher reads agent.Effort() → if non-empty, attaches via WithEffortHint. This wins over applyAutoReasoning (see priority step 2).

Interaction with skill effort hints

Skills can declare effort in frontmatter:
---
name: investigate-crashes
description: Deep dive into crash logs and bug reproduction
effort: high
---
When the skill is activated (auto or via /skill), skillEffortHint is set and propagated. The order becomes:
skill.effort=high → skillEffortHint=EffortHigh → WithEffortHint(ctx, EffortHigh)
    → dispatcher propagates to workerCtx
    → applyAutoReasoning detects the hint already set → skip
    → provider reads EffortHigh → thinking_budget=16384
Skills and reasoning backbone are orthogonal and composable. Skill says “the whole task needs high effort”; quality says “these specific agents always think”; the user can override with /thinking.

Observability

/config quality shows the state:
── Reasoning backbone (#7)
  CHATCLI_QUALITY_REASONING_MODE      : auto
  CHATCLI_QUALITY_REASONING_BUDGET    : 8000
  CHATCLI_QUALITY_REASONING_AUTO_AGENTS: planner, refiner, verifier, reflexion
In worker logs, each LLM call with active effort shows up as:
{"level":"info","msg":"SendPrompt with thinking","provider":"anthropic","model":"claude-sonnet-4-6","thinking_budget":8000}
{"level":"info","msg":"SendPrompt with reasoning","provider":"openai","model":"o4-mini","reasoning.effort":"high"}

Cost

Thinking tokens are billed separately on Anthropic (output-priced). An 8000-token budget adds ~$0.12/call with Sonnet. Reasoning effort on OpenAI also increases output tokens.
Recommended budget strategy:
| Scenario | Recommendation |
|---|---|
| Casual chat | CHATCLI_QUALITY_REASONING_MODE=off |
| Daily dev | mode=auto, budget=8000 (default) |
| Critical workflows (large refactors, debugging) | /thinking max on the specific turn |
| Unattended batch runs | mode=on, budget=16384 |

Troubleshooting

If thinking doesn’t seem to be applied:
  1. Check /config quality: confirm CHATCLI_QUALITY_REASONING_MODE != off
  2. Check that CHATCLI_QUALITY_REASONING_AUTO_AGENTS includes the running agent
  3. Check provider logs: thinking_budget should appear in the request body
  4. For Anthropic via OAuth: the beta header interleaved-thinking-2025-05-14 is required (already enabled in claude_client.go:46)
Note that /thinking on applies from the next turn onward: the flag persists until cleared with /thinking auto or /thinking off, and each /thinking replaces the previous one.
The default budget (8000) is calibrated for Sonnet. For Opus or GPT-5, consider lowering: CHATCLI_QUALITY_REASONING_BUDGET=4000. Or use mode=off and trigger manually with /thinking only when it makes a difference.

See also

Multi-Agent Orchestration

How effort hints flow from the dispatcher into parallel workers.

Skills and Registry

How skills declare effort: in frontmatter.

OpenAI Responses API

Official docs for reasoning.effort.

Anthropic Extended Thinking

Official docs for thinking_budget.