Anthropic exposes a thinking budget in tokens (`thinking_budget`); the OpenAI o-series exposes reasoning effort as a low/medium/high enum. The pipeline abstracts this into `SkillEffort` and auto-attaches it to `ctx` before each LLM call from reasoning-heavy agents.
The cross-provider abstraction is pre-existing in `llm/client/skill_hints.go`. What the pipeline adds is: an auto-attach policy (`auto` for listed agents), a session override via `/thinking`, and exposure in `/config quality`.

The existing abstraction (pre-pipeline)

`llm/client/skill_hints.go` defines the `SkillEffort` tiers and the context helpers. Provider clients read `client.EffortFromContext(ctx)` inside their `SendPrompt`, translate the hint to the native field, and send.
Auto-attach: what the pipeline adds
`applyAutoReasoning(ctx context.Context, cfg ReasoningConfig, agent WorkerAgent) context.Context` maps `cfg.Budget` (tokens) to the nearest `SkillEffort` tier:
| Budget (tokens) | Resulting tier |
|---|---|
| ≥ 16384 | EffortMax |
| ≥ 8192 | EffortHigh |
| ≥ 4096 | EffortMedium |
| < 4096 or 0 | EffortHigh (sane default) |
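The table maps directly to a switch. A sketch, assuming the helper name from the section above and string-valued tier constants:

```go
package main

import "fmt"

type SkillEffort string

const (
	EffortMedium SkillEffort = "medium"
	EffortHigh   SkillEffort = "high"
	EffortMax    SkillEffort = "max"
)

// budgetToEffort maps a token budget to the nearest SkillEffort tier,
// mirroring the table row by row (helper name is illustrative).
func budgetToEffort(budget int) SkillEffort {
	switch {
	case budget >= 16384:
		return EffortMax
	case budget >= 8192:
		return EffortHigh
	case budget >= 4096:
		return EffortMedium
	default:
		return EffortHigh // < 4096 or 0: sane default
	}
}

func main() {
	for _, b := range []int{0, 4096, 8000, 16384} {
		fmt.Printf("%d -> %s\n", b, budgetToEffort(b))
	}
}
```

Note that the default budget of 8000 lands in the EffortMedium band (8000 < 8192) under this table.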
Three modes
- `auto` (default) — attach the hint only to agents listed in `AutoAgents`. Mechanical agents (formatter, shell) don't pay for pricier thinking.
- `on` — attach the hint to every agent.
- `off` — never attach a hint.

Resolution priority
For an LLM call inside a worker, the effort hint resolves in this order (later wins):

1. Skill frontmatter: if the turn activated a skill with `effort: high`, that hint is already on `ctx` before the dispatcher.
2. `CHATCLI_QUALITY_REASONING_*`: `applyAutoReasoning` only attaches if (1) the mode isn't `off` and (2) `ctx` doesn't already have an effort hint.
3. `/thinking` session override: `/thinking off` can force zero thinking even if the agent has a high default. Useful for turns where speed matters more than quality.
/thinking — the slash command

The session override is stored in `cli.thinkingOverride`.
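A sketch of how such a session flag might behave, using the field name from the text; the accepted values are inferred from the `/thinking on`, `/thinking auto`, `/thinking off`, and `/thinking max` forms mentioned elsewhere in this page, and the parsing logic is an assumption:

```go
package main

import "fmt"

// cli holds the session override: empty means "no override"; each
// /thinking replaces the previous value (field name from the text).
type cli struct {
	thinkingOverride string
}

func (c *cli) handleThinking(arg string) {
	switch arg {
	case "off", "auto", "on", "max":
		c.thinkingOverride = arg
	default:
		fmt.Println("usage: /thinking off|auto|on|max")
	}
}

func main() {
	c := &cli{}
	c.handleThinking("max")
	c.handleThinking("off") // replaces the previous override
	fmt.Println(c.thinkingOverride)
}
```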
Providers that support it
| Provider | Native field | Notes |
|---|---|---|
| Anthropic Claude | `thinking: {type: enabled, budget_tokens: N}` | Beta header `interleaved-thinking-2025-05-14` |
| OpenAI o1 / o3 / o4 | `reasoning: {effort: "low\|medium\|high"}` | Via `/v1/responses` endpoint |
| Anthropic via Bedrock | Same shape, but via AWS API | Supports thinking |
| Other providers | Silently ignore | Fall-through without error |
Environment variables
| Env var | Default | Values | Effect |
|---|---|---|---|
| `CHATCLI_QUALITY_REASONING_MODE` | `auto` | `off\|auto\|on` | Policy |
| `CHATCLI_QUALITY_REASONING_BUDGET` | `8000` | int | Thinking tokens (Anthropic); mapped to tier on OpenAI |
| `CHATCLI_QUALITY_REASONING_AUTO_AGENTS` | `planner,refiner,verifier,reflexion` | CSV | List for mode=auto |
Per-agent override
Each agent also has its own default via `BuiltinAgentMeta`:
`agent.Effort()` → if non-empty, it attaches via `WithEffortHint`. This wins over `applyAutoReasoning` (see priority step 2).
Interaction with skill effort hints
Skills can declare effort in frontmatter. When such a skill is activated (/skill), `skillEffortHint` is set and propagated, and it resolves through the same priority order described above.
Skills and the reasoning backbone are orthogonal and composable. A skill says "the whole task needs high effort"; the quality pipeline says "these specific agents always think"; the user can override either with /thinking.

Observability
`/config quality` shows the current state.
Cost
Recommended budget strategy:

| Scenario | Recommendation |
|---|---|
| Casual chat | CHATCLI_QUALITY_REASONING_MODE=off |
| Daily dev | mode=auto, budget=8000 (default) |
| Critical workflows (large refactors, debugging) | /thinking max on the specific turn |
| Batch without user | mode=on, budget=16384 |
Troubleshooting
Provider seems not to use thinking
- Check `/config quality` — confirm `CHATCLI_QUALITY_REASONING_MODE != off`
- Check `CHATCLI_QUALITY_REASONING_AUTO_AGENTS` includes the running agent
- Check provider logs — `thinking_budget` should appear in the request body
- For Anthropic via OAuth: needs beta header `interleaved-thinking-2025-05-14` (already on in `claude_client.go:46`)
/thinking doesn't persist across turns
This is expected: /thinking applies starting with the next turn, and the flag stays until cleared with /thinking auto or /thinking off. Each /thinking replaces the previous one.
Cost exploded after enabling reasoning
The default budget (8000) is calibrated for Sonnet. For Opus or GPT-5, consider lowering it: `CHATCLI_QUALITY_REASONING_BUDGET=4000`. Or use mode=off and trigger manually with /thinking only when it makes a difference.

See also
Multi-Agent Orchestration
How effort hints flow from the dispatcher into parallel workers.
Skills and Registry
How skills declare `effort:` in frontmatter.
OpenAI Responses API
Official docs for reasoning.effort.
Anthropic Extended Thinking
Official docs for thinking_budget.