Multi-agent mode turns /coder and /agent into an orchestration system where the LLM dispatches specialist agents in parallel to solve complex tasks faster, cheaper, and more accurately. Each agent has its own expertise, its own skills, and, since the recent update, its own preferred model and effort level.
Activation
Multi-agent mode is enabled by default. To disable it, set CHATCLI_AGENT_PARALLEL_MODE=false. With multi-agent mode disabled, /coder and /agent work exactly as before, with zero impact.
Architecture
<agent_call> tags:
Two Execution Modes
The orchestrator has two execution mechanisms and chooses the better one for each context:

| Mode | Syntax | When to Use |
|---|---|---|
| agent_call | <agent_call agent="..." task="..." /> | New work phases, parallel tasks, exploratory reading, multi-file refactoring |
| tool_call | <tool_call name="@coder" args="..." /> | Quick fixes, error diagnosis, pinpoint patches, post-agent validation. IMPORTANT: multiple independent tool_calls should be emitted in a SINGLE response |
Decision Guide
| Situation | Mode |
|---|---|
| Read multiple files + find references | agent_call (file + search in parallel) |
| Fix a compile error | tool_call (direct patch) |
| Write new module + tests | agent_call (coder + shell) |
| Check an agent's result | tool_call (quick read/exec) |
| Fix after an agent failure | tool_call (precise diagnosis) |
| Resume after fix applied | agent_call (next phase) |
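
To make the parallel dispatch concrete, here is a minimal sketch of how several agent_call tags emitted in one response could be extracted before being handed to workers. The AgentCall type, the regular expression, and parseAgentCalls are hypothetical names introduced for illustration; ChatCLI's actual parser is not shown in this document and may treat attribute order and escaping differently.

```go
package main

import (
	"fmt"
	"regexp"
)

// AgentCall is a hypothetical in-memory form of one <agent_call .../> tag.
type AgentCall struct {
	Agent string
	Task  string
}

// agentCallRe assumes the attribute order agent, then task, with
// double-quoted values that contain no escaped quotes.
var agentCallRe = regexp.MustCompile(`<agent_call\s+agent="([^"]*)"\s+task="([^"]*)"\s*/>`)

// parseAgentCalls returns every agent_call found in a single LLM response,
// in the order they appear. Each entry could then be run in its own goroutine.
func parseAgentCalls(response string) []AgentCall {
	var calls []AgentCall
	for _, m := range agentCallRe.FindAllStringSubmatch(response, -1) {
		calls = append(calls, AgentCall{Agent: m[1], Task: m[2]})
	}
	return calls
}

func main() {
	resp := `<agent_call agent="file" task="read main.go" />
<agent_call agent="search" task="find usages of Run" />`
	for _, c := range parseAgentCalls(resp) {
		fmt.Printf("%s: %s\n", c.Agent, c.Task)
	}
}
```

Two tags in the same response yield two entries, which matches the "file + search in parallel" row of the decision guide.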
Built-in Specialist Agents
The 12 built-in agents implement the WorkerAgent interface and embed BuiltinAgentMeta, which declares the default model and effort and reads env var overrides (CHATCLI_AGENT_<NAME>_MODEL / CHATCLI_AGENT_<NAME>_EFFORT).
Per-Agent Effort Strategy
Each built-in has an effort level calibrated for the type of work it does. This saves tokens on mechanical tasks and protects quality on tasks that require deep reasoning.

| Agent | Default Effort | Rationale |
|---|---|---|
| File | low | Batch reads, no reasoning needed |
| Coder | medium | Safe diffs benefit from some thinking |
| Shell | low | Mechanical command execution |
| Git | low | Git operations are deterministic |
| Search | low | Mechanical grep/tree/read |
| Planner | high | Decomposition is where value lives (pure reasoning) |
| Reviewer | high | Finding subtle bugs requires deep reasoning |
| Tester | medium | Generates boilerplate + some semantics |
| Refactor | high | Rename/extract needs a reference model |
| Diagnostics | high | Root-cause analysis is pure reasoning |
| Formatter | low | Tool-driven, mechanical |
| Deps | low | Tool output interpretation |
All built-ins leave model empty, so they respect the user's /switch choice by default. This ensures the user controls costs and isn't surprised by a built-in silently swapping models.
Environment Variable Override
To force a different model or effort level on a built-in agent without recompiling, set the env vars CHATCLI_AGENT_<NAME>_MODEL and CHATCLI_AGENT_<NAME>_EFFORT, where <NAME> is one of FILE, CODER, SHELL, GIT, SEARCH, PLANNER, REVIEWER, TESTER, REFACTOR, DIAGNOSTICS, FORMATTER, DEPS.
The 12 Agents and Their Skills
FileAgent (Read and Analysis)
Commands: read, tree, search
Default effort: low
Skills:
- batch-read (accelerator script): reads N files in parallel goroutines without calling the LLM
- find-pattern: searches for patterns in files
- analyze-structure: analyzes code structure
- map-deps: maps dependencies between modules
CoderAgent (Write and Modify)
Commands: write, patch, read, tree
Default effort: medium
Skills:
- write-file: creates new files
- patch-file: makes precise modifications to existing code
- create-module: generates boilerplate
- refactor: safe rename and refactor
ShellAgent (Execution and Tests)
Commands: exec, test
Default effort: low
Skills:
- run-tests (accelerator script): runs go test ./... -json and parses the results
- build-check (accelerator script): runs go build ./... && go vet ./...
- lint-fix: automatic lint correction
GitAgent (Version Control)
Commands: git-status, git-diff, git-log, git-changed, git-branch, exec
Default effort: low
Skills:
- smart-commit (accelerator script): collects status + diff for a smart commit
- review-changes (accelerator script): analyzes changes via changed + diff + log
- create-branch: branch creation
SearchAgent (Codebase Search)
Commands: search, tree, read
Default effort: low
Skills:
- find-usages: finds symbol usages
- find-definition: finds definitions
- find-dead-code: detects dead code
- map-project (accelerator script): maps the project in parallel (tree + interfaces + structs + funcs)
PlannerAgent (Pure Reasoning)
Default effort: high
Skills:
- analyze-task: complexity and risk analysis
- create-plan: execution plan creation
- decompose: decomposes complex tasks
ReviewerAgent (Code Review and Quality)
Commands: read, search, tree
Default effort: high
Skills:
- review-file: analyzes a file for bugs, code smells, SOLID violations, and security issues
- diff-review (accelerator script): reviews staged changes via git-diff and git-changed
- scan-lint (accelerator script): runs go vet and staticcheck and categorizes issues
TesterAgent (Tests and Coverage)
Commands: read, write, patch, exec, test, search, tree
Default effort: medium
Skills:
- generate-tests: generates comprehensive tests for functions and packages (LLM-driven)
- run-coverage (accelerator script): runs go test -coverprofile and parses per-function coverage
- find-untested (accelerator script): finds exported functions without corresponding tests
- generate-table-test: generates idiomatic Go table-driven tests
RefactorAgent (Structural Transformations)
Commands: read, write, patch, search, tree
Default effort: high
Skills:
- rename-symbol (accelerator script): renames a symbol across all .go files, ignoring strings and comments
- extract-interface: extracts an interface from a concrete type's methods
- move-function: moves a function between packages, adjusting imports
- inline-variable: replaces a variable with its value at all use sites
DiagnosticsAgent (Troubleshooting and Investigation)
Commands: read, search, tree, exec
Default effort: high
Skills:
- analyze-error: parses error messages and stack traces, mapping them to code locations
- check-deps (accelerator script): runs go mod tidy and go mod verify, and checks dependency health
- bisect-bug: guides the investigation to find the commit that introduced a bug
- profile-bottleneck: runs benchmarks or pprof and analyzes performance hotspots
FormatterAgent (Formatting and Style)
Commands: read, patch, exec, tree
Default effort: low
Skills:
- format-code (accelerator script): runs gofmt -w (or goimports -w) on Go files
- fix-imports (accelerator script): runs goimports to organize imports
- normalize-style: applies consistent naming and style conventions (LLM-driven)
DepsAgent (Dependency Management)
Commands: read, exec, search, tree
Default effort: low
Skills:
- audit-deps (accelerator script): runs go mod verify and govulncheck for auditing
- update-deps (accelerator script): lists outdated deps with available updates (dry-run)
- why-dep (accelerator script): explains why a dep exists via go mod why and go mod graph
- find-outdated: finds all deps with newer versions available
Orchestrator-Visible Catalog
The catalog the orchestrator LLM receives in its system prompt (via registry.CatalogString()) now includes each agent's LLM profile when it declares non-default preferences. This helps the LLM make informed decisions, e.g., prefer planner for deep decomposition and formatter for cheap mechanical work.
Example of what the orchestrator sees:
When Effort() and Model() both return empty strings, no line is added (this avoids prompt noise).
Custom Agents as Workers
Persona agents defined in ~/.chatcli/agents/ are automatically loaded as workers in the orchestration system when starting /coder or /agent. The LLM can dispatch them via <agent_call> with the same ReAct loop, parallel reading, and error recovery as built-in agents.
Full Parity with Skills
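Custom agents now have the same preference fields as skills: model: and effort: in the YAML frontmatter. An agent that declares model: claude-opus-4-6 and effort: high runs on claude-opus-4-6 with effort=high, even if the user is on Sonnet. When the worker finishes, the user's next turn returns to the original model.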
How It Works
Scan
Agent definitions are read from ~/.chatcli/agents/ (global) and ./.agent/agents/ (project).
CustomAgent creation
For each definition, a CustomAgent is created, implementing the WorkerAgent interface. Model() and Effort() come directly from the frontmatter.
Catalog registration
Tools Mapping
The tools field in YAML frontmatter maps Claude Code-style tools to @coder subcommands:
| Tool in YAML | @coder Command(s) | Description |
|---|---|---|
| Read | read | Read file contents |
| Grep | search | Search patterns in files |
| Glob | tree | List directories |
| Bash | exec, test, git-status, git-diff, git-log, git-changed, git-branch | Execution and git operations |
| Write | write | Create/overwrite files |
| Edit | patch | Precise edits (search/replace) |
| MultiEdit | multipatch | Transactional multi-file edit with all-or-nothing rollback |
Protection Rules
- No tools = read-only: Agents without a tools field automatically receive read, search, tree and are marked as read-only.
- Duplicates ignored: If two agents have the same name, only the first one is registered.
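
The mapping table and the no-tools rule can be sketched together as below. The map contents come from the table above; allowedCommands is a hypothetical helper, and two behaviors in it are assumptions: unknown tool names are silently skipped, and only the no-tools case is flagged as read-only.

```go
package main

import "fmt"

// toolCommands mirrors the Tools Mapping table above.
var toolCommands = map[string][]string{
	"Read":      {"read"},
	"Grep":      {"search"},
	"Glob":      {"tree"},
	"Bash":      {"exec", "test", "git-status", "git-diff", "git-log", "git-changed", "git-branch"},
	"Write":     {"write"},
	"Edit":      {"patch"},
	"MultiEdit": {"multipatch"},
}

// allowedCommands expands frontmatter tools into @coder commands.
// With no tools declared it returns the read-only default set.
func allowedCommands(tools []string) (cmds []string, readOnly bool) {
	if len(tools) == 0 {
		return []string{"read", "search", "tree"}, true
	}
	seen := map[string]bool{}
	for _, t := range tools {
		for _, c := range toolCommands[t] { // unknown tools yield no commands (assumption)
			if !seen[c] {
				seen[c] = true
				cmds = append(cmds, c)
			}
		}
	}
	return cmds, false
}

func main() {
	cmds, ro := allowedCommands([]string{"Read", "Edit"})
	fmt.Println(cmds, ro)
	cmds, ro = allowedCommands(nil)
	fmt.Println(cmds, ro)
}
```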
Model Router: Smart Model Routing
When an agent declares model:, the dispatcher uses llm/client.ResolveModelRouting to pick the correct client. This is the same function used by skills, which guarantees consistent behavior in both flows.
Resolution Pipeline
The resolver tries the following signals, in order:
1. Active provider's API cache
If the hint matches a model in the active provider's cached model list (/models endpoint), use the user's provider and only swap the model. This covers real models the static catalog doesn't know about yet. Note: api-cached.
2. Catalog on the user's provider
If catalog.Resolve(userProvider, hint) matches (exact, alias, or prefix), swap the model on the same provider. Note: catalog-same-provider.
3. Catalog across all known providers
If the catalog matches on another provider that is in GetAvailableProviders() (has an API key configured), do a cross-provider swap. Note: catalog-cross-provider.
4. Family heuristic
claude-*/sonnet/opus/haiku → CLAUDEAI, gpt-*/chatgpt-*/o1/o3/o4 → OPENAI, gemini-* → GOOGLEAI, grok-* → XAI, glm-* → ZAI, minimax* → MINIMAX, kimi-*/moonshot-* → MOONSHOT, llama*/mistral*/qwen*/deepseek* → OLLAMA. Covers future models not yet in the catalog. Note: family-same-provider or family-cross-provider.
5. Optimistic
Fall back to the user's provider. Note: optimistic-user-provider.
Guarantees
- cli.Client, cli.Provider, cli.Model are never mutated. Swaps are worker-turn scoped.
- OAuth is implicitly validated: a provider only enters GetAvailableProviders() if auth.ResolveAuth returned some credential (API key, OAuth token, or GitHub token). OAuth-only users are treated identically to API-key users.
- Cross-provider without API key doesn't break: graceful fallback with a visible user message.
- Structured logs: each resolver decision emits a log with note, from_provider, to_provider, from_model, to_model.
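
The family heuristic (step 4 of the resolution pipeline) is essentially a prefix table. The sketch below is one way to express it, not the code behind ResolveModelRouting: familyRules and providerForFamily are hypothetical names, and treating the bare names (sonnet, opus, haiku, o1, o3, o4) as case-insensitive prefixes is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// familyRules lists the prefix-to-provider pairs named in step 4.
var familyRules = []struct{ prefix, provider string }{
	{"claude-", "CLAUDEAI"}, {"sonnet", "CLAUDEAI"}, {"opus", "CLAUDEAI"}, {"haiku", "CLAUDEAI"},
	{"gpt-", "OPENAI"}, {"chatgpt-", "OPENAI"}, {"o1", "OPENAI"}, {"o3", "OPENAI"}, {"o4", "OPENAI"},
	{"gemini-", "GOOGLEAI"}, {"grok-", "XAI"}, {"glm-", "ZAI"}, {"minimax", "MINIMAX"},
	{"kimi-", "MOONSHOT"}, {"moonshot-", "MOONSHOT"},
	{"llama", "OLLAMA"}, {"mistral", "OLLAMA"}, {"qwen", "OLLAMA"}, {"deepseek", "OLLAMA"},
}

// providerForFamily guesses a provider from a model hint.
// It returns false when no family matches, so the caller can
// move on to the optimistic step.
func providerForFamily(hint string) (string, bool) {
	h := strings.ToLower(strings.TrimSpace(hint))
	for _, r := range familyRules {
		if strings.HasPrefix(h, r.prefix) {
			return r.provider, true
		}
	}
	return "", false
}

func main() {
	for _, hint := range []string{"claude-opus-4-6", "gpt-5", "some-new-model"} {
		p, ok := providerForFamily(hint)
		fmt.Printf("%s -> %q (matched=%v)\n", hint, p, ok)
	}
}
```

Whether the result is logged as family-same-provider or family-cross-provider would depend on comparing the returned provider with the user's active one, which this sketch leaves to the caller.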
Effort Mapping to Providers
The effort: hint is propagated via context.WithValue and read by providers inside SendPrompt. Each provider does its own conversion:
| Provider | Effort → Request Field | Supported Models |
|---|---|---|
| Anthropic (Claude) | thinking.budget_tokens | opus-4.x, sonnet-4.x, 3.7-sonnet |
| OpenAI Chat Completions | reasoning_effort | o1, o3, o4, gpt-5, *-reasoning |
| OpenAI Responses | reasoning.effort | o1, o3, o4, gpt-5, *-reasoning |

Effort levels translate to the following request values:

| Effort | Anthropic budget_tokens | OpenAI effort |
|---|---|---|
| unset | (not sent) | (not sent) |
| low | (not sent) | low |
| medium | 4096 | medium |
| high | 16384 | high |
| max | 32768 | high (OpenAI has no "max") |
Skills: Scripts vs Descriptive
Each agent has two kinds of skills:
- Executable Skills (Accelerator Scripts)
- Descriptive Skills
V2 Skills (Packages)
V2 Skills are directories containing:
- SKILL.md: main content with frontmatter
- Subskills (.md): additional knowledge documents
- scripts/: executable scripts automatically registered on the worker
The worker can read subskills and exec scripts during its autonomous operation.
Error Recovery Strategy
When an agent_call fails, the orchestrator follows a recovery protocol:
Diagnosis via tool_call
The orchestrator uses tool_call to read relevant files and understand the error (it already has the context).
Rule of thumb: fixes = tool_call (fast, precise). New work phases = agent_call (parallel, scalable).
Configuration
| Variable | Default | Description |
|---|---|---|
| CHATCLI_AGENT_PARALLEL_MODE | true | Enable/disable multi-agent mode |
| CHATCLI_AGENT_MAX_WORKERS | 4 | Max concurrent goroutines |
| CHATCLI_AGENT_WORKER_MAX_TURNS | 10 | Max turns per worker |
| CHATCLI_AGENT_WORKER_TIMEOUT | 5m | Per-worker timeout |
| CHATCLI_AGENT_<NAME>_MODEL | (varies) | Model override for a specific built-in (e.g., CHATCLI_AGENT_PLANNER_MODEL=claude-opus-4-6) |
| CHATCLI_AGENT_<NAME>_EFFORT | (varies) | Effort override for a specific built-in (e.g., CHATCLI_AGENT_FORMATTER_EFFORT=low) |
.env Example
Anti-Race Safety
The system implements multiple layers of race-condition protection:
- FileLockManager
- Isolated History: []models.Message, no sharing.
- Independent LLM Clients
- Stateless Engine: engine.Engine.
- Context Tree: context.WithCancel. Effort hints are attached to this ctx.
- Policy Enforcement: coder_policy.json (allow/deny/ask). Policy "ask" actions pause the spinner and display a serialized security prompt to the user.
Security Governance in Parallel Mode
Parallel workers respect all rules in the coder_policy.json file (global and local). Actions like write, patch, exec go through the same policy check as sequential mode.
Behavior by Rule Type
| Rule | Worker Behavior |
|---|---|
| allow | Action runs automatically, no interruption |
| deny | Action silently blocked; worker receives [BLOCKED BY POLICY] error |
| ask | Worker pauses, spinner suspends, and a security prompt is shown to the user |
Prompt Serialization
When multiple workers need approval simultaneously, prompts are serialized via mutex, so only one prompt is shown at a time. After the user's response, the next worker in the queue receives its prompt. This avoids:
- Visual prompt overlap in the terminal
- Stdin read conflict
- Spinner rendering over the security prompt
Prompt with Agent Context
The security prompt in parallel mode shows contextual information about which agent is requesting the action.
Respect for the User's Provider/Model
Parallel workers use, by default, the active provider and model at dispatch time. If the user switches providers via /switch, subsequent agent dispatches will use the new provider correctly.
Exception: agents (built-in or custom) that declare model: and/or effort: can use a different client for that specific turn, resolved by the Model Router. cli.Client still points to the user's choice; the swap is worker-scoped.
Execution Flow (Example)
Dispatcher creates goroutines with resolved clients
FileAgent runs with effort=low (user's model), SearchAgent same. Both run in parallel, each with its own LLM client and isolated mini ReAct loop (within the maxWorkers limit).
Orchestrator dispatches PlannerAgent
PlannerAgent runs with effort=high (extended thinking): even if the user is on Sonnet, the Planner thinks more deeply in this phase.
Parallelism Maximization
ChatCLI's prompt system explicitly instructs the AI to maximize parallelism at every level:
- tool_call: Independent operations (read 3 files, search + read) should be emitted in a SINGLE response, not across turns.
- agent_call: For 3+ independent tasks, prefer agent_call running in parallel goroutines.
- Per-turn anchor: Every ReAct loop turn includes a reminder reinforcing the need for parallelism.
Compatibility
- CHATCLI_AGENT_PARALLEL_MODE=false: everything works exactly as before
- <tool_call> tags keep working even with parallel mode enabled
- No existing function signatures were changed (only additions)
- The cli/agent/workers/ package is fully isolated and does not impact existing functionality
- Old agents without model:/effort: keep working without any changes
- Older gRPC servers that don't carry the new AgentInfo fields return zero values; the client treats them as "inherit"
- Operator and CRDs do NOT need changes: agents are loaded by persona.Loader inside the pod, via ConfigMap mounts
When to use Multi-Agent vs Subagent Delegation
Multi-Agent (<agent_call>) and Subagent Delegation (delegate_subagent) are not alternatives: they solve different problems and can coexist within the same turn.
| Aspect | <agent_call> (Multi-Agent) | delegate_subagent |
|---|---|---|
| Parallelism | Yes: multiple tags in a single response run in parallel | Sequential (one at a time) |
| Agent selection | Dispatches to a catalogued agent (FileAgent, CoderAgent, ReviewerAgent, custom…) | Generic ReAct loop, no specific persona |
| Per-model routing | Yes: each agent can have its own model: and effort: | No: inherits the parent's LLM client |
| Context window | Isolated per worker | Isolated per subagent |
| Best for | Breaking large tasks into several specialised sub-projects running in parallel | A focused analysis that needs to consume lots of raw-data tokens without returning all of them to the parent |
Use <agent_call> when you have multiple types of work that are independent (read files + run tests + review diff). Use delegate_subagent when you have one concentrated analysis over a large payload (summarise /metrics, find a needle in the log).