Multi-Agent mode turns /coder and /agent into an orchestration system in which the LLM dispatches specialist agents in parallel to solve complex tasks faster, more cheaply, and more accurately. Each agent has its own expertise, its own skills, and (since the recent update) its own preferred model and effort level.

Activation

Multi-agent mode is enabled by default. To disable it:
CHATCLI_AGENT_PARALLEL_MODE=false
When disabled, /coder and /agent work exactly as before, with zero impact.

Architecture

User Query
    │
    ▼
AgentMode (existing ReAct loop)
    │
    ▼  (LLM responds with <agent_call> or <tool_call> tags)
Dispatcher (fan-out via semaphore + per-agent Model/Effort routing)
    │
    ├── FileAgent         ├── CoderAgent       ├── ShellAgent
    ├── GitAgent          ├── SearchAgent      ├── PlannerAgent
    ├── ReviewerAgent     ├── TesterAgent      ├── RefactorAgent
    ├── DiagnosticsAgent  ├── FormatterAgent   ├── DepsAgent
    └── CustomAgent(s)    (devops, security-auditor, etc.)
           │
           │  (each worker: Model Router → target client + effort hint on ctx)
           ▼
Results Aggregator → Feedback to the orchestrator LLM
The orchestrator LLM receives an agent catalog in its system prompt and learns to route tasks via <agent_call> tags:
<agent_call agent="file" task="Read all .go files in pkg/coder/engine/" />
<agent_call agent="coder" task="Add Close method to Engine struct" />
<agent_call agent="devops" task="Configure CI/CD pipeline with GitHub Actions" />
Multiple agent_call tags in the same response result in parallel execution.
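The fan-out described above can be sketched in Go as follows. This is a minimal illustration, not ChatCLI's actual dispatcher: the names AgentCall, ParseAgentCalls, and Dispatch are hypothetical, the regular expression assumes the exact attribute order shown in the examples, and maxWorkers is assumed to be at least 1.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// AgentCall is a hypothetical parsed form of one <agent_call ... /> tag.
type AgentCall struct {
	Agent string
	Task  string
}

// agentCallRe matches the self-closing tag form shown in the examples
// (agent attribute first, then task). Real parsing may be more lenient.
var agentCallRe = regexp.MustCompile(`<agent_call\s+agent="([^"]*)"\s+task="([^"]*)"\s*/>`)

// ParseAgentCalls extracts every agent_call tag from one LLM response.
func ParseAgentCalls(response string) []AgentCall {
	var calls []AgentCall
	for _, m := range agentCallRe.FindAllStringSubmatch(response, -1) {
		calls = append(calls, AgentCall{Agent: m[1], Task: m[2]})
	}
	return calls
}

// Dispatch runs all calls concurrently with at most maxWorkers in flight
// and returns the results in the same order as the calls.
func Dispatch(calls []AgentCall, maxWorkers int, run func(AgentCall) string) []string {
	results := make([]string, len(calls))
	sem := make(chan struct{}, maxWorkers) // buffered channel used as a counting semaphore
	var wg sync.WaitGroup
	for i, c := range calls {
		wg.Add(1)
		go func(i int, c AgentCall) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a worker slot
			defer func() { <-sem }() // release it when the worker finishes
			results[i] = run(c)      // each goroutine writes only its own index
		}(i, c)
	}
	wg.Wait()
	return results
}

func main() {
	resp := `<agent_call agent="file" task="Read all .go files in pkg/coder/engine/" />
<agent_call agent="coder" task="Add Close method to Engine struct" />`
	calls := ParseAgentCalls(resp)
	out := Dispatch(calls, 4, func(c AgentCall) string {
		return c.Agent + ": done" // stand-in for a worker's ReAct loop
	})
	fmt.Println(out)
}
```

Results are collected by index, so the aggregated feedback keeps the order in which the orchestrator emitted the tags even though workers finish in any order.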

Two Execution Modes

The orchestrator has two execution mechanisms, choosing the best for each context:
| Mode | Syntax | When to Use |
| --- | --- | --- |
| agent_call | <agent_call agent="..." task="..." /> | New work phases, parallel tasks, exploratory reading, multi-file refactoring |
| tool_call | <tool_call name="@coder" args="..." /> | Quick fixes, error diagnosis, pinpoint patches, post-agent validation. IMPORTANT: multiple independent tool_calls should be emitted in a SINGLE response |

Decision Guide

| Situation | Mode |
| --- | --- |
| Read multiple files + find references | agent_call (file + search in parallel) |
| Fix a compile error | tool_call (direct patch) |
| Write new module + tests | agent_call (coder + shell) |
| Check an agent's result | tool_call (quick read/exec) |
| Fix after an agent failure | tool_call (precise diagnosis) |
| Resume after fix applied | agent_call (next phase) |

Built-in Specialist Agents

The 12 built-in agents implement the WorkerAgent interface and embed BuiltinAgentMeta, which declares the default model and effort and reads env var overrides (CHATCLI_AGENT_<NAME>_MODEL / CHATCLI_AGENT_<NAME>_EFFORT).

Per-Agent Effort Strategy

Each built-in has an effort level calibrated for the type of work it does. This saves tokens on mechanical tasks and guarantees quality on tasks that require deep reasoning.
| Agent | Default Effort | Rationale |
| --- | --- | --- |
| File | low | Batch reads, no reasoning needed |
| Coder | medium | Safe diffs benefit from some thinking |
| Shell | low | Mechanical command execution |
| Git | low | Git operations are deterministic |
| Search | low | Mechanical grep/tree/read |
| Planner | high | Decomposition is where value lives (pure reasoning) |
| Reviewer | high | Finding subtle bugs requires deep reasoning |
| Tester | medium | Generates boilerplate + some semantics |
| Refactor | high | Rename/extract needs a reference model |
| Diagnostics | high | Root-cause analysis is pure reasoning |
| Formatter | low | Tool-driven, mechanical |
| Deps | low | Tool output interpretation |
Default model: all built-ins leave model empty, so they respect the user's /switch choice by default. This keeps the user in control of costs and prevents a built-in from silently swapping models.

Environment Variable Override

To force a different model or effort level on a built-in agent without recompiling, set the env vars:
# Force Planner to use Opus at max effort
export CHATCLI_AGENT_PLANNER_MODEL="claude-opus-4-6"
export CHATCLI_AGENT_PLANNER_EFFORT="max"

# Force Formatter to use Haiku to cheapen formatting
export CHATCLI_AGENT_FORMATTER_MODEL="claude-haiku-4-5"

# Run Reviewer on gpt-5 (cross-provider)
export CHATCLI_AGENT_REVIEWER_MODEL="gpt-5"
export CHATCLI_AGENT_REVIEWER_EFFORT="high"
The env var name uses the agent name in uppercase (FILE, CODER, SHELL, GIT, SEARCH, PLANNER, REVIEWER, TESTER, REFACTOR, DIAGNOSTICS, FORMATTER, DEPS).
If the target model's provider is not configured (e.g., CHATCLI_AGENT_REVIEWER_MODEL=gpt-5 but no OPENAI_API_KEY), the dispatcher gracefully falls back to the user's active provider/model and logs a clear warning. A missing API key never breaks a turn.

The 12 Agents and Their Skills

FileAgent
Access: Read-only (read, tree, search)
Default effort: low
Skills:
  • batch-read (accelerator script): reads N files in parallel goroutines without calling the LLM
  • find-pattern: searches for patterns in files
  • analyze-structure: analyzes code structure
  • map-deps: maps dependencies between modules

CoderAgent
Access: Read/Write (write, patch, read, tree)
Default effort: medium
Skills:
  • write-file: creates new files
  • patch-file: makes precise modifications to existing code
  • create-module: generates boilerplate
  • refactor: performs safe renames and refactors

ShellAgent
Access: Execution (exec, test)
Default effort: low
Skills:
  • run-tests (accelerator script): runs go test ./... -json and parses results
  • build-check (accelerator script): runs go build ./... && go vet ./...
  • lint-fix: applies automatic lint corrections

GitAgent
Access: Git ops (git-status, git-diff, git-log, git-changed, git-branch, exec)
Default effort: low
Skills:
  • smart-commit (accelerator script): collects status + diff for a smart commit
  • review-changes (accelerator script): analyzes changes via changed + diff + log
  • create-branch: creates branches

PlannerAgent
Access: None (no tools, pure LLM reasoning)
Default effort: high
Skills:
  • analyze-task: complexity and risk analysis
  • create-plan: execution plan creation
  • decompose: decomposes complex tasks
Planner is the agent that benefits most from extended thinking: it has no tools, so all of its value comes from decomposition quality.

ReviewerAgent
Access: Read-only (read, search, tree)
Default effort: high
Skills:
  • review-file: analyzes a file for bugs, code smells, SOLID violations, and security issues
  • diff-review (accelerator script): reviews staged changes via git-diff and git-changed
  • scan-lint (accelerator script): runs go vet and staticcheck and categorizes issues

TesterAgent
Access: Read/Write/Execution (read, write, patch, exec, test, search, tree)
Default effort: medium
Skills:
  • generate-tests: generates comprehensive tests for functions and packages (LLM-driven)
  • run-coverage (accelerator script): runs go test -coverprofile and parses per-function coverage
  • find-untested (accelerator script): finds exported functions without corresponding tests
  • generate-table-test: generates idiomatic Go table-driven tests

RefactorAgent
Access: Read/Write (read, write, patch, search, tree)
Default effort: high
Skills:
  • rename-symbol (accelerator script): renames a symbol across all .go files, ignoring strings and comments
  • extract-interface: extracts an interface from a concrete type's methods
  • move-function: moves a function between packages, adjusting imports
  • inline-variable: replaces a variable with its value at all use sites

DiagnosticsAgent
Access: Read/Execution (read, search, tree, exec)
Default effort: high
Skills:
  • analyze-error: parses error messages and stack traces, mapping them to code locations
  • check-deps (accelerator script): runs go mod tidy and go mod verify, and checks dependency health
  • bisect-bug: guides the investigation to find the commit that introduced a bug
  • profile-bottleneck: runs benchmarks or pprof and analyzes performance hotspots

FormatterAgent
Access: Write/Execution (read, patch, exec, tree)
Default effort: low
Skills:
  • format-code (accelerator script): runs gofmt -w (or goimports -w) on Go files
  • fix-imports (accelerator script): runs goimports to organize imports
  • normalize-style: applies consistent naming and style conventions (LLM-driven)

DepsAgent
Access: Read/Execution (read, exec, search, tree)
Default effort: low
Skills:
  • audit-deps (accelerator script): runs go mod verify and govulncheck for auditing
  • update-deps (accelerator script): lists outdated deps with available updates (dry-run)
  • why-dep (accelerator script): explains why a dep exists via go mod why and go mod graph
  • find-outdated: finds all deps with newer versions available

Orchestrator-Visible Catalog

The catalog the orchestrator LLM receives in its system prompt (via registry.CatalogString()) now includes each agent's LLM profile when it declares non-default preferences. This helps the LLM make informed decisions, for example preferring planner for deep decomposition and formatter for cheap mechanical work. Example of what the orchestrator sees:
## Available Specialized Agents

### planner (PlannerAgent)
Expert in analyzing tasks and creating execution plans...
LLM profile: effort=high
Allowed commands:
Skills: ...

### formatter (FormatterAgent)
Expert in code formatting and style normalization.
LLM profile: effort=low
Allowed commands: read, patch, exec, tree
Skills: ...

### devops-senior (DevOps Senior)
Senior DevOps focused on CI/CD...
LLM profile: effort=high, model=claude-opus-4-6
Allowed commands: read, search, tree, exec, test
Skills: ...
The profile line appears only when there is a hint: if Effort() and Model() both return empty strings, no line is added (this avoids prompt noise).
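A minimal sketch of that rule is shown below. ProfileLine is a hypothetical helper, not the body of registry.CatalogString(); it only reproduces the "LLM profile:" format seen in the catalog example and the omit-when-empty behaviour.

```go
package main

import (
	"fmt"
	"strings"
)

// ProfileLine returns the catalog line for an agent's LLM hints,
// or an empty string when the agent declares neither effort nor model.
func ProfileLine(effort, model string) string {
	var parts []string
	if effort != "" {
		parts = append(parts, "effort="+effort)
	}
	if model != "" {
		parts = append(parts, "model="+model)
	}
	if len(parts) == 0 {
		return "" // no hint: the caller adds no line to the catalog
	}
	return "LLM profile: " + strings.Join(parts, ", ")
}

func main() {
	fmt.Println(ProfileLine("high", "claude-opus-4-6"))
	fmt.Println(ProfileLine("low", ""))
	fmt.Printf("%q\n", ProfileLine("", ""))
}
```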

Custom Agents as Workers

Persona agents defined in ~/.chatcli/agents/ are automatically loaded as workers in the orchestration system when starting /coder or /agent. The LLM can dispatch them via <agent_call> with the same ReAct loop, parallel reading, and error recovery as built-in agents.

Full Parity with Skills

Custom agents now have the same preference fields as skills:
---
name: "security-auditor"
description: "Security expert focused on OWASP Top 10"
tools: Read, Grep, Glob
skills:
  - owasp-rules
  - compliance
model: "claude-opus-4-6"       # ideal model for heavy work
effort: "high"                 # extended thinking on
category: "security"
version: "1.0.0"
author: "Security Team"
tags: security, owasp, audit
---
# Base Personality

You are a Security Auditor specialist. Analyze code for
OWASP Top 10 vulnerabilities, injection, XSS, and bad practices.
When this agent is dispatched, the dispatcher goes through the Model Router and ensures the agent runs on claude-opus-4-6 with effort=high, even if the user is on Sonnet. When the worker finishes, the user's next turn returns to the original model.

How It Works

  1. Scan: When starting multi-agent mode, the system scans ~/.chatcli/agents/ (global) and ./.agent/agents/ (project).
  2. CustomAgent creation: For each agent found, a CustomAgent is created, implementing the WorkerAgent interface. Model() and Effort() come directly from the frontmatter.
  3. Tools mapping: The tools field in the YAML frontmatter defines which commands the agent can use.
  4. Skill loading: Associated skills are loaded and included in the worker's system prompt.
  5. Catalog registration: The agent appears in the orchestrator's catalog (with LLM profile if declared) and can be dispatched.
  6. Dispatcher applies hints: On every <agent_call>, before the worker starts, the dispatcher consults ResolveModelRouting (for model) and attaches WithEffortHint to the ctx (for effort).

Tools Mapping

The tools field in YAML frontmatter maps Claude Code-style tools to @coder subcommands:
| Tool in YAML | @coder Command(s) | Description |
| --- | --- | --- |
| Read | read | Read file contents |
| Grep | search | Search patterns in files |
| Glob | tree | List directories |
| Bash | exec, test, git-status, git-diff, git-log, git-changed, git-branch | Execution and git operations |
| Write | write | Create/overwrite files |
| Edit | patch | Precise edits (search/replace) |
| MultiEdit | multipatch | Transactional multi-file edit with all-or-nothing rollback |
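The mapping can be expressed as a lookup table. The sketch below is illustrative only: toolCommands and AllowedCommands are hypothetical names, and silently skipping unknown tool names is an assumption, since the behaviour for unrecognized tools is not described here.

```go
package main

import (
	"fmt"
	"strings"
)

// toolCommands mirrors the mapping table: YAML tool name -> @coder subcommands.
var toolCommands = map[string][]string{
	"Read":      {"read"},
	"Grep":      {"search"},
	"Glob":      {"tree"},
	"Bash":      {"exec", "test", "git-status", "git-diff", "git-log", "git-changed", "git-branch"},
	"Write":     {"write"},
	"Edit":      {"patch"},
	"MultiEdit": {"multipatch"},
}

// AllowedCommands expands a frontmatter value such as "Read, Grep, Glob"
// into the de-duplicated list of @coder subcommands it grants.
// Unknown tool names are skipped (an assumption made for this sketch).
func AllowedCommands(tools string) []string {
	var out []string
	seen := map[string]bool{}
	for _, t := range strings.Split(tools, ",") {
		for _, cmd := range toolCommands[strings.TrimSpace(t)] {
			if !seen[cmd] {
				seen[cmd] = true
				out = append(out, cmd)
			}
		}
	}
	return out
}

func main() {
	// The security-auditor example declares: tools: Read, Grep, Glob
	fmt.Println(AllowedCommands("Read, Grep, Glob"))
}
```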

Protection Rules

The 12 built-in agent names (file, coder, shell, git, search, planner, reviewer, tester, refactor, diagnostics, formatter, deps) are protected and cannot be overridden by custom agents.
  • No tools = read-only: Agents without a tools field automatically receive read, search, tree and are marked as read-only.
  • Duplicates ignored: If two agents have the same name, only the first one is registered.
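The three rules above can be sketched as a small registry. Registry, RegisterCustom, and the two error values are hypothetical names chosen for illustration; the real registration code may report these cases differently (for example, by logging instead of returning errors).

```go
package main

import (
	"errors"
	"fmt"
)

// builtinNames lists the 12 protected built-in agent names.
var builtinNames = map[string]bool{
	"file": true, "coder": true, "shell": true, "git": true,
	"search": true, "planner": true, "reviewer": true, "tester": true,
	"refactor": true, "diagnostics": true, "formatter": true, "deps": true,
}

var (
	ErrProtected = errors.New("name is reserved for a built-in agent")
	ErrDuplicate = errors.New("agent already registered")
)

// Registry maps a custom agent name to its allowed commands.
type Registry struct {
	agents map[string][]string
}

func NewRegistry() *Registry {
	return &Registry{agents: map[string][]string{}}
}

// RegisterCustom applies the protection rules and reports whether the
// agent ended up read-only because it declared no tools.
func (r *Registry) RegisterCustom(name string, commands []string) (readOnly bool, err error) {
	if builtinNames[name] {
		return false, ErrProtected
	}
	if _, exists := r.agents[name]; exists {
		return false, ErrDuplicate // the first registration wins
	}
	if len(commands) == 0 {
		commands = []string{"read", "search", "tree"}
		readOnly = true
	}
	r.agents[name] = commands
	return readOnly, nil
}

func main() {
	r := NewRegistry()
	fmt.Println(r.RegisterCustom("security-auditor", nil))
	fmt.Println(r.RegisterCustom("security-auditor", []string{"exec"}))
	fmt.Println(r.RegisterCustom("coder", []string{"write"}))
}
```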

Model Router β€” Smart Model Routing

When an agent declares model:, the dispatcher uses llm/client.ResolveModelRouting to pick the correct client. This is the same function used by skills, which guarantees consistent behavior in both flows.

Resolution Pipeline

The resolver tries the following signals, in order:
  1. Active provider's API cache: If the target model appears in the current provider's model list (discovered via the /models endpoint), use the user's provider and only swap the model. This covers real models that the static catalog doesn't know about yet. Note: api-cached.
  2. Catalog on the user's provider: If catalog.Resolve(userProvider, hint) matches (exact, alias, or prefix), swap the model on the same provider. Note: catalog-same-provider.
  3. Catalog across all known providers: If the model exists in another provider's catalog and that provider is in GetAvailableProviders() (has an API key configured), do a cross-provider swap. Note: catalog-cross-provider.
  4. Family heuristic: claude-*/sonnet/opus/haiku → CLAUDEAI, gpt-*/chatgpt-*/o1/o3/o4 → OPENAI, gemini-* → GOOGLEAI, grok-* → XAI, glm-* → ZAI, minimax* → MINIMAX, kimi-*/moonshot-* → MOONSHOT, llama*/mistral*/qwen*/deepseek* → OLLAMA. This covers future models not yet in the catalog. Note: family-same-provider or family-cross-provider.
  5. Optimistic: If the model is entirely unknown, the resolver passes it to the user's provider and lets the API decide. If the provider factory accepts it, the worker runs; if it is rejected, resolution falls through. Note: optimistic-user-provider.
  6. Graceful fallback: If the target provider is unavailable (no API key) or the factory failed, use the user's client and populate UserMessage with readable text. Note: fallback-unavailable or fallback-build-failed.
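To make the family heuristic concrete, here is a rough sketch of that single step. FamilyProvider is a hypothetical function, not part of ResolveModelRouting, and it assumes every pattern is a case-insensitive prefix match; the real resolver may match bare names such as sonnet or opus differently.

```go
package main

import (
	"fmt"
	"strings"
)

// familyRules lists provider families in the order given in the text.
var familyRules = []struct {
	provider string
	prefixes []string
}{
	{"CLAUDEAI", []string{"claude-", "sonnet", "opus", "haiku"}},
	{"OPENAI", []string{"gpt-", "chatgpt-", "o1", "o3", "o4"}},
	{"GOOGLEAI", []string{"gemini-"}},
	{"XAI", []string{"grok-"}},
	{"ZAI", []string{"glm-"}},
	{"MINIMAX", []string{"minimax"}},
	{"MOONSHOT", []string{"kimi-", "moonshot-"}},
	{"OLLAMA", []string{"llama", "mistral", "qwen", "deepseek"}},
}

// FamilyProvider guesses a provider from the model name.
// It returns "" when no family matches, so the caller can move on
// to the optimistic step.
func FamilyProvider(model string) string {
	m := strings.ToLower(strings.TrimSpace(model))
	for _, rule := range familyRules {
		for _, p := range rule.prefixes {
			if strings.HasPrefix(m, p) {
				return rule.provider
			}
		}
	}
	return ""
}

func main() {
	for _, m := range []string{"claude-opus-4-6", "gpt-5", "gemini-2.5-pro", "my-custom-model"} {
		fmt.Printf("%s -> %q\n", m, FamilyProvider(m))
	}
}
```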

Guarantees

  • cli.Client, cli.Provider, cli.Model are never mutated. Swaps are worker-turn scoped.
  • OAuth is implicitly validated: a provider only enters GetAvailableProviders() if auth.ResolveAuth returned some credential (API key, OAuth token, or GitHub token). OAuth-only users are treated identically to API-key users.
  • Cross-provider without API key doesn’t break: graceful fallback with a visible user message.
  • Structured logs: each resolver decision emits a log with note, from_provider, to_provider, from_model, to_model.

Effort Mapping to Providers

The effort: hint is propagated via context.WithValue and read by providers inside SendPrompt. Each provider does its own conversion:
| Provider | Effort → Request Field | Supported Models |
| --- | --- | --- |
| Anthropic (Claude) | thinking.budget_tokens | opus-4.x, sonnet-4.x, 3.7-sonnet |
| OpenAI Chat Completions | reasoning_effort | o1, o3, o4, gpt-5, *-reasoning |
| OpenAI Responses | reasoning.effort | o1, o3, o4, gpt-5, *-reasoning |

Unsupported models get the request without the field (silently ignored). Mapping table:

| Effort | Anthropic budget_tokens | OpenAI effort |
| --- | --- | --- |
| unset | (not sent) | (not sent) |
| low | (not sent) | low |
| medium | 4096 | medium |
| high | 16384 | high |
| max | 32768 | high (OpenAI has no "max") |

Skills: Scripts vs Descriptive

Each agent has two kinds of skills:
Accelerator scripts are pre-defined command sequences that bypass the LLM for mechanical, repetitive operations and execute directly on the engine:
batch-read   → Reads N files in parallel goroutines (no LLM call)
run-tests    → go test ./... -json | automatic parse
build-check  → go build ./... && go vet ./...
smart-commit → git status + git diff --cached → summary
map-project  → tree + search interfaces/structs in parallel

V2 Skills (Packages)

V2 Skills are directories containing:
  • SKILL.md: Main content with frontmatter
  • Subskills (.md): Additional knowledge documents
  • scripts/: Executable scripts automatically registered on the worker
skills/
└── clean-code/
    ├── SKILL.md            # Main content
    ├── naming-rules.md     # Subskill: naming rules
    ├── formatting.md       # Subskill: formatting rules
    └── scripts/
        └── lint_check.py   # Executable script (registered as skill)
The worker can read subskills and exec scripts during its autonomous operation.

Error Recovery Strategy

When an agent_call fails, the orchestrator follows an intelligent recovery protocol:
  1. Diagnosis via tool_call: Uses direct tool_call to read relevant files and understand the error (it already has the context).
  2. Fix via tool_call: Patches, file corrections, and retries are faster and safer via tool_call.
  3. Resume via agent_call: After the fix is applied and verified, resume via agent_call for the next phase.
Key rule: Error recovery = tool_call (fast, precise). New work phases = agent_call (parallel, scalable).
agent_call → FAIL
    │
    ▼
tool_call: read (diagnose error)
    │
    ▼
tool_call: patch (apply fix)
    │
    ▼
tool_call: exec (verify fix)
    │
    ▼
agent_call → NEXT PHASE (success)

Configuration

| Variable | Default | Description |
| --- | --- | --- |
| CHATCLI_AGENT_PARALLEL_MODE | true | Enable/disable multi-agent mode |
| CHATCLI_AGENT_MAX_WORKERS | 4 | Max concurrent goroutines |
| CHATCLI_AGENT_WORKER_MAX_TURNS | 10 | Max turns per worker |
| CHATCLI_AGENT_WORKER_TIMEOUT | 5m | Per-worker timeout |
| CHATCLI_AGENT_<NAME>_MODEL | (varies) | Model override for a specific built-in (e.g., CHATCLI_AGENT_PLANNER_MODEL=claude-opus-4-6) |
| CHATCLI_AGENT_<NAME>_EFFORT | (varies) | Effort override for a specific built-in (e.g., CHATCLI_AGENT_FORMATTER_EFFORT=low) |

.env Example

# Multi-Agent (Parallel Orchestration)
CHATCLI_AGENT_PARALLEL_MODE=true    # Set to false to disable
CHATCLI_AGENT_MAX_WORKERS=4
CHATCLI_AGENT_WORKER_MAX_TURNS=10
CHATCLI_AGENT_WORKER_TIMEOUT=5m

# Built-in overrides (optional)
CHATCLI_AGENT_PLANNER_MODEL=claude-opus-4-6
CHATCLI_AGENT_PLANNER_EFFORT=max
CHATCLI_AGENT_FORMATTER_MODEL=claude-haiku-4-5
CHATCLI_AGENT_REVIEWER_EFFORT=high

Anti-Race Safety

The system implements multiple layers of race-condition protection:

FileLockManager

Per-filepath mutex (normalized absolute paths). Write operations acquire the lock; reads do not block.
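A per-path lock of this kind can be sketched as follows. Only the FileLockManager name comes from this document; the lockFor and WithWriteLock methods are hypothetical, and the sketch covers the write path only, since reads are described as non-blocking.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sync"
)

// FileLockManager hands out one mutex per normalized absolute path.
type FileLockManager struct {
	mu    sync.Mutex             // guards the map itself
	locks map[string]*sync.Mutex // normalized path -> lock
}

func NewFileLockManager() *FileLockManager {
	return &FileLockManager{locks: map[string]*sync.Mutex{}}
}

// lockFor returns the mutex for a path, keyed by its absolute, cleaned form,
// so "a.go" and "./a.go" share the same lock.
func (m *FileLockManager) lockFor(path string) *sync.Mutex {
	abs, err := filepath.Abs(path)
	if err != nil {
		abs = filepath.Clean(path)
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	l, ok := m.locks[abs]
	if !ok {
		l = &sync.Mutex{}
		m.locks[abs] = l
	}
	return l
}

// WithWriteLock runs fn while holding the lock for path.
func (m *FileLockManager) WithWriteLock(path string, fn func() error) error {
	l := m.lockFor(path)
	l.Lock()
	defer l.Unlock()
	return fn()
}

func main() {
	m := NewFileLockManager()
	var wg sync.WaitGroup
	writes := 0
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// All goroutines target the same file, so the increments are serialized.
			_ = m.WithWriteLock("./a.go", func() error {
				writes++
				return nil
			})
		}()
	}
	wg.Wait()
	fmt.Println(writes) // prints 100
}
```

Writes to different files proceed in parallel because each path has its own mutex; only workers touching the same file wait on each other.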

Isolated History

Each worker keeps its own []models.Message, no sharing.

Independent LLM Clients

Each worker creates its own LLM client instance via factory pattern. With the Model Router, each worker can have a client from a different provider.

Stateless Engine

Each worker instantiates a fresh engine.Engine.

Context Tree

The parent context can cancel all workers via context.WithCancel. Effort hints are attached to this ctx.

Policy Enforcement

Workers fully respect coder_policy.json (allow/deny/ask). Policy "ask" actions pause the spinner and display a serialized security prompt to the user.

Security Governance in Parallel Mode

Parallel workers respect all rules in the coder_policy.json file (global and local). Actions like write, patch, exec go through the same policy check as sequential mode.

Behavior by Rule Type

| Rule | Worker Behavior |
| --- | --- |
| allow | Action runs automatically, no interruption |
| deny | Action silently blocked; worker receives [BLOCKED BY POLICY] error |
| ask | Worker pauses, spinner suspends, and a security prompt is shown to the user |

Prompt Serialization

When multiple workers need approval simultaneously, prompts are serialized via mutex, so only one prompt is shown at a time. After the user's response, the next worker in the queue receives its prompt. This avoids:
  • Visual prompt overlap in the terminal
  • Stdin read conflict
  • Spinner rendering over the security prompt
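The serialization idea can be illustrated with a mutex-guarded gate. PromptGate and Ask are hypothetical names, and the ask callback stands in for the real terminal prompt; this is a sketch of the technique, not ChatCLI's prompt code.

```go
package main

import (
	"fmt"
	"sync"
)

// PromptGate serializes approval prompts so only one is on screen at a time.
type PromptGate struct {
	mu sync.Mutex
}

// Ask blocks until no other prompt is active, then runs ask and returns
// the user's decision. The callback is where stdin would be read.
func (g *PromptGate) Ask(agent, action string, ask func(agent, action string) bool) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	return ask(agent, action)
}

func main() {
	var gate PromptGate
	var wg sync.WaitGroup
	active, maxActive := 0, 0 // only touched while the gate is held

	for _, agent := range []string{"coder", "shell", "tester"} {
		wg.Add(1)
		go func(agent string) {
			defer wg.Done()
			gate.Ask(agent, "write", func(a, act string) bool {
				active++
				if active > maxActive {
					maxActive = active
				}
				active--
				return true // simulated "yes, execute (once)"
			})
		}(agent)
	}
	wg.Wait()
	fmt.Println("max concurrent prompts:", maxActive) // prints 1
}
```

Note that sync.Mutex does not guarantee strict first-come ordering among waiters; an implementation that needs a true FIFO queue of prompts would use a channel of requests instead.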

Prompt with Agent Context

The security prompt in parallel mode shows contextual information about which agent is requesting the action:
╔════════════════════════════════════════════════════════╗
║                  SECURITY CHECK                        ║
╚════════════════════════════════════════════════════════╝
 Agent:  coder
 Task:   Refactor auth module
 ────────────────────────────────────────────────────────
 Action: Write file
         file: pkg/auth/handler.go
 Rule:   no rule for '@coder write'
 ────────────────────────────────────────────────────────
 Choose:
   [y] Yes, execute (once)
   [a] Always allow (@coder write)
   [n] No, skip
   [d] Always deny (@coder write)
This allows the user to make informed decisions about each action, knowing exactly which agent is asking and why.

Respect for the User’s Provider/Model

By default, parallel workers use the provider and model that are active at dispatch time. If the user switches providers via /switch, subsequent agent dispatches use the new provider. Exception: agents (built-in or custom) that declare model: and/or effort: can use a different client for that specific turn, resolved by the Model Router. cli.Client still points to the user's choice; the swap is worker-scoped.

Execution Flow (Example)

  1. User sends query: "refactor the coder module, split read and write"
  2. Orchestrator LLM dispatches parallel agents:
     <agent_call agent="file" task="Read all .go files in pkg/coder/engine/" />
     <agent_call agent="search" task="Find references to handleRead and handleWrite" />
  3. Dispatcher creates goroutines with resolved clients: FileAgent runs with effort=low (user's model), and SearchAgent does the same. Both run in parallel, each with its own LLM client and isolated mini ReAct loop (within the maxWorkers limit).
  4. Results aggregated: Feedback is sent to the orchestrator.
  5. Orchestrator dispatches PlannerAgent to decompose the refactor. Planner runs with effort=high (extended thinking); even if the user is on Sonnet, the Planner thinks more deeply in this phase.
  6. Orchestrator dispatches CoderAgent for the refactor (with effort=medium and FileLock on files being written).
  7. Orchestrator dispatches ShellAgent for tests: it runs the tests after writing (effort=low, mechanical).
  8. Error recovery (if needed): If tests fail, the orchestrator uses tool_call for diagnosis and a quick fix.
  9. Final validation: The orchestrator validates the final result and reports back to the user.

Parallelism Maximization

ChatCLI’s prompt system explicitly instructs the AI to maximize parallelism at every level:
  1. tool_call: Independent operations (read 3 files, search + read) should be emitted in a SINGLE response, not across turns.
  2. agent_call: For 3+ independent tasks, prefer agent_call running in parallel goroutines.
  3. Per-turn anchor: Every ReAct loop turn includes a reminder reinforcing the need for parallelism.
Correct example (3 reads in ONE response):
<tool_call name="@coder" args='{"cmd":"read","args":{"file":"main.go"}}' />
<tool_call name="@coder" args='{"cmd":"read","args":{"file":"config.go"}}' />
<tool_call name="@coder" args='{"cmd":"read","args":{"file":"handler.go"}}' />
Incorrect example (3 turns for independent ops):
Turn 1: read main.go → wait
Turn 2: read config.go → wait
Turn 3: read handler.go → wait

Compatibility

  • CHATCLI_AGENT_PARALLEL_MODE=false: everything works exactly as before
  • <tool_call> tags keep working even with parallel mode enabled
  • No existing function signatures were changed (only additions)
  • The cli/agent/workers/ package is fully isolated and does not impact existing functionality
  • Old agents without model:/effort: keep working without any changes
  • Older gRPC servers that don't carry the new AgentInfo fields return zero values, which the client treats as "inherit"
  • Operator and CRDs do NOT need changes: agents are loaded by persona.Loader inside the pod, via ConfigMap mounts

When to use Multi-Agent vs Subagent Delegation

Multi-Agent (<agent_call>) and Subagent Delegation (delegate_subagent) are not alternatives: they solve different problems and can coexist within the same turn.

| Aspect | <agent_call> (Multi-Agent) | delegate_subagent |
| --- | --- | --- |
| Parallelism | Yes: multiple tags in a single response run in parallel | Sequential (one at a time) |
| Agent selection | Dispatches to a catalogued agent (FileAgent, CoderAgent, ReviewerAgent, custom…) | Generic ReAct loop, no specific persona |
| Per-model routing | Yes: each agent can have its own model: and effort: | No: inherits the parent's LLM client |
| Context window | Isolated per worker | Isolated per subagent |
| Best for | Breaking large tasks into several specialised sub-projects running in parallel | A focused analysis that needs to consume lots of raw-data tokens without returning all of them to the parent |
Use <agent_call> when you have multiple types of work that are independent (read files + run tests + review diff). Use delegate_subagent when you have one concentrated analysis over a large payload (summarise /metrics, find a needle in the log).

Next steps

Customizable Agents

Create your own personas with per-agent model:/effort:.

Subagent Delegation

Focused delegation for concentrated analysis over a large payload.

Agent Progress UI

Live display of each worker during parallel execution.

Agentic plugins

Full catalogue of plugins available to the agents.