Mixture-of-Agents (MoA)

Mixture-of-Agents (MoA) runs the same prompt across several models in parallel (the proposers) and then uses an aggregator to synthesize a single high-quality answer from all proposals. The technique is based on Wang et al., arXiv:2406.04692.

MoA is distinct from Multi-Agent Orchestration. There, the axis is specialist agents (planner, reviewer…) dispatched in parallel within a task; here, it is model and provider diversity applied to the same question. MoA is triggered by the /moa command.

How it works

/moa <prompt>
     │
     ├─ proposer 1 (openai:gpt-5)        ┐
     ├─ proposer 2 (claudeai:opus-4-8)   ├─ in parallel, same prompt
     ├─ proposer 3 (googleai:gemini-2.5) ┘
     │
     ▼
  aggregator (active provider/model, or CHATCLI_MOA_AGGREGATOR)
     │
     ▼
  single synthesized answer

Shared briefing: the structured system prompt that a regular chat turn would receive (contexts attached via /context attach, workspace memory, skills, per-turn retrieval) is assembled once per round and handed to every participant.
Parallel proposal: each proposer receives the same prompt and the same briefing, concurrently, and may pull knowledge, CCR and memory (read-only) before answering.
Fault tolerance: a failing proposer is tolerated, and MoA proceeds as long as at least one proposer answered. Beyond that, the round fails only if the aggregator itself fails.
Aggregation: the aggregator gets a synthesis prompt with all proposals, plus the same briefing and the same tools, and produces one correct, cohesive answer (without mentioning that aggregation happened).

A 5-minute timeout wraps the whole round. At the end, ChatCLI shows which proposers contributed (✓/✗) and renders the final answer.
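The round described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ChatCLI's implementation: `run_moa`, `build_synthesis_prompt`, and the `ModelFn` callable type are hypothetical names, and the synthesis prompt layout is invented for the example. It shows the three properties the text names: parallel fan-out, tolerance of individual proposer errors, and a single time budget for the whole round.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, Tuple

ModelFn = Callable[[str], str]  # hypothetical: prompt in, answer text out


def build_synthesis_prompt(prompt: str, proposals: Dict[str, str]) -> str:
    """Assumed layout; the real synthesis prompt is not shown on this page."""
    parts = ["Synthesize one cohesive answer from the candidates.",
             "Question: " + prompt]
    for i, text in enumerate(proposals.values(), start=1):
        parts.append(f"[candidate {i}]\n{text}")
    return "\n\n".join(parts)


def run_moa(prompt: str, proposers: Dict[str, ModelFn], aggregator: ModelFn,
            timeout_s: float = 300.0) -> Tuple[str, Dict[str, bool]]:
    """Fan out to all proposers, keep whatever succeeded, then aggregate."""
    deadline = time.monotonic() + timeout_s  # one budget for the whole round
    # One extra worker so the aggregator can run even if a proposer hangs.
    pool = ThreadPoolExecutor(max_workers=len(proposers) + 1)
    try:
        futures = {pool.submit(fn, prompt): name for name, fn in proposers.items()}
        done, _ = wait(futures, timeout=timeout_s)
        status: Dict[str, bool] = {}
        proposals: Dict[str, str] = {}
        for fut, name in futures.items():
            ok = fut in done and fut.exception() is None
            status[name] = ok
            if ok:
                proposals[name] = fut.result()
        if not proposals:
            raise RuntimeError("no proposer answered")
        remaining = max(0.0, deadline - time.monotonic())
        synthesis = build_synthesis_prompt(prompt, proposals)
        # An aggregator error or timeout propagates: the round fails.
        answer = pool.submit(aggregator, synthesis).result(timeout=remaining)
        return answer, status
    finally:
        pool.shutdown(wait=False, cancel_futures=True)
```

The returned `status` mapping is what a caller would use to print the ✓/✗ line for each proposer before rendering the answer.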

Panel briefing and tools

Every panel participant, proposer or aggregator, is as capable as a regular conversation turn. Each one gets the session’s full briefing plus three strictly read-only tool exceptions, executed in a bounded loop (up to 4 tool rounds per participant). Tool calls use native tool-use where the provider supports it, and XML for providers without native support:

| Tool | What it does | When it is offered |
| --- | --- | --- |
| `knowledge` | Queries the knowledge bases attached to the session (search/get/toc) | Same gate as the chat exception: enabled + a base attached |
| `recall` | Expands `<<ccr:KEY>>` markers from context compression back into the original | Only when the compression layer is wired and the history actually carries a marker |
| `memory` | Recalls the user’s long-term memory (profile, durable facts, preferences, notes) | `CHATCLI_MEMORY_MODE` other than `off` |

When any tool is active, the round’s status line reports it: Expert tools enabled for this panel: knowledge, recall, memory.
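The gating rules in the table can be read as three independent predicates. The sketch below is illustrative only: `PanelState` and its fields are hypothetical stand-ins for whatever session state ChatCLI actually consults, and the case-insensitive comparison against `off` is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PanelState:
    """Hypothetical snapshot of the session state that gates each tool."""
    knowledge_enabled: bool
    knowledge_bases_attached: int
    compression_wired: bool
    history: str
    memory_mode: str  # value of CHATCLI_MEMORY_MODE


def offered_tools(state: PanelState) -> List[str]:
    """Return the read-only tools this panel would be offered."""
    tools: List[str] = []
    if state.knowledge_enabled and state.knowledge_bases_attached > 0:
        tools.append("knowledge")
    # recall is only useful when a <<ccr:KEY>> marker is actually present.
    if state.compression_wired and "<<ccr:" in state.history:
        tools.append("recall")
    # Assumption: the mode is compared case-insensitively.
    if state.memory_mode.strip().lower() != "off":
        tools.append("memory")
    return tools


def status_line(tools: List[str]) -> str:
    """Build the status text, or an empty string when no tool is active."""
    if not tools:
        return ""
    return "Expert tools enabled for this panel: " + ", ".join(tools)
```

With no knowledge base attached and no marker in the history, only `memory` would be listed, provided the memory mode is not `off`.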

Read-only by construction. Memory access is pinned to the recall subcommand by the executor itself — the mutating forms (remember, profile, forget) are unreachable from a panel turn, even if a model smuggles one into the call arguments. ask_user and graphview are excluded on purpose: participants run in parallel and unattended. No exec, file or search tools, ever — the same rule chat mode enforces.
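One way to picture "read-only by construction" is an executor that rewrites the call before running it, inside a loop that caps tool rounds. The following is a sketch under assumptions, not the actual executor: `pin_call` and `run_participant` are hypothetical names, the `cmd` argument key is assumed, and asking for a final answer once the budget is spent is an assumed policy.

```python
from typing import Callable, Dict, List, Optional, Tuple

MAX_TOOL_ROUNDS = 4  # per participant, as stated above
READ_ONLY_TOOLS = {"knowledge", "recall", "memory"}


def pin_call(name: str, args: Dict) -> Optional[Tuple[str, Dict]]:
    """Return a safe (name, args) pair, or None if the tool is not offered."""
    if name not in READ_ONLY_TOOLS:
        return None  # exec, file, search, ask_user, graphview, ...
    if name == "memory":
        # Overwrite whatever subcommand the model sent ("cmd" is an assumed key),
        # so remember/profile/forget cannot be reached from a panel turn.
        args = {**args, "cmd": "recall"}
    return name, args


def run_participant(step: Callable[[List[str], bool], Dict],
                    execute: Callable[[str, Dict], str],
                    max_rounds: int = MAX_TOOL_ROUNDS) -> str:
    """Drive one participant: at most max_rounds tool calls, then an answer."""
    transcript: List[str] = []
    for _ in range(max_rounds):
        reply = step(transcript, True)  # True: tools may be requested
        if "answer" in reply:
            return reply["answer"]
        pinned = pin_call(reply["tool"], reply.get("args", {}))
        result = execute(*pinned) if pinned else "tool not available"
        transcript.append(result)
    # Budget spent: request a final answer with tools disabled (assumed policy).
    return step(transcript, False).get("answer", "")
```

Because the subcommand is overwritten rather than validated, a smuggled `forget` is never executed; it is simply run as `recall`.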

Configuration

| Variable | Description |
| --- | --- |
| `CHATCLI_MOA_MODELS` | Optional. Proposers as CSV `provider:model`. Without it, `/moa` and `@moa` use the configured providers (sorted, capped at 4). |
| `CHATCLI_MOA_AGGREGATOR` | Optional. Aggregator model (`provider:model`). Default: the session’s active provider/model. |

export CHATCLI_MOA_MODELS="openai:gpt-5,claudeai:claude-opus-4-8,googleai:gemini-2.5-pro"
export CHATCLI_MOA_AGGREGATOR="claudeai:claude-opus-4-8"   # optional

Auth: works with OAuth, not just API keys — when a proposer/aggregator matches the session’s active provider, MoA reuses the session client, honoring the OAuth token (preferred over an API key when logged in) or forwarded tokens (server/gateway mode). Provider names are resolved case-insensitively (openai → OPENAI). The /moa command and the @moa tool share the same resolution and defaults.
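The resolution rules for `CHATCLI_MOA_MODELS` can be sketched as a small parser. This is an illustration, not ChatCLI's code: `parse_member` and `resolve_proposers` are hypothetical names, sorting the defaults by provider name is an assumption (the page only says "sorted"), and a `None` model stands for "use that provider's default".

```python
from typing import Iterable, List, Optional, Tuple

Member = Tuple[str, Optional[str]]  # (PROVIDER, model or None)


def parse_member(spec: str) -> Member:
    """Split 'provider:model'; the provider is matched case-insensitively."""
    provider, _, model = spec.strip().partition(":")
    return provider.strip().upper(), (model.strip() or None)


def resolve_proposers(env_value: Optional[str],
                      configured: Iterable[str],
                      cap: int = 4) -> List[Member]:
    """Use the CSV when set; otherwise fall back to configured providers."""
    specs = [s for s in (env_value or "").split(",") if s.strip()]
    if specs:
        return [parse_member(s) for s in specs]
    # Assumption: defaults are sorted by provider name, then capped.
    names = sorted(p.strip().upper() for p in configured)
    return [(name, None) for name in names[:cap]]
```

For example, `resolve_proposers("openai:gpt-5,claudeai:claude-opus-4-8", [])` yields `[("OPENAI", "gpt-5"), ("CLAUDEAI", "claude-opus-4-8")]`, while an unset variable falls back to at most four of the configured providers.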

Use different providers as proposers to maximize diversity, which is where MoA’s quality gain comes from. A strong model (Opus, GPT-5) usually works best as the aggregator.

Usage

> /moa explain the trade-offs between WAL and snapshotting in a durable scheduler
  Running Mixture-of-Agents with 3 models…
    ✓ openai:gpt-5
    ✓ claudeai:claude-opus-4-8
    ✓ googleai:gemini-2.5-pro

  <single synthesized answer, rendered as Markdown>

Any configured provider can participate as a proposer or aggregator. See Supported Models.

`@moa` tool (for the agent)

Besides the /moa command, there’s the @moa tool, which the agent can invoke inside an agent/coder flow to query several models and synthesize the best answer:

<tool_call name="@moa" args='{"cmd":"ask","args":{"prompt":"Design a rate limiter for 1M rps"}}' />
<tool_call name="@moa" args='{"cmd":"list"}' />

ask {prompt, models?, aggregator?}: models is optional (e.g. ["openai","anthropic:claude-opus-4-8"]) and defaults to a set of configured providers; aggregator defaults to the session model.
list: providers available to participate.

@moa members receive the agent flow’s conversation history and the same read-only panel tools (knowledge, recall, memory) — the chat briefing is not reassembled, since the agent’s context is already in the history. It degrades gracefully: a single successful proposer is returned directly; if the aggregator is unavailable, the best candidate is returned instead of failing.
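The graceful degradation described above amounts to two short-circuits around the aggregation step. The sketch below is hypothetical: `finalize` is an invented name, and since this page does not say how the "best candidate" is chosen, the example takes a caller-supplied `score` function and defaults to answer length purely as a placeholder.

```python
from typing import Callable, Dict


def finalize(proposals: Dict[str, str],
             aggregate: Callable[[Dict[str, str]], str],
             score: Callable[[str], float] = len) -> str:
    """Pick the final @moa answer from the successful proposals."""
    if not proposals:
        raise RuntimeError("no proposer answered")
    if len(proposals) == 1:
        # A single successful proposer is returned directly, no aggregation.
        return next(iter(proposals.values()))
    try:
        return aggregate(proposals)
    except Exception:
        # Aggregator unavailable: fall back to the best candidate.
        # The scoring rule here is a placeholder assumption.
        return max(proposals.values(), key=score)
```

Keeping the fallback inside the tool means the calling agent always receives an answer when at least one proposer succeeded.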
