ChatCLI ships seven LLM-agent patterns that work together as a harness/quality pipeline wrapping the existing ReAct dispatcher. Each pattern has a distinct purpose, can be toggled per session or via /config, and composes with the others without regressing steady-state performance.
Design premise: opt-in by default. With CHATCLI_QUALITY_* unset, the pipeline runs with zero post-hooks, so Pipeline.Run degenerates to a direct agent.Execute call. You only pay for what you enable.

The seven patterns

#1: ReAct

Reason → Act → Observe. The base loop every worker runs. Already present; now emits structured events and auto-attaches effort hints.

#2: Plan-and-Solve / ReWOO

PlannerAgent emits structured JSON; PlanRunner executes steps in topological order with #E1.head=200 placeholders.
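
As a rough illustration of pattern #2, the sketch below orders steps topologically and substitutes earlier outputs for #E<n> placeholders. The Step type, runPlan, and the exec callback are hypothetical names, not ChatCLI's actual API, and the .head=200 suffix is deliberately not handled here.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Step is a hypothetical plan step; the real PlannerAgent schema may differ.
type Step struct {
	ID   string   // e.g. "E1"
	Deps []string // IDs that must finish first
	Task string   // may contain #E<n> placeholders
}

var placeholder = regexp.MustCompile(`#(E\d+)`)

// topoOrder returns step IDs in dependency order (Kahn's algorithm).
func topoOrder(steps []Step) ([]string, error) {
	indeg := map[string]int{}
	next := map[string][]string{}
	for _, s := range steps {
		for _, d := range s.Deps {
			indeg[s.ID]++
			next[d] = append(next[d], s.ID)
		}
	}
	var queue, order []string
	for _, s := range steps { // iterate the slice to keep the order deterministic
		if indeg[s.ID] == 0 {
			queue = append(queue, s.ID)
		}
	}
	for len(queue) > 0 {
		id := queue[0]
		queue = queue[1:]
		order = append(order, id)
		for _, n := range next[id] {
			indeg[n]--
			if indeg[n] == 0 {
				queue = append(queue, n)
			}
		}
	}
	if len(order) != len(steps) {
		return nil, fmt.Errorf("plan has a cycle or an unknown dependency")
	}
	return order, nil
}

// runPlan executes steps in order, replacing #E<n> with earlier outputs.
func runPlan(steps []Step, exec func(task string) string) (map[string]string, error) {
	order, err := topoOrder(steps)
	if err != nil {
		return nil, err
	}
	byID := map[string]Step{}
	for _, s := range steps {
		byID[s.ID] = s
	}
	out := map[string]string{}
	for _, id := range order {
		task := placeholder.ReplaceAllStringFunc(byID[id].Task, func(m string) string {
			if v, ok := out[m[1:]]; ok {
				return v
			}
			return m // leave unresolved placeholders untouched
		})
		out[id] = exec(task)
	}
	return out, nil
}

func main() {
	steps := []Step{
		{ID: "E2", Deps: []string{"E1"}, Task: "summarize: #E1"},
		{ID: "E1", Task: "list files"},
	}
	out, err := runPlan(steps, strings.ToUpper) // stand-in for a real worker call
	fmt.Println(out["E2"], err)
}
```

Because the order comes from the dependency graph rather than the slice order, E1 runs first even though it is listed second.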

#3: Reflexion

Detects errors, hallucinations, or low-quality output; distills a Lesson via an LLM call and persists it in memory.Fact for future RAG retrieval.

#4: RAG + HyDE

Hypothesis-based keyword expansion (3a) + cosine vector search (3b: Voyage/OpenAI, pure-Go backend).

#5: Self-Refine

RefinerAgent critiques the draft and rewrites. Multi-pass with convergence via EpsilonChars.
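
A minimal sketch of a multi-pass loop with early convergence follows. It assumes that EpsilonChars means "stop once a pass changes the draft by at most this many characters"; that reading, and the refine and rewrite names, are assumptions rather than the documented RefinerAgent behavior.

```go
package main

import "fmt"

// refine runs up to maxPasses rewrite passes and stops early once a pass
// changes the draft length by no more than epsilonChars.
// ASSUMPTION: convergence is modeled as a character-count delta; the real
// RefinerAgent may measure change differently.
func refine(draft string, maxPasses, epsilonChars int, rewrite func(string) string) (string, int) {
	passes := 0
	for passes < maxPasses {
		next := rewrite(draft)
		passes++
		delta := len([]rune(next)) - len([]rune(draft))
		if delta < 0 {
			delta = -delta
		}
		draft = next
		if delta <= epsilonChars {
			break // converged: the rewrite barely changed the draft
		}
	}
	return draft, passes
}

func main() {
	// Hypothetical rewriter that trims one trailing '!' per pass.
	trim := func(s string) string {
		if n := len(s); n > 0 && s[n-1] == '!' {
			return s[:n-1]
		}
		return s
	}
	out, passes := refine("done!!!", 5, 0, trim)
	fmt.Println(out, passes) // done 4
}
```

The fourth pass changes nothing, so the loop exits before reaching the five-pass cap.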

#6: Chain-of-Verification

VerifierAgent generates independent verification questions, answers each, and rewrites on discrepancy.

#7: Reasoning Backbone

Cross-provider abstraction: thinking_budget on Anthropic, reasoning_effort on OpenAI. Auto-attach for critical agents.

Configuration

CHATCLI_QUALITY_* env vars, /config quality, and the five slash commands: /thinking, /plan, /refine, /verify, /reflect.

How the patterns connect

                 ┌──────────────────────────────────────┐
                 │   /agent or /coder <task>            │
                 └──────────────────┬───────────────────┘
                                    │
  (#4 RAG+HyDE) ────────────────────▼───────────────
  memory.Retriever expands hints with an LLM-generated
  hypothesis (HyDE-3a) and optionally searches the
  vector store (HyDE-3b) before assembling the system
  prompt.
                                    │
  (#2 Plan-and-Solve) ──────────────▼───────────────
  When triggered (auto-score or /plan), the planner
  emits a JSON plan, PlanRunner executes each step
  resolving #E1 placeholders, and a deterministic
  report is injected into history.
                                    │
                    ┌───────────────▼───────────────┐
                    │   ReAct loop (workers)        │
                    │   (#1, always on)             │
                    └───────────────┬───────────────┘
                                    │
                    ┌───────────────▼───────────────┐
                    │   QualityPipeline (per call)  │
                    │   - Pre:  applyAutoReasoning  │ (#7)
                    │   - Execute worker            │
                    │   - Post: RefineHook          │ (#5)
                    │   - Post: VerifyHook          │ (#6)
                    │   - Post: ReflexionHook       │ (#3)
                    └───────────────┬───────────────┘
                                    │
  Lessons from Reflexion are persisted to memory.Fact
  and surface again via #4 on similar future tasks,
  closing the loop without retraining.
Architectural principle: every new pattern hooks into the dispatcher or the context builder; the inner ReAct loop (worker_react.go) does not change. Patterns don't replace each other, they compose.
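
To make the composition concrete, here is a minimal sketch of a pre → execute → post pipeline in which an empty hook list reduces Run to a plain Execute call. All type names and signatures (Result, Agent, PreHook, PostHook) are hypothetical stand-ins, not ChatCLI's real definitions.

```go
package main

import (
	"context"
	"fmt"
)

// Result and Agent are hypothetical stand-ins for ChatCLI's real types.
type Result struct{ Output string }

type Agent interface {
	Execute(ctx context.Context, task string) (Result, error)
}

type PreHook func(ctx context.Context, task *string) error
type PostHook func(ctx context.Context, task string, res *Result) error

type Pipeline struct {
	pre  []PreHook
	post []PostHook
}

// Run applies pre-hooks, executes the worker, then applies post-hooks.
// With no hooks registered it is just a call to agent.Execute.
func (p *Pipeline) Run(ctx context.Context, agent Agent, task string) (Result, error) {
	for _, h := range p.pre {
		if err := h(ctx, &task); err != nil {
			return Result{}, err
		}
	}
	res, err := agent.Execute(ctx, task)
	if err != nil {
		return res, err
	}
	for _, h := range p.post {
		// In this sketch a failing post-hook is ignored and the draft is kept.
		_ = h(ctx, task, &res)
	}
	return res, nil
}

type echoAgent struct{}

func (echoAgent) Execute(_ context.Context, task string) (Result, error) {
	return Result{Output: "draft: " + task}, nil
}

// demo runs the same task through an empty pipeline and one with a post-hook.
func demo() (string, string) {
	ctx := context.Background()
	bare := &Pipeline{}
	a, _ := bare.Run(ctx, echoAgent{}, "fix bug")

	refined := &Pipeline{post: []PostHook{
		func(_ context.Context, _ string, res *Result) error {
			res.Output += " (refined)"
			return nil
		},
	}}
	b, _ := refined.Run(ctx, echoAgent{}, "fix bug")
	return a.Output, b.Output
}

func main() {
	a, b := demo()
	fmt.Println(a) // draft: fix bug
	fmt.Println(b) // draft: fix bug (refined)
}
```

The worker itself never learns whether hooks exist, which is the point of hooking in at the dispatcher rather than inside the ReAct loop.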

Pipeline Architecture (engine)

The QualityPipeline itself is a thread-safe state machine. Hooks are pluggable, while the engine handles concurrency, hook failures, and shutdown as follows:
1. State machine (Active → Draining → Closed)

Transitions use atomic CAS. DrainAndClose(timeout) waits for in-flight runs to finish before closing, which makes it safe for graceful SIGTERM handling.

2. Copy-on-Write snapshots

Each AddPre/AddPost/SwapConfig builds a new snapshot and CAS-swaps it in via atomic.Pointer. In-flight runs always see a consistent view, with zero locks on the hot path.

3. Per-hook isolation

Every hook runs inside a wrapper that recovers panics, enforces a timeout (default 30s), and records failures in a per-hook circuit breaker (default: 5 failures → open for 30s).

4. Priority-based ordering

Hooks implementing Prioritized are ordered by priority (lower first). Ties preserve registration order. For backward compatibility, hooks without Priority() default to 100.

5. Short-circuit sentinels

A PreHook can return ErrSkipExecution (cache hit → skip agent.Execute) or ErrSkipRemainingHooks (stop the phase). The pipeline synthesizes a result so PostHooks still run.

6. Hot reload

SwapConfig(cfg) atomically swaps the config. In-flight runs keep the old config (each turn runs under exactly one config); new runs pick up the new one.
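
The state machine and copy-on-write ideas can be sketched together as below. This is an illustration under assumptions: the Pipeline layout, the polling drain loop, and the Hook signature are invented for the example, and only sync/atomic calls from the Go standard library (Go 1.19+) are used.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

type Hook func(string) string

// snapshot is immutable once published.
type snapshot struct {
	generation uint64
	post       []Hook
}

const (
	stateActive int32 = iota
	stateDraining
	stateClosed
)

type Pipeline struct {
	state    atomic.Int32
	snap     atomic.Pointer[snapshot]
	inflight atomic.Int64
}

func New() *Pipeline {
	p := &Pipeline{}
	p.snap.Store(&snapshot{})
	return p
}

// AddPost copies the current hook slice, appends, and CAS-publishes the copy.
func (p *Pipeline) AddPost(h Hook) {
	for {
		old := p.snap.Load()
		next := &snapshot{
			generation: old.generation + 1,
			post:       append(append([]Hook(nil), old.post...), h),
		}
		if p.snap.CompareAndSwap(old, next) {
			return
		}
	}
}

// Run loads one snapshot up front, so a concurrent AddPost cannot change
// the hook set halfway through a turn. No lock is taken.
func (p *Pipeline) Run(in string) (string, bool) {
	p.inflight.Add(1)
	defer p.inflight.Add(-1)
	if p.state.Load() != stateActive {
		return "", false // draining or closed
	}
	for _, h := range p.snap.Load().post {
		in = h(in)
	}
	return in, true
}

// DrainAndClose stops admitting runs, waits for in-flight ones up to
// timeout, then closes. It reports whether the drain finished in time.
func (p *Pipeline) DrainAndClose(timeout time.Duration) bool {
	if !p.state.CompareAndSwap(stateActive, stateDraining) {
		return false // already draining or closed
	}
	deadline := time.Now().Add(timeout)
	for p.inflight.Load() > 0 && time.Now().Before(deadline) {
		time.Sleep(time.Millisecond)
	}
	drained := p.inflight.Load() == 0
	p.state.Store(stateClosed)
	return drained
}

func main() {
	p := New()
	p.AddPost(func(s string) string { return s + "!" })
	out, ok := p.Run("hi")
	fmt.Println(out, ok, p.snap.Load().generation) // hi! true 1
	fmt.Println(p.DrainAndClose(time.Second))      // true
	_, ok = p.Run("late")
	fmt.Println(ok) // false
}
```

Run increments the in-flight counter before checking the state, so any run that observed Active is guaranteed to be visible to the drain loop.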

Pipeline metrics

Five metrics are exported under the chatcli_quality_pipeline_* prefix:
Metric                 Type       Labels        Notes
dispatch_total         Counter    outcome       ok, exec_error, pre_short_circuit, bypass_disabled, draining, closed
hook_duration_seconds  Histogram  hook, phase   phase = pre|post
hook_errors_total      Counter    hook, reason  returned_error, timeout, panic, circuit_open
hook_circuit_state     Gauge      hook          0=closed, 1=open, 2=half_open
generation             Gauge      -             Snapshot version; bumps on every registration/swap
Use generation to correlate dashboards with config changes:
# Pipeline change timeline
changes(chatcli_quality_pipeline_generation[1h])

# Worst-offender hooks
sort_desc(sum by (hook, reason) (rate(chatcli_quality_pipeline_hook_errors_total[5m])))
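
The reason values on hook_errors_total line up with the failure modes of the per-hook wrapper. The sketch below shows one way such a wrapper could classify an outcome; runIsolated and the Hook signature are hypothetical, and the circuit breaker (reason circuit_open) is left out for brevity.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

type Hook func(ctx context.Context) error

// runIsolated runs one hook with a timeout and panic recovery, and returns
// the reason label for hook_errors_total ("" on success).
func runIsolated(h Hook, timeout time.Duration) string {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	done := make(chan string, 1) // buffered so a late hook never blocks
	go func() {
		defer func() {
			if r := recover(); r != nil {
				done <- "panic"
			}
		}()
		if err := h(ctx); err != nil {
			done <- "returned_error"
			return
		}
		done <- ""
	}()

	select {
	case reason := <-done:
		return reason
	case <-ctx.Done():
		return "timeout"
	}
}

// demoReasons exercises each failure mode once.
func demoReasons() []string {
	hooks := []Hook{
		func(context.Context) error { return nil },
		func(context.Context) error { return errors.New("bad draft") },
		func(context.Context) error { panic("boom") },
		func(context.Context) error { time.Sleep(200 * time.Millisecond); return nil },
	}
	var reasons []string
	for _, h := range hooks {
		reasons = append(reasons, runIsolated(h, 20*time.Millisecond))
	}
	return reasons
}

func main() {
	fmt.Printf("%q\n", demoReasons())
}
```

A hook that ignores its context keeps running in the background after a timeout; the wrapper only stops waiting for it.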

Trigger matrix

Pattern        Slash                                   Env var                            Default    Auto trigger
#1 ReAct       -                                       -                                  always on  always
#2 Plan-First  /plan [agent|coder|preview|dry] [task]  CHATCLI_QUALITY_PLAN_FIRST_MODE    auto       complexity ≥ 6
#3 Reflexion   /reflect <lesson>                       CHATCLI_QUALITY_REFLEXION_ENABLED  on         error, CoVe flagged, refine low
#4 HyDE        - (transparent)                         CHATCLI_QUALITY_HYDE_ENABLED       off        every retrieval
#5 Refine      /refine on|off                          CHATCLI_QUALITY_REFINE_ENABLED     off        post-worker
#6 CoVe        /verify on|off                          CHATCLI_QUALITY_VERIFY_ENABLED     off        post-worker
#7 Reasoning   /thinking on|off                        CHATCLI_QUALITY_REASONING_MODE     auto       for AutoAgents

Override priority

For a given turn, the effort hint resolves in this order (later wins):
1. Skill frontmatter

effort: high in the activated skill's frontmatter.

2. Agent default

E.g. PlannerAgent has an embedded effort="high".

3. CHATCLI_QUALITY_REASONING_*

Auto-enable for agents in AutoAgents.

4. /thinking session override

Wins over everything above for the next turn.
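
The "later wins" rule amounts to a fold over the four layers. A minimal sketch, assuming each layer contributes either an effort string or nothing (the resolveEffort name and the "low"/"medium" values are illustrative; only "high" appears in this page):

```go
package main

import "fmt"

// resolveEffort walks the layers in the documented order; the last
// non-empty value wins. An empty string means the layer has no opinion.
func resolveEffort(skill, agentDefault, envAuto, session string) string {
	effort := ""
	for _, layer := range []string{skill, agentDefault, envAuto, session} {
		if layer != "" {
			effort = layer
		}
	}
	return effort
}

func main() {
	fmt.Println(resolveEffort("low", "high", "", ""))       // high
	fmt.Println(resolveEffort("low", "high", "", "medium")) // medium
}
```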
For Refine / Verify / Reflexion hook enablement:
1. /config quality (env)

CHATCLI_QUALITY_REFINE_ENABLED, etc.

2. /refine and /verify session toggles

*bool override living on cli.qualityOverrides; overrides env for the session.
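
A pointer-to-bool gives three states (unset, forced on, forced off), which is what lets a session toggle override the env value only when it has actually been used. A small sketch, with hypothetical field names on a struct modeled after cli.qualityOverrides:

```go
package main

import "fmt"

// qualityOverrides mirrors the idea of cli.qualityOverrides: nil means
// "no session override", non-nil wins over the env-derived value.
// Field names here are hypothetical.
type qualityOverrides struct {
	Refine *bool
	Verify *bool
}

// effective returns the session override if set, else the env value.
func effective(envEnabled bool, override *bool) bool {
	if override != nil {
		return *override
	}
	return envEnabled
}

func main() {
	off := false
	o := qualityOverrides{Refine: &off} // as if the user ran /refine off
	fmt.Println(effective(true, o.Refine)) // false
	fmt.Println(effective(true, o.Verify)) // true
}
```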
For Plan-First:
1. /plan one-shot flag

cli.pendingPlanFirst = true consumed on next dispatch.

2. CHATCLI_QUALITY_PLAN_FIRST_MODE + complexity

always ignores score; auto fires when ComplexityScore(task) >= threshold.
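
Put together, the Plan-First decision could look like the sketch below. The planFirst struct and shouldPlan method are hypothetical; treating any mode other than always or auto as disabled is an assumption, and the threshold of 6 is taken from the trigger matrix.

```go
package main

import "fmt"

type planFirst struct {
	pending   bool   // set by /plan, consumed on the next dispatch
	mode      string // "always" or "auto"; anything else disables (assumption)
	threshold int
}

// shouldPlan reports whether the next dispatch should plan first.
// score stands in for ComplexityScore(task).
func (p *planFirst) shouldPlan(score int) bool {
	if p.pending {
		p.pending = false // one-shot: consumed here
		return true
	}
	switch p.mode {
	case "always":
		return true
	case "auto":
		return score >= p.threshold
	default:
		return false
	}
}

func main() {
	p := &planFirst{mode: "auto", threshold: 6, pending: true}
	fmt.Println(p.shouldPlan(2)) // true  (pending flag)
	fmt.Println(p.shouldPlan(2)) // false (flag consumed, score below threshold)
	fmt.Println(p.shouldPlan(7)) // true  (auto trigger)
}
```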

Cost and latency

Defaults are calibrated so that steady-state behavior matches pre-pipeline ChatCLI. Expensive patterns (Refine, Verify, HyDE) start off; you opt in when the context justifies the cost.
Pattern            Extra LLM calls per turn                   Notes
ReAct              0 (already part of the loop)               -
Plan-First (auto)  +1 (planner) when triggered                Steps reuse the dispatcher
Reflexion          +1 (lesson gen), background                Never blocks the turn
HyDE 3a            +1 (hypothesis), cheap                     200-token budget
HyDE 3b            +1 (query embed) + lazy backfill           Embedding API ~$0.00002/1k tokens
Self-Refine        +N (one per pass, default 1)               Convergence cuts it short
CoVe               +1 (verifier) per call site                Internally N=3 questions
Reasoning auto     0 extra calls; +tokens on hosted thinking  Anthropic budget = 8k default

Observability

Every active pattern shows up in /config quality:
✨ AGENT HARNESS/QUALITY PIPELINE ────────────────────────
  CHATCLI_QUALITY_ENABLED         : enabled
  Hooks registered                : pre=0, post=3

  ── Self-Refine (#5)
  CHATCLI_QUALITY_REFINE_ENABLED  : enabled
  CHATCLI_QUALITY_REFINE_MAX_PASSES: 1
  ...

  ── RAG + HyDE (#4)
  CHATCLI_QUALITY_HYDE_ENABLED    : enabled
  CHATCLI_QUALITY_HYDE_USE_VECTORS: enabled
  CHATCLI_EMBED_PROVIDER          : bedrock
  CHATCLI_EMBED_MODEL             : amazon.titan-embed-text-v2:0
  Vector provider                : bedrock:amazon.titan-embed-text-v2:0
  Vector entries                 : 127
CHATCLI_EMBED_PROVIDER accepts voyage, openai, bedrock, or null. For Bedrock, ChatCLI reuses the same AWS credential chain as the chat (IAM, profile, SSO), so no extra API key is needed. See RAG + HyDE for per-provider details.

Next steps

Tutorial: Plan-and-Solve

Start with the pattern with highest leverage on multi-step tasks.

Configure HyDE with vectors

Enable embeddings (Voyage, OpenAI, or Bedrock Titan/Cohere) for semantic retrieval.

Slash reference

/thinking, /plan, /refine, /verify, /reflect.

Full env var list

All CHATCLI_QUALITY_* and CHATCLI_EMBED_*.