/config, and composes with the others without regressing steady-state performance.
Design premise: opt-in by default. With CHATCLI_QUALITY_* unset, the pipeline runs with zero post-hooks, and Pipeline.Run degenerates to a direct agent.Execute call. You only pay for what you enable.
The seven patterns
#1 ReAct
Reason → Act → Observe: the base loop every worker runs. The loop already existed; it now emits structured events and auto-attaches effort hints.
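As a rough illustration of the Reason → Act → Observe cycle, here is a minimal Go sketch. Step, Model, Tool, and runReAct are hypothetical names invented for this example, not chatcli's worker API; the model is a plain function so the loop runs without any LLM provider.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Step is one model decision: either a tool call or a final answer.
type Step struct {
	Thought string
	Tool    string // tool to call; ignored when Final is set
	Input   string
	Final   string
}

// Model and Tool are hypothetical stand-ins for an LLM client and a tool registry.
type Model func(transcript []string) Step
type Tool func(input string) (string, error)

// runReAct alternates Reason (ask the model), Act (run the chosen tool) and
// Observe (append the result to the transcript) until the model produces a
// final answer or maxTurns is used up.
func runReAct(model Model, tools map[string]Tool, task string, maxTurns int) (string, error) {
	transcript := []string{"Task: " + task}
	for i := 0; i < maxTurns; i++ {
		step := model(transcript) // Reason
		transcript = append(transcript, "Thought: "+step.Thought)
		if step.Final != "" {
			return step.Final, nil
		}
		tool, ok := tools[step.Tool]
		if !ok {
			transcript = append(transcript, "Observation: unknown tool "+step.Tool)
			continue
		}
		out, err := tool(step.Input) // Act
		if err != nil {
			out = "error: " + err.Error() // errors become observations, not aborts
		}
		transcript = append(transcript, "Observation: "+out) // Observe
	}
	return "", errors.New("react: turn budget exhausted")
}

func main() {
	tools := map[string]Tool{
		"upper": func(in string) (string, error) { return strings.ToUpper(in), nil },
	}
	// Scripted "model": call the tool once, then answer with what it observed.
	model := func(t []string) Step {
		last := t[len(t)-1]
		if strings.HasPrefix(last, "Observation: ") {
			return Step{Thought: "got it", Final: strings.TrimPrefix(last, "Observation: ")}
		}
		return Step{Thought: "need uppercase", Tool: "upper", Input: "chatcli"}
	}
	answer, err := runReAct(model, tools, "shout chatcli", 4)
	fmt.Println(answer, err)
}
```

Capping the loop at maxTurns keeps a confused model from spinning forever, and feeding tool errors back as observations gives the model a chance to recover instead of aborting the turn.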
#2 Plan-and-Solve / ReWOO
PlannerAgent emits a structured JSON plan; PlanRunner executes its steps in topological order, resolving placeholders such as #E1.head=200.
#3 Reflexion
Detects errors, hallucinations, or low-quality output; distills a Lesson via an LLM call and persists it as a memory.Fact for future RAG retrieval.
#4 RAG + HyDE
Hypothesis-based keyword expansion (3a) plus cosine vector search (3b: Voyage/OpenAI embeddings, pure-Go backend).
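The vector half (3b) boils down to ranking stored embeddings by cosine similarity to the query embedding. The sketch below is a brute-force version in pure Go; Doc and topK are hypothetical names, the vectors are hand-written stand-ins for real Voyage/OpenAI embeddings, and chatcli's actual backend may index or cache differently. HyDE's hypothesis generation (3a) is not shown.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Doc is a hypothetical stored fact with its precomputed embedding.
type Doc struct {
	ID  string
	Vec []float64
}

// cosine returns the cosine similarity of a and b, or 0 when the lengths
// differ or either vector is all zeros.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topK scores every doc against the query (linear scan) and returns the
// k best matches, highest similarity first.
func topK(query []float64, docs []Doc, k int) []Doc {
	type scored struct {
		doc   Doc
		score float64
	}
	ranked := make([]scored, 0, len(docs))
	for _, d := range docs {
		ranked = append(ranked, scored{d, cosine(query, d.Vec)})
	}
	sort.SliceStable(ranked, func(i, j int) bool { return ranked[i].score > ranked[j].score })
	if k > len(ranked) {
		k = len(ranked)
	}
	out := make([]Doc, k)
	for i := 0; i < k; i++ {
		out[i] = ranked[i].doc
	}
	return out
}

func main() {
	docs := []Doc{
		{"go-tests", []float64{0.9, 0.1, 0}},
		{"k8s-deploy", []float64{0, 0.2, 0.9}},
		{"go-bench", []float64{0.8, 0.3, 0.1}},
	}
	// In practice the query vector would come from the embedding provider.
	for _, d := range topK([]float64{1, 0.2, 0}, docs, 2) {
		fmt.Println(d.ID)
	}
}
```

A linear scan costs O(n·d) per query for n facts of dimension d; an approximate index only starts paying off once the store grows large.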
#5 Self-Refine
RefinerAgent critiques the draft and rewrites it. Multi-pass, with convergence detected via EpsilonChars.
#6 Chain-of-Verification
VerifierAgent generates independent verification questions, answers each one, and rewrites the draft when it finds a discrepancy.
#7 Reasoning Backbone
Cross-provider abstraction: thinking_budget on Anthropic, reasoning_effort on OpenAI. Auto-attached for critical agents.
Configuration
CHATCLI_QUALITY_* env vars, /config quality, and the five slash commands: /thinking, /plan, /refine, /verify, /reflect.
How the patterns connect
Pipeline Architecture (engine)
The QualityPipeline itself is a thread-safe engine with the guarantees listed below. Hooks are pluggable, while the scheduler handles concurrency, failures, and shutdown:
State machine (Active → Draining → Closed)
State transitions use atomic compare-and-swap (CAS). DrainAndClose(timeout) waits for in-flight runs to finish before closing, which makes it safe to call from a graceful SIGTERM handler.
Copy-on-Write snapshots
Each AddPre, AddPost, or SwapConfig call builds a new snapshot and CAS-swaps it in via atomic.Pointer. In-flight runs always see a consistent view, and the hot path takes no locks.
Per-hook isolation
Every hook runs inside a wrapper that recovers panics, enforces a timeout (default 30s), and records failures in a per-hook circuit breaker (default: 5 failures → open for 30s).
Priority-based ordering
Hooks implementing Prioritized are ordered by priority, lowest value first; ties preserve registration order. For backward compatibility, hooks without a Priority() method default to 100.
Short-circuit sentinels
A PreHook can return ErrSkipExecution (for example on a cache hit, to skip agent.Execute) or ErrSkipRemainingHooks (to stop the rest of the phase). The pipeline synthesizes a result so PostHooks still run.
Pipeline metrics
Five metric families are exported under chatcli_quality_pipeline_*:
| Metric | Type | Labels | Notes |
|---|---|---|---|
| dispatch_total | Counter | outcome | ok, exec_error, pre_short_circuit, bypass_disabled, draining, closed |
| hook_duration_seconds | Histogram | hook, phase | phase = pre or post |
| hook_errors_total | Counter | hook, reason | returned_error, timeout, panic, circuit_open |
| hook_circuit_state | Gauge | hook | 0=closed, 1=open, 2=half_open |
| generation | Gauge | (none) | Snapshot version; bumps on every registration/swap |
Use generation to correlate dashboards with config changes.
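Because generation is the snapshot version, it ties directly to the copy-on-write snapshots and priority ordering described above. The following sketch, assuming Go 1.19+ for atomic.Pointer, shows one way they could fit together; registry, snapshot, and this AddPre signature are hypothetical, and only Prioritized and the default priority of 100 come from this page.

```go
package main

import (
	"fmt"
	"sort"
	"sync/atomic"
)

// Hook is a hypothetical minimal hook; Prioritized mirrors the optional
// interface described under Priority-based ordering.
type Hook interface{ Name() string }

type Prioritized interface{ Priority() int }

const defaultPriority = 100 // hooks without Priority() sort as 100

func priorityOf(h Hook) int {
	if p, ok := h.(Prioritized); ok {
		return p.Priority()
	}
	return defaultPriority
}

// snapshot is never mutated after it is published.
type snapshot struct {
	generation uint64
	pre        []Hook
}

type registry struct {
	cur atomic.Pointer[snapshot]
}

func newRegistry() *registry {
	r := &registry{}
	r.cur.Store(&snapshot{})
	return r
}

// AddPre copies the current hook list, appends h, re-sorts it stably by
// priority (ties keep registration order) and CAS-swaps in a new snapshot
// with generation+1. On a lost race it retries against the newer snapshot.
func (r *registry) AddPre(h Hook) {
	for {
		old := r.cur.Load()
		pre := make([]Hook, len(old.pre), len(old.pre)+1)
		copy(pre, old.pre)
		pre = append(pre, h)
		sort.SliceStable(pre, func(i, j int) bool {
			return priorityOf(pre[i]) < priorityOf(pre[j])
		})
		next := &snapshot{generation: old.generation + 1, pre: pre}
		if r.cur.CompareAndSwap(old, next) {
			return
		}
	}
}

// Snapshot returns the current view; a run holds it for its whole duration.
func (r *registry) Snapshot() *snapshot { return r.cur.Load() }

type named string

func (n named) Name() string { return string(n) }

// urgent is a hook that opts into an explicit, lower-than-default priority.
type urgent struct{ named }

func (urgent) Priority() int { return 10 }

func main() {
	r := newRegistry()
	r.AddPre(named("audit"))
	r.AddPre(urgent{"cache"})
	r.AddPre(named("metrics"))

	s := r.Snapshot()
	for _, h := range s.pre {
		fmt.Println(h.Name())
	}
	fmt.Println("generation:", s.generation)
}
```

This prints cache, audit, metrics, then generation 3: the priority-10 hook moves to the front, and the two default-priority hooks keep their registration order. Each run loads the snapshot once and keeps that pointer, so a concurrent AddPre never changes the hook list mid-run.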
Trigger matrix
| Pattern | Slash command | Env var | Default | Auto trigger |
|---|---|---|---|---|
| #1 ReAct | n/a | n/a | always on | always |
| #2 Plan-First | /plan [agent\|coder\|preview\|dry] [task] | CHATCLI_QUALITY_PLAN_FIRST_MODE | auto | complexity ≥ 6 |
| #3 Reflexion | /reflect <lesson> | CHATCLI_QUALITY_REFLEXION_ENABLED | on | error, CoVe flagged, refine low |
| #4 HyDE | n/a (transparent) | CHATCLI_QUALITY_HYDE_ENABLED | off | every retrieval |
| #5 Refine | /refine on\|off | CHATCLI_QUALITY_REFINE_ENABLED | off | post-worker |
| #6 CoVe | /verify on\|off | CHATCLI_QUALITY_VERIFY_ENABLED | off | post-worker |
| #7 Reasoning | /thinking on\|off | CHATCLI_QUALITY_REASONING_MODE | auto | for AutoAgents |
Override priority
For a given turn, the effort hint resolves in this order (later wins):
For Refine / Verify / Reflexion hook enablement:
For Plan-First:
Cost and latency
Defaults are calibrated so that steady-state behavior matches pre-pipeline chatcli. The expensive patterns (Refine, Verify, HyDE) start off; opt in when the context justifies the extra calls.
| Pattern | Extra LLM calls per turn | Notes |
|---|---|---|
| ReAct | 0 (already part of the loop) | n/a |
| Plan-First (auto) | +1 (planner) when triggered | Steps reuse the dispatcher |
| Reflexion | +1 (lesson generation), background | Never blocks the turn |
| HyDE 3a | +1 (hypothesis), cheap | 200-token budget |
| HyDE 3b | +1 (query embedding) + lazy backfill | Embedding API ~$0.00002 per 1k tokens |
| Self-Refine | +N (one per pass, default 1) | Convergence cuts it short |
| CoVe | +1 (verifier) per call site | Internally N=3 questions |
| Reasoning auto | 0 extra calls; more tokens with hosted thinking | Anthropic budget defaults to 8k |
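The extra tokens in the Reasoning auto row come from the provider-specific knob that #7 attaches. A hypothetical translation layer from a provider-neutral effort hint might look like this; Effort, ReasoningParams, and translate are invented names, and apart from the 8k Anthropic default stated above, the budget values are illustrative assumptions. OpenAI's reasoning_effort accepts values such as low, medium, and high.

```go
package main

import "fmt"

// Effort is a hypothetical provider-neutral reasoning hint.
type Effort int

const (
	EffortNone Effort = iota
	EffortLow
	EffortMedium
	EffortHigh
)

// ReasoningParams carries the one provider-specific knob that gets attached.
type ReasoningParams struct {
	ThinkingBudget  int    // Anthropic thinking_budget, in tokens
	ReasoningEffort string // OpenAI reasoning_effort
}

// Medium maps to the documented 8k default (8192 is an assumed exact value);
// the low and high budgets are illustrative only.
var anthropicBudget = map[Effort]int{EffortLow: 2048, EffortMedium: 8192, EffortHigh: 16384}

var openAIEffort = map[Effort]string{EffortLow: "low", EffortMedium: "medium", EffortHigh: "high"}

// translate returns the params to attach for a provider, and false when
// nothing should be attached (no hint, or a provider without a knob).
func translate(provider string, e Effort) (ReasoningParams, bool) {
	if e == EffortNone {
		return ReasoningParams{}, false
	}
	switch provider {
	case "anthropic":
		return ReasoningParams{ThinkingBudget: anthropicBudget[e]}, true
	case "openai":
		return ReasoningParams{ReasoningEffort: openAIEffort[e]}, true
	}
	return ReasoningParams{}, false
}

func main() {
	for _, p := range []string{"anthropic", "openai", "other"} {
		params, ok := translate(p, EffortMedium)
		fmt.Printf("%s: %+v attached=%v\n", p, params, ok)
	}
}
```

Keeping one lookup table per provider means call sites only ever handle Effort, which is what lets the same hint cost thinking tokens on Anthropic and a reasoning_effort level on OpenAI.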
Observability
Every active pattern shows up in the /config quality output.
Next steps
Tutorial: Plan-and-Solve
Start with the highest-leverage pattern for multi-step tasks.
Configure HyDE with vectors
Enable embeddings (Voyage, OpenAI, or Bedrock Titan/Cohere) for semantic retrieval.
Slash reference
/thinking, /plan, /refine, /verify, /reflect.
Full env var list
All CHATCLI_QUALITY_* and CHATCLI_EMBED_* variables.