Agent Harness/Quality Pipeline

ChatCLI ships seven LLM-agent patterns that work together as a harness/quality pipeline wrapping the existing ReAct dispatcher. Each pattern has a distinct purpose, can be toggled per session or via /config, and composes with the others without regressing steady-state performance.

Design premise: opt-in by default. With CHATCLI_QUALITY_* unset, the pipeline runs with zero post-hooks — Pipeline.Run degenerates to a direct agent.Execute call. You only pay for what you enable.

The seven patterns

#1 — ReAct

Reason → Act → Observe. The base loop every worker runs. Already present; now emits structured events and auto-attaches effort hints.

#2 — Plan-and-Solve / ReWOO

PlannerAgent emits structured JSON; PlanRunner executes steps in topological order with #E1.head=200 placeholders.

#3 — Reflexion

Detects errors, hallucinations, or low-quality output; distills a Lesson via an LLM call and persists it in memory.Fact for future RAG retrieval.

#4 — RAG + HyDE

Hypothesis-based keyword expansion (3a) plus cosine vector search (3b; Voyage, OpenAI, or Bedrock embeddings, pure-Go backend).

#5 — Self-Refine

RefinerAgent critiques the draft and rewrites. Multi-pass with convergence via EpsilonChars.

#6 — Chain-of-Verification

VerifierAgent generates independent verification questions, answers each, and rewrites on discrepancy.

#7 — Reasoning Backbone

Cross-provider abstraction: thinking_budget on Anthropic, reasoning_effort on OpenAI. Auto-attach for critical agents.

Configuration

CHATCLI_QUALITY_* env vars, /config quality, and the five slash commands: /thinking, /plan, /refine, /verify, /reflect.

How the patterns connect

                 ┌──────────────────────────────────────┐
                 │   /agent or /coder <task>            │
                 └──────────────────┬───────────────────┘
                                    │
  (#4 RAG+HyDE) ────────────────────▼───────────────
  memory.Retriever expands hints with an LLM-generated
  hypothesis (HyDE-3a) and optionally searches the
  vector store (HyDE-3b) before assembling the system
  prompt.
                                    │
  (#2 Plan-and-Solve) ──────────────▼───────────────
  When triggered (auto-score or /plan), the planner
  emits a JSON plan, PlanRunner executes each step
  resolving #E1 placeholders, and a deterministic
  report is injected into history.
                                    │
                    ┌───────────────▼───────────────┐
                    │   ReAct loop (workers)        │
                    │   (#1, always on)             │
                    └───────────────┬───────────────┘
                                    │
                    ┌───────────────▼───────────────┐
                    │   QualityPipeline (per call)  │
                    │   - Pre:  applyAutoReasoning  │ (#7)
                    │   - Execute worker            │
                    │   - Post: RefineHook          │ (#5)
                    │   - Post: VerifyHook          │ (#6)
                    │   - Post: ReflexionHook       │ (#3)
                    └───────────────┬───────────────┘
                                    │
  Lessons from Reflexion are persisted to memory.Fact
  and surface again via #4 on similar future tasks —
  closing the loop without retraining.
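
The Plan-and-Solve stage in the diagram can be sketched as a dependency-ordered executor. The Step shape, the RunPlan name, and the exec callback are hypothetical; the sketch substitutes plain #E<n> placeholders only and does not interpret modifiers such as .head=200, whose semantics are not described on this page.

```go
package main

import (
	"fmt"
	"regexp"
)

// Step is one node of a hypothetical plan. ID looks like "E1"; Input may
// reference earlier results with "#E<n>" placeholders.
type Step struct {
	ID    string
	Input string
	Deps  []string
}

var placeholder = regexp.MustCompile(`#(E\d+)`)

// resolve substitutes every #E<n> placeholder with the recorded output.
func resolve(input string, results map[string]string) string {
	return placeholder.ReplaceAllStringFunc(input, func(m string) string {
		if out, ok := results[m[1:]]; ok {
			return out
		}
		return m // leave unknown placeholders untouched
	})
}

// RunPlan executes steps in topological order (Kahn's algorithm) and
// returns each step's output keyed by step ID.
func RunPlan(steps []Step, exec func(input string) string) (map[string]string, error) {
	byID := make(map[string]Step, len(steps))
	indeg := make(map[string]int, len(steps))
	children := make(map[string][]string)
	for _, s := range steps {
		byID[s.ID] = s
		indeg[s.ID] = len(s.Deps)
		for _, d := range s.Deps {
			children[d] = append(children[d], s.ID)
		}
	}
	var queue []string
	for _, s := range steps { // iterate the slice so the order is deterministic
		if indeg[s.ID] == 0 {
			queue = append(queue, s.ID)
		}
	}
	results := make(map[string]string, len(steps))
	for len(queue) > 0 {
		id := queue[0]
		queue = queue[1:]
		results[id] = exec(resolve(byID[id].Input, results))
		for _, c := range children[id] {
			if indeg[c]--; indeg[c] == 0 {
				queue = append(queue, c)
			}
		}
	}
	if len(results) != len(steps) {
		return nil, fmt.Errorf("plan has a cycle or a missing dependency")
	}
	return results, nil
}
```

Because a step only runs once all of its dependencies have produced output, every placeholder it references is already resolvable when exec is called.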

Architectural principle: every new pattern hooks into the dispatcher or the context builder, so the inner ReAct loop (worker_react.go) does not change. Patterns don’t replace each other; they compose.

Pipeline Architecture (engine)

The QualityPipeline itself is a thread-safe state machine built for production use. Hooks are pluggable, but the engine handles concurrency, failure isolation, and shutdown through the mechanisms below:

State machine (Active → Draining → Closed)

State transitions use atomic compare-and-swap (CAS). DrainAndClose(timeout) waits for in-flight runs to finish before closing, which makes it safe for graceful SIGTERM handling.

Copy-on-Write snapshots

Each AddPre/AddPost/SwapConfig builds a new snapshot and CAS-swaps via atomic.Pointer. In-flight runs always see a consistent view; zero locks on the hot path.
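
The copy-on-write idea can be sketched with Go's atomic.Pointer. The Registry and Snapshot types below are illustrative names, not the pipeline's real ones; only the pattern (copy, modify, CAS-publish) follows the description above.

```go
package main

import "sync/atomic"

// Hook is a placeholder hook type for the example.
type Hook func(string) string

// Snapshot is immutable once published; readers never see partial updates.
type Snapshot struct {
	Generation uint64
	Post       []Hook
}

// Registry publishes snapshots through an atomic pointer.
type Registry struct {
	cur atomic.Pointer[Snapshot]
}

// NewRegistry starts with an empty snapshot at generation 0.
func NewRegistry() *Registry {
	r := &Registry{}
	r.cur.Store(&Snapshot{})
	return r
}

// AddPost copies the current snapshot, appends the hook to the copy, and
// publishes it with a CAS. On contention it retries from the fresh value.
func (r *Registry) AddPost(h Hook) {
	for {
		old := r.cur.Load()
		next := &Snapshot{
			Generation: old.Generation + 1,
			Post:       append(append([]Hook(nil), old.Post...), h),
		}
		if r.cur.CompareAndSwap(old, next) {
			return
		}
	}
}

// Load returns the view a run should keep for its whole lifetime.
func (r *Registry) Load() *Snapshot { return r.cur.Load() }
```

A run that called Load before a registration keeps iterating its own slice, so the hot path needs no lock and never observes a half-built hook list.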

Per-hook isolation

Every hook runs inside a wrapper that recovers panics, enforces a timeout (default 30s), and records failures in a per-hook circuit breaker (default 5 failures → open 30s).

Priority-based ordering

Hooks implementing Prioritized are ordered (lower first). Ties preserve registration order. Backward compat: hooks without Priority() default to 100.
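
The ordering rule maps naturally onto a stable sort with an optional interface. In this sketch only Prioritized, Priority(), and the default of 100 come from the text above; the Hook interface and the helper names are illustrative.

```go
package main

import "sort"

// Hook is a placeholder hook interface for the example.
type Hook interface{ Name() string }

// Prioritized is optional; hooks that do not implement it get the default.
type Prioritized interface{ Priority() int }

const defaultPriority = 100

// priorityOf returns the hook's priority, or the default when unset.
func priorityOf(h Hook) int {
	if p, ok := h.(Prioritized); ok {
		return p.Priority()
	}
	return defaultPriority
}

// SortHooks orders hooks by ascending priority. The stable sort keeps
// registration order for hooks with equal priority.
func SortHooks(hooks []Hook) {
	sort.SliceStable(hooks, func(i, j int) bool {
		return priorityOf(hooks[i]) < priorityOf(hooks[j])
	})
}

// plain is a hook without an explicit priority.
type plain struct{ name string }

func (p plain) Name() string { return p.name }

// ranked is a hook that implements Prioritized.
type ranked struct {
	plain
	prio int
}

func (r ranked) Priority() int { return r.prio }
```

Given hooks registered as a (no priority), b (10), c (no priority), and d (100), SortHooks yields b, a, c, d: b runs first, and the three hooks at 100 keep their registration order.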

Short-circuit sentinels

A PreHook can return ErrSkipExecution (cache hit → skip agent.Execute) or ErrSkipRemainingHooks (stop the phase). The pipeline synthesizes a result so PostHooks still run.
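
A simplified sketch of the sentinel handling follows. The hook signatures are assumptions, as is the choice to let the pre-hook supply the cached value that becomes the synthesized result and to stop the pre phase on ErrSkipExecution.

```go
package main

import "errors"

var (
	ErrSkipExecution      = errors.New("skip execution")
	ErrSkipRemainingHooks = errors.New("skip remaining hooks")
)

// PreHook (assumed signature) may return a cached result with a sentinel.
type PreHook func(task string) (cached string, err error)

// PostHook (assumed signature) rewrites the result.
type PostHook func(result string) string

// Run applies pre-hooks, optionally executes the worker, then post-hooks.
func Run(task string, pre []PreHook, exec func(string) string, post []PostHook) (string, error) {
	var result string
	skipExec := false
preLoop:
	for _, h := range pre {
		cached, err := h(task)
		switch {
		case err == nil:
			// keep going
		case errors.Is(err, ErrSkipExecution):
			result, skipExec = cached, true // synthesized result
			break preLoop
		case errors.Is(err, ErrSkipRemainingHooks):
			break preLoop
		default:
			return "", err
		}
	}
	if !skipExec {
		result = exec(task)
	}
	for _, h := range post { // post-hooks run in both cases
		result = h(result)
	}
	return result, nil
}
```

Matching with errors.Is rather than == means a hook can wrap a sentinel with extra context and still trigger the short circuit.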

Hot reload

SwapConfig(cfg) atomically swaps config. In-flight runs keep the old config (correct — one turn under one config); new runs pick up the new one.

Pipeline metrics

Five metrics are exported under the chatcli_quality_pipeline_* prefix:

Metric	Type	Labels	Notes
`dispatch_total`	Counter	`outcome`	ok, exec_error, pre_short_circuit, bypass_disabled, draining, closed
`hook_duration_seconds`	Histogram	`hook`, `phase`	phase = pre|post
`hook_errors_total`	Counter	`hook`, `reason`	returned_error, timeout, panic, circuit_open
`hook_circuit_state`	Gauge	`hook`	0=closed, 1=open, 2=half_open
`generation`	Gauge	—	Snapshot version — bumps on every registration/swap

Use generation to correlate dashboards with config changes:

# Pipeline change timeline
changes(chatcli_quality_pipeline_generation[1h])

# Worst-offender hooks
sort_desc(sum by (hook, reason) (rate(chatcli_quality_pipeline_hook_errors_total[5m])))

Trigger matrix

Pattern	Slash	Env var	Default	Auto trigger
#1 ReAct	—	—	always on	always
#2 Plan-First	`/plan [agent|coder|preview|dry] [task]`	`CHATCLI_QUALITY_PLAN_FIRST_MODE`	`auto`	complexity ≥ 6
#3 Reflexion	`/reflect <lesson>`	`CHATCLI_QUALITY_REFLEXION_ENABLED`	`on`	error, CoVe flagged, refine low
#4 HyDE	— (transparent)	`CHATCLI_QUALITY_HYDE_ENABLED`	`off`	every retrieval
#5 Refine	`/refine on|off`	`CHATCLI_QUALITY_REFINE_ENABLED`	`off`	post-worker
#6 CoVe	`/verify on|off`	`CHATCLI_QUALITY_VERIFY_ENABLED`	`off`	post-worker
#7 Reasoning	`/thinking on|off`	`CHATCLI_QUALITY_REASONING_MODE`	`auto`	for AutoAgents

Override priority

For a given turn, the effort hint resolves in this order (later wins):

Skill frontmatter

effort: high in the activated skill’s frontmatter.

Agent default

E.g. PlannerAgent has embedded effort="high".

CHATCLI_QUALITY_REASONING_*

Auto-enable for agents in AutoAgents.

/thinking session override

Wins over everything above for the next turn.
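
The "later wins" rule for the four layers above amounts to taking the last layer that is set. In this sketch the Effort type and the values other than "high" are hypothetical.

```go
package main

// Effort is a hint level; the empty string means "not set at this layer".
type Effort string

// ResolveEffort walks the layers from lowest to highest precedence
// (skill frontmatter, agent default, env auto-enable, /thinking override)
// and returns the last one that is set.
func ResolveEffort(skill, agentDefault, envAuto, session Effort) Effort {
	var resolved Effort
	for _, layer := range []Effort{skill, agentDefault, envAuto, session} {
		if layer != "" {
			resolved = layer
		}
	}
	return resolved
}
```

For example, ResolveEffort("high", "medium", "", "low") yields "low": the session override is set, so it wins over the skill and agent layers.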

For Refine / Verify / Reflexion hook enablement:

/config quality (env)

CHATCLI_QUALITY_REFINE_ENABLED, etc.

/refine and /verify session toggles

A *bool override stored on cli.qualityOverrides; when set, it overrides the env value for the rest of the session.
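
A pointer-to-bool gives three states (unset, on, off), which is what lets a session toggle override the env value in either direction. The struct layout and helper names below are assumptions for illustration only.

```go
package main

// QualityOverrides holds per-session toggles; a nil pointer means the
// session has not overridden the env-derived value.
type QualityOverrides struct {
	Refine *bool
	Verify *bool
}

// SetRefine records an explicit session choice, e.g. from "/refine on".
func (o *QualityOverrides) SetRefine(on bool) { o.Refine = &on }

// Enabled returns the session override when present, else the env value.
func Enabled(override *bool, envValue bool) bool {
	if override != nil {
		return *override
	}
	return envValue
}
```

With a plain bool there would be no way to tell "the user turned it off" from "the user never touched it", so an env value of enabled could not be overridden to off.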

For Plan-First:

/plan one-shot flag

cli.pendingPlanFirst = true consumed on next dispatch.

CHATCLI_QUALITY_PLAN_FIRST_MODE + complexity

In always mode the score is ignored; in auto mode planning fires when ComplexityScore(task) >= threshold.
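
Put together, the Plan-First decision could look like the sketch below. The Session type, the PlanMode constants, and the "off" value are hypothetical; only the one-shot flag, the always and auto modes, and the threshold comparison come from the text above.

```go
package main

// PlanMode mirrors the value of CHATCLI_QUALITY_PLAN_FIRST_MODE.
type PlanMode string

const (
	PlanOff    PlanMode = "off" // assumed value, not listed on this page
	PlanAuto   PlanMode = "auto"
	PlanAlways PlanMode = "always"
)

// Session carries the one-shot flag set by /plan.
type Session struct {
	PendingPlanFirst bool
}

// ShouldPlan decides whether the next dispatch runs the planner first.
// The one-shot flag is consumed (reset) as soon as it is honored.
func (s *Session) ShouldPlan(mode PlanMode, score, threshold int) bool {
	if s.PendingPlanFirst {
		s.PendingPlanFirst = false
		return true
	}
	switch mode {
	case PlanAlways:
		return true // score is ignored
	case PlanAuto:
		return score >= threshold
	default:
		return false
	}
}
```

With the threshold of 6 from the trigger matrix, a task scoring 5 in auto mode skips the planner unless /plan was issued just before it.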

Cost and latency

Defaults are calibrated so that steady-state behavior matches pre-pipeline ChatCLI. Expensive patterns (Refine, Verify, HyDE) start off; you opt in when the context justifies the cost.

Pattern	Extra LLM calls per turn	Notes
ReAct	0 (already part of the loop)	—
Plan-First (auto)	+1 (planner) when triggered	Steps reuse the dispatcher
Reflexion	+1 (lesson gen), background	Never blocks the turn
HyDE 3a	+1 (hypothesis), cheap	200 token budget
HyDE 3b	+1 (query embed) + lazy backfill	embedding API ~$0.00002/1k tokens
Self-Refine	+N (one per pass, default 1)	Convergence cuts it short
CoVe	+1 (verifier) per call site	Internally N=3 questions
Reasoning auto	0 extra calls; +tokens on hosted thinking	Anthropic budget = 8k default

Observability

Every active pattern shows up in /config quality:

✨ AGENT HARNESS/QUALITY PIPELINE ────────────────────────
  CHATCLI_QUALITY_ENABLED         : enabled
  Hooks registered                : pre=0, post=3

  ── Self-Refine (#5)
  CHATCLI_QUALITY_REFINE_ENABLED  : enabled
  CHATCLI_QUALITY_REFINE_MAX_PASSES: 1
  ...

  ── RAG + HyDE (#4)
  CHATCLI_QUALITY_HYDE_ENABLED    : enabled
  CHATCLI_QUALITY_HYDE_USE_VECTORS: enabled
  CHATCLI_EMBED_PROVIDER          : bedrock
  CHATCLI_EMBED_MODEL             : amazon.titan-embed-text-v2:0
  Vector provider                : bedrock:amazon.titan-embed-text-v2:0
  Vector entries                 : 127

CHATCLI_EMBED_PROVIDER accepts voyage, openai, bedrock, or null. For Bedrock, ChatCLI reuses the same AWS chain as the chat (IAM, profile, SSO) — no extra API key. See RAG + HyDE for per-provider details.

Next steps

Tutorial: Plan-and-Solve

Start with the pattern that has the highest leverage on multi-step tasks.

Configure HyDE with vectors

Enable embeddings (Voyage, OpenAI, or Bedrock Titan/Cohere) for semantic retrieval.

Slash reference

/thinking, /plan, /refine, /verify, /reflect.

Full env var list

All CHATCLI_QUALITY_* and CHATCLI_EMBED_*.
