#5 Self-Refine — Critique and Rewrite

Self-Refine turns raw worker output into polished versions via a deliberate critique → rewrite cycle. The RefinerAgent is a pure-reasoning worker (zero tool access) that operates on text only, and can be invoked directly by the orchestrator or automatically by RefineHook as post-processing.

Self-Refine is opt-in. With CHATCLI_QUALITY_REFINE_ENABLED=false (default), the RefineHook is not added to the pipeline and zero overhead is introduced.

RefinerAgent protocol

The model receives TASK + DRAFT and emits two XML-ish blocks:

<critique>
- Bullet 1: identified logical gap
- Bullet 2: vagueness or lack of examples
- Bullet 3: unsupported claim
</critique>

<revised>
…rewritten version addressing each critique bullet,
ready to ship to the user…
</revised>

System prompt rules:

- Output ONLY the two blocks: nothing before, nothing after.
- If the draft is already excellent, repeat it verbatim in <revised> and write “no material issues” in <critique>.
- Never invent new requirements beyond the TASK.
- Keep the draft’s format (code → code, prose → prose).

The parser is tolerant: if the model violates the protocol, RefinerAgent.Execute returns the full response as output — Reflexion picks up the violation as a low-quality signal.
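A tolerant parser along these lines can be sketched in a few lines of Go. This is a minimal illustration, not the actual RefinerAgent code: the function name `parseRefinerOutput`, the regular expressions, and the `ok` flag are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Hypothetical patterns. (?s) lets '.' match newlines so a block can span lines.
var (
	critiqueRe = regexp.MustCompile(`(?s)<critique>(.*?)</critique>`)
	revisedRe  = regexp.MustCompile(`(?s)<revised>(.*?)</revised>`)
)

// parseRefinerOutput extracts the two blocks. When <revised> is missing it
// returns the whole response as output with ok=false, mirroring the tolerant
// behaviour described above.
func parseRefinerOutput(raw string) (critique, output string, ok bool) {
	rev := revisedRe.FindStringSubmatch(raw)
	if rev == nil {
		return "", raw, false
	}
	if crit := critiqueRe.FindStringSubmatch(raw); crit != nil {
		critique = strings.TrimSpace(crit[1])
	}
	return critique, strings.TrimSpace(rev[1]), true
}

func main() {
	raw := "<critique>\n- no example\n</critique>\n\n<revised>\nBetter text.\n</revised>"
	critique, output, ok := parseRefinerOutput(raw)
	fmt.Printf("critique=%q output=%q ok=%v\n", critique, output, ok)

	// Protocol violation: the full response is passed through unchanged.
	_, output, ok = parseRefinerOutput("Sorry, I cannot comply.")
	fmt.Printf("output=%q ok=%v\n", output, ok)
}
```

Returning an explicit `ok` flag is one way a caller could record the protocol violation for Reflexion; how the real hook reports it is not specified here.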

RefineHook flow

Worker finishes

Any agent (CoderAgent, FileAgent, …) produces result.Output.

RefineHook.PostRun is invoked

if result.Error != nil { return }  // errors go to Reflexion, not Refine
if !cfg.Refine.Enabled { return }
if !AppliesToAgent(agent, cfg.ExcludeAgents) { return }
if len(draft) < cfg.MinDraftBytes { return }  // skip short outputs

Multi-pass loop

For each pass up to MaxPasses:

- Dispatch workers.AgentCall{Agent: refiner, Task: RefineDirective + "Task: ...\n\nDraft: ..."}.
- Receive res.Output with the contents of <revised>.
- If convergedRefine(currentDraft, res.Output, EpsilonChars) → break.

Replace result.Output

result.Output = currentDraft (only if it changed).
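The gating and loop steps above can be sketched as follows. This is a simplified illustration under stated assumptions: `refineConfig`, `refineFn`, `convergedFn`, and `runRefine` are hypothetical names, the error and exclude-list gates are omitted, and keeping the current draft when a pass converges or fails is a guess at the behaviour rather than a documented contract.

```go
package main

import (
	"fmt"
	"strings"
)

// refineConfig holds only the knobs named in the flow above.
type refineConfig struct {
	Enabled       bool
	MaxPasses     int
	MinDraftBytes int
}

// refineFn stands in for dispatching the refiner agent (hypothetical signature).
type refineFn func(task, draft string) (string, error)

// convergedFn stands in for convergedRefine or the semantic cascade.
type convergedFn func(prev, next string) bool

// runRefine returns the final draft and whether it differs from the input.
func runRefine(cfg refineConfig, task, draft string, refine refineFn, converged convergedFn) (string, bool) {
	if !cfg.Enabled || len(draft) < cfg.MinDraftBytes {
		return draft, false // hook disabled or output too short
	}
	current := draft
	for pass := 0; pass < cfg.MaxPasses; pass++ {
		next, err := refine(task, current)
		if err != nil {
			break // assumption: keep the last good draft on refiner failure
		}
		if converged(current, next) {
			break // assumption: a converged pass adds nothing worth keeping
		}
		current = next
	}
	return current, current != draft
}

// trimRefiner is a toy refiner that only strips surrounding whitespace.
func trimRefiner(_ string, draft string) (string, error) {
	return strings.TrimSpace(draft), nil
}

// exactMatch is the strictest possible convergence check.
func exactMatch(prev, next string) bool { return prev == next }

func main() {
	cfg := refineConfig{Enabled: true, MaxPasses: 2, MinDraftBytes: 5}
	out, changed := runRefine(cfg, "tidy", "  hello world  ", trimRefiner, exactMatch)
	fmt.Printf("%q changed=%v\n", out, changed)
}
```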

Convergence — semantic cascade

Self-Refine uses a char → Jaccard → embedding cascade to decide when to stop. The old char-level heuristic (convergedCharHeuristic, still available as a fallback) only caught near-literal equality; the cascade also catches the same meaning expressed in different words, which is the common case for rewrites.

┌────────────────────────────────────────────────────────────┐
│  Pass N: new revision (next) vs current draft              │
└───────────────┬────────────────────────────────────────────┘
                ▼
   1. Char scorer (μs)  ─────────── sim > 0.99?  ─► CONVERGED (identical)
                       │                     sim < 0.3?  ─► DIVERGED (obviously different)
                       ▼ borderline
   2. Jaccard scorer (ms) ────── sim > 0.95 + high conf? ─► CONVERGED
                       │
                       ▼ still borderline
   3. Embedding (ms + $) ──── sim > 0.92? ─► CONVERGED
                       │                  otherwise ─► KEEP REFINING
                       │
                       ▼ embedder down
   Fallback: Jaccard → Char (with DegradedFrom flag)
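The decision logic in the diagram can be sketched as a single function over precomputed scores. This is an illustration only: the types `thresholds`, `scores`, and `verdict`, the function `decide`, and the choice of `>=` for the high thresholds are assumptions; the threshold values are the documented defaults.

```go
package main

import (
	"errors"
	"fmt"
)

type verdict string

const (
	converged verdict = "CONVERGED"
	diverged  verdict = "DIVERGED"
	keepGoing verdict = "KEEP_REFINING"
)

// thresholds mirrors the default values shown in the diagram.
type thresholds struct {
	CharHigh, CharLow, JaccardHigh, JaccardMinConf, EmbeddingSim float64
}

var defaults = thresholds{
	CharHigh: 0.99, CharLow: 0.3,
	JaccardHigh: 0.95, JaccardMinConf: 0.6,
	EmbeddingSim: 0.92,
}

// errEmbedderDown is a stand-in for a provider failure (429/503/timeout).
var errEmbedderDown = errors.New("embedder unavailable")

// scores carries the cheap scorer results plus a lazy embedding call, so the
// expensive step only runs when the cheap ones are inconclusive.
type scores struct {
	Char, Jaccard, JaccardConf float64
	Embed                      func() (float64, error) // nil when embedding is disabled
}

// decide walks the cascade and returns the verdict plus a degradedFrom marker.
func decide(t thresholds, s scores) (verdict, string) {
	if s.Char >= t.CharHigh {
		return converged, ""
	}
	if s.Char < t.CharLow {
		return diverged, ""
	}
	if s.Jaccard >= t.JaccardHigh && s.JaccardConf >= t.JaccardMinConf {
		return converged, ""
	}
	if s.Embed == nil {
		return keepGoing, ""
	}
	sim, err := s.Embed()
	if err != nil {
		// Permissive behaviour: Jaccard was already inconclusive, so keep refining.
		return keepGoing, "embedding_unavailable"
	}
	if sim >= t.EmbeddingSim {
		return converged, ""
	}
	return keepGoing, ""
}

func main() {
	v, degraded := decide(defaults, scores{
		Char: 0.62, Jaccard: 0.71, JaccardConf: 0.8,
		Embed: func() (float64, error) { return 0.94, nil },
	})
	fmt.Printf("verdict=%s degradedFrom=%q\n", v, degraded)
}
```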

The three scorers

| Scorer | Cost | Catches | Confidence |
|---|---|---|---|
| Char | μs | Literal equality, length delta | High at extremes, low in the middle |
| Jaccard | ms | Normalized token sets (lowercase, EN/PT stop-words, no punctuation); captures reordering | Grows with corpus size |
| Embedding | 100-500ms + $ | Cosine similarity via `embedding.Provider` (Voyage/OpenAI); captures paraphrase + synonyms | High, authoritative |
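To make the Jaccard row concrete, here is a minimal token-set scorer. It is a sketch, not the project's implementation: the stop-word list is a tiny hypothetical sample, and the function names `tokenSet` and `jaccard` are invented for the example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stopWords is a deliberately tiny, hypothetical EN/PT sample.
var stopWords = map[string]bool{
	"the": true, "a": true, "an": true, "of": true,
	"o": true, "de": true, "um": true, "uma": true,
}

// tokenSet lowercases the text, splits on anything that is not a letter or
// digit (which drops punctuation), and removes stop-words.
func tokenSet(text string) map[string]bool {
	fields := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	set := make(map[string]bool, len(fields))
	for _, f := range fields {
		if !stopWords[f] {
			set[f] = true
		}
	}
	return set
}

// jaccard returns |A ∩ B| / |A ∪ B| over the normalized token sets.
// Two empty sets are treated as identical.
func jaccard(a, b string) float64 {
	sa, sb := tokenSet(a), tokenSet(b)
	if len(sa) == 0 && len(sb) == 0 {
		return 1
	}
	inter := 0
	for tok := range sa {
		if sb[tok] {
			inter++
		}
	}
	union := len(sa) + len(sb) - inter
	return float64(inter) / float64(union)
}

func main() {
	fmt.Println(jaccard("The cache is fast.", "Fast is the cache!"))
	fmt.Println(jaccard("alpha beta", "beta gamma"))
}
```

Because the comparison works on sets, word order and repetition are ignored, which is why this scorer captures reordering but not paraphrase.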

Quality regression guard

Beyond detecting convergence, the hook compares each new revision against the original draft. If similarity to the original drops sharply between passes (the rewrite is drifting away from the draft), the hook reverts to the best draft seen so far and sets refine_rolled_back=true in the metadata, a signal that Reflexion consumes:

pass 0: original draft                         sim_best=1.0
pass 1: rewrite v1    sim vs original = 0.95  → better than original, keep
pass 2: rewrite v2    sim vs original = 0.50  → drop > 15% → REVERT to v1
                                                refine_rolled_back=true

This prevents the model from iterating toward a worse answer when critique over-interprets.
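The rollback rule in the trace above can be sketched like this. The sketch assumes that "drop" means the absolute decrease in similarity relative to the last accepted revision and that the last accepted revision is the "best draft"; both are interpretations of the example, and `regressionGuard` and its methods are hypothetical names.

```go
package main

import "fmt"

// regressionGuard remembers the last accepted draft and its similarity to
// the original, and rejects revisions that drop too far below it.
type regressionGuard struct {
	maxDrop float64 // e.g. 0.15 for the 15% rule in the trace
	keptSim float64 // similarity of the kept draft vs the original
	kept    string
}

func newRegressionGuard(original string, maxDrop float64) *regressionGuard {
	// The original draft is trivially identical to itself (sim = 1.0).
	return &regressionGuard{maxDrop: maxDrop, keptSim: 1.0, kept: original}
}

// observe takes a revision and its similarity to the original draft. It
// returns the draft to carry forward and whether a rollback happened.
func (g *regressionGuard) observe(revision string, simToOriginal float64) (string, bool) {
	if g.keptSim-simToOriginal > g.maxDrop {
		return g.kept, true // refine_rolled_back=true
	}
	g.kept, g.keptSim = revision, simToOriginal
	return revision, false
}

func main() {
	g := newRegressionGuard("original draft", 0.15)

	draft, rolledBack := g.observe("rewrite v1", 0.95)
	fmt.Println(draft, rolledBack)

	draft, rolledBack = g.observe("rewrite v2", 0.50)
	fmt.Println(draft, rolledBack)
}
```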

LRU cache with TTL

Embedding calls cost money. The cascade caches vectors by sha256(text) in a bounded LRU (default 256 entries, 5min TTL). During a multi-pass loop the same string appears in consecutive comparisons, so cache hits skip duplicate provider calls.
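A bounded LRU with TTL keyed by sha256(text) can be built from the standard library alone. The following is a minimal single-goroutine sketch; `vectorCache`, `fakeClock`, and the minute-based constructor are hypothetical, and the real cache may differ in eviction details and locking.

```go
package main

import (
	"container/list"
	"crypto/sha256"
	"fmt"
	"time"
)

type cacheEntry struct {
	key     [32]byte
	vec     []float32
	expires time.Time
}

// vectorCache is a bounded LRU with per-entry TTL. Not safe for concurrent use.
type vectorCache struct {
	capacity int
	ttl      time.Duration
	now      func() time.Time // injectable clock
	order    *list.List       // front = most recently used
	items    map[[32]byte]*list.Element
}

func newVectorCache(capacity, ttlMinutes int, now func() time.Time) *vectorCache {
	return &vectorCache{
		capacity: capacity,
		ttl:      time.Duration(ttlMinutes) * time.Minute,
		now:      now,
		order:    list.New(),
		items:    make(map[[32]byte]*list.Element),
	}
}

// Get returns the cached vector, dropping it first if the TTL has passed.
func (c *vectorCache) Get(text string) ([]float32, bool) {
	key := sha256.Sum256([]byte(text))
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	e := el.Value.(*cacheEntry)
	if c.now().After(e.expires) {
		c.order.Remove(el)
		delete(c.items, key)
		return nil, false
	}
	c.order.MoveToFront(el)
	return e.vec, true
}

// Put stores a vector and evicts the least recently used entry when full.
func (c *vectorCache) Put(text string, vec []float32) {
	key := sha256.Sum256([]byte(text))
	if el, ok := c.items[key]; ok {
		e := el.Value.(*cacheEntry)
		e.vec, e.expires = vec, c.now().Add(c.ttl)
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&cacheEntry{key: key, vec: vec, expires: c.now().Add(c.ttl)})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*cacheEntry).key)
	}
}

// fakeClock is a manually advanced clock for demonstrations and tests.
type fakeClock struct{ t time.Time }

func (f *fakeClock) Now() time.Time      { return f.t }
func (f *fakeClock) Advance(minutes int) { f.t = f.t.Add(time.Duration(minutes) * time.Minute) }

func main() {
	clk := &fakeClock{t: time.Now()}
	cache := newVectorCache(256, 5, clk.Now)
	cache.Put("draft v1", []float32{0.1, 0.2})

	_, hit := cache.Get("draft v1")
	fmt.Println("hit within TTL:", hit)

	clk.Advance(6)
	_, hit = cache.Get("draft v1")
	fmt.Println("hit after TTL:", hit)
}
```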

Per-scorer circuit breaker

If the embedder returns 3 consecutive errors (provider 429/503/timeout), the breaker opens for 30s and the cascade degrades gracefully to Jaccard, with DegradedFrom="embedding_unavailable" in the Score. The refine loop never stalls waiting on the embedder.
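A consecutive-failure breaker of this kind fits in a few lines. The sketch below assumes that any success resets the failure count and that the breaker simply closes again once the cooldown has elapsed; `breaker`, `manualClock`, and `errProvider` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errProvider stands in for a 429/503/timeout from the embeddings provider.
var errProvider = errors.New("provider error")

// breaker opens after `threshold` consecutive failures and stays open for
// `cooldown`. Not safe for concurrent use.
type breaker struct {
	threshold int
	cooldown  time.Duration
	now       func() time.Time
	failures  int
	openUntil time.Time
}

func newBreaker(threshold, cooldownSeconds int, now func() time.Time) *breaker {
	return &breaker{
		threshold: threshold,
		cooldown:  time.Duration(cooldownSeconds) * time.Second,
		now:       now,
	}
}

// Allow reports whether the embedder may be called right now.
func (b *breaker) Allow() bool { return !b.now().Before(b.openUntil) }

// Record feeds the outcome of one embedder call into the breaker.
func (b *breaker) Record(err error) {
	if err == nil {
		b.failures = 0 // assumption: any success resets the streak
		return
	}
	b.failures++
	if b.failures >= b.threshold {
		b.openUntil = b.now().Add(b.cooldown)
		b.failures = 0
	}
}

// manualClock is a manually advanced clock for demonstrations and tests.
type manualClock struct{ t time.Time }

func (m *manualClock) Now() time.Time      { return m.t }
func (m *manualClock) Advance(seconds int) { m.t = m.t.Add(time.Duration(seconds) * time.Second) }

func main() {
	clk := &manualClock{t: time.Now()}
	b := newBreaker(3, 30, clk.Now)
	for i := 0; i < 3; i++ {
		b.Record(errProvider)
	}
	fmt.Println("allowed right after 3 failures:", b.Allow())
	clk.Advance(31)
	fmt.Println("allowed after cooldown:", b.Allow())
}
```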

Strict vs permissive mode

| Mode | Embedder down | Behavior |
|---|---|---|
| permissive (default) | Degrades to Jaccard | Marks DegradedFrom in the trail, continues |
| strict | Refuses convergence | Treats embedding as authoritative; without it, `Converged=false` |

Strict is for high-stakes workloads (legal contracts, security code) where a false-positive convergence means shipping unpolished output.
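The difference between the two modes when the embedder is down can be sketched as one small function. The `score` struct and `scoreWithoutEmbedder` are hypothetical, and setting DegradedFrom in strict mode as well is an assumption; only the `Converged=false` outcome for strict mode comes from the table above. The 0.95 and 0.6 values are the documented Jaccard defaults.

```go
package main

import "fmt"

// score is a pared-down stand-in for the cascade's result type.
type score struct {
	Converged    bool
	DegradedFrom string
}

// scoreWithoutEmbedder decides what to report when the embedding scorer
// cannot be reached.
func scoreWithoutEmbedder(strict bool, jaccardSim, jaccardConf float64) score {
	s := score{DegradedFrom: "embedding_unavailable"}
	if strict {
		// Strict: embedding is authoritative, so never claim convergence.
		return s
	}
	// Permissive: fall back to the Jaccard verdict.
	s.Converged = jaccardSim >= 0.95 && jaccardConf >= 0.6
	return s
}

func main() {
	fmt.Printf("%+v\n", scoreWithoutEmbedder(false, 0.97, 0.8))
	fmt.Printf("%+v\n", scoreWithoutEmbedder(true, 0.97, 0.8))
}
```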

Legacy fallback

When no checker is wired (or CHATCLI_QUALITY_REFINE_CONVERGENCE_ENABLED=false), the hook falls back to the original char-level heuristic:

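// convergedCharHeuristic reports whether a and b are close enough to stop:
// their lengths differ by at most epsilon bytes and at most epsilon bytes
// mismatch over the shared prefix.
func convergedCharHeuristic(a, b string, epsilon int) bool {
    // Go has no built-in abs for ints, so compute the length delta by hand.
    delta := len(a) - len(b)
    if delta < 0 {
        delta = -delta
    }
    if delta > epsilon {
        return false
    }
    mismatches := 0
    // min is a builtin since Go 1.21.
    for i := 0; i < min(len(a), len(b)); i++ {
        if a[i] != b[i] {
            mismatches++
            if mismatches > epsilon {
                return false
            }
        }
    }
    return true
}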

With EpsilonChars=50 (default), refinement stops once two consecutive passes differ by at most 50 characters in length and at most 50 mismatched positions. Cheap but blind to paraphrase, which is why the cascade is the enterprise default.

Exclude lists (anti-recursion and mechanical agents)

Default ExcludeAgents:

ExcludeAgents: []string{"formatter", "deps", "refiner", "verifier"}

| Agent | Reason |
|---|---|
| formatter | Mechanical output (formatted file), refine doesn’t add value |
| deps | Output is deterministic interpretation of `go list`, `npm ls`, etc. |
| refiner | Anti-recursion: refining the refiner’s output creates an infinite loop |
| verifier | Same reason: verifier already delivers polished output |

Add more via env:

export CHATCLI_QUALITY_REFINE_EXCLUDE="formatter,deps,refiner,verifier,shell,git"
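Parsing the CSV and applying it can be sketched as below. The helper names `parseExclude` and `appliesToAgent` are hypothetical (the latter loosely modelled on the AppliesToAgent call in the hook flow), and the sketch assumes the env value replaces the default list rather than extending it, which is why the export above repeats the defaults.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// defaultExclude matches the documented default ExcludeAgents.
var defaultExclude = []string{"formatter", "deps", "refiner", "verifier"}

// parseExclude splits a CSV value, trimming blanks and lowercasing names.
// An empty value yields the defaults.
func parseExclude(csv string) []string {
	if strings.TrimSpace(csv) == "" {
		return defaultExclude
	}
	var out []string
	for _, name := range strings.Split(csv, ",") {
		if n := strings.ToLower(strings.TrimSpace(name)); n != "" {
			out = append(out, n)
		}
	}
	return out
}

// appliesToAgent reports whether refine should run for the given agent.
func appliesToAgent(agent string, exclude []string) bool {
	for _, ex := range exclude {
		if strings.EqualFold(agent, ex) {
			return false
		}
	}
	return true
}

func main() {
	exclude := parseExclude(os.Getenv("CHATCLI_QUALITY_REFINE_EXCLUDE"))
	fmt.Println("coder refined:", appliesToAgent("coder", exclude))
	fmt.Println("refiner refined:", appliesToAgent("refiner", exclude))
}
```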

`/refine` — session toggle

Instead of editing env vars and restarting, use the slash command:

/refine on
# → self-refine ON for this session

The session override lives in cli.qualityOverrides.Refine as *bool:

- nil → defer to /config quality
- &true → force on
- &false → force off

It’s applied on top of env when AgentMode.Run() builds its qualityConfig.
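The tri-state override resolves in a couple of lines. In this sketch `qualityOverrides` is reduced to the single field mentioned above, and `effectiveRefine` and `boolPtr` are hypothetical helpers; `boolPtr` exists because Go cannot take the address of a literal, so `&true` in the list above is shorthand rather than valid syntax.

```go
package main

import "fmt"

// qualityOverrides mirrors the session override described above.
type qualityOverrides struct {
	Refine *bool // nil = no session override
}

// effectiveRefine applies the session override on top of the configured value.
func effectiveRefine(configured bool, o qualityOverrides) bool {
	if o.Refine == nil {
		return configured
	}
	return *o.Refine
}

// boolPtr returns a pointer to a copy of b.
func boolPtr(b bool) *bool { return &b }

func main() {
	fmt.Println(effectiveRefine(false, qualityOverrides{}))                       // defer to config
	fmt.Println(effectiveRefine(false, qualityOverrides{Refine: boolPtr(true)}))  // force on
	fmt.Println(effectiveRefine(true, qualityOverrides{Refine: boolPtr(false)})) // force off
}
```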

Environment variables

Basics

| Env var | Default | What it does |
|---|---|---|
| `CHATCLI_QUALITY_REFINE_ENABLED` | `false` | Master switch |
| `CHATCLI_QUALITY_REFINE_MAX_PASSES` | `1` | Hard cap on passes (recommend 1-2) |
| `CHATCLI_QUALITY_REFINE_MIN_BYTES` | `200` | Don’t refine outputs smaller than this |
| `CHATCLI_QUALITY_REFINE_EPSILON` | `50` | Char-level fallback threshold |
| `CHATCLI_QUALITY_REFINE_EXCLUDE` | `formatter,deps,refiner,verifier` | CSV of agents that are NOT refined |
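Reading these variables with their documented defaults might look like the sketch below. The struct `refineEnv` and the helpers are hypothetical, and falling back to the default on an unparsable value is an assumption about error handling, not documented behaviour. `strconv.ParseBool` accepts values such as `1`, `true`, `0`, and `false`.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// refineEnv holds the basic knobs from the table above.
type refineEnv struct {
	Enabled   bool
	MaxPasses int
	MinBytes  int
	Epsilon   int
}

// lookupFn matches os.LookupEnv so tests can inject a fake environment.
type lookupFn func(key string) (string, bool)

// intOr returns the parsed value of key, or def when unset or invalid.
func intOr(lookup lookupFn, key string, def int) int {
	if raw, ok := lookup(key); ok {
		if n, err := strconv.Atoi(raw); err == nil {
			return n
		}
	}
	return def
}

// boolOr returns the parsed value of key, or def when unset or invalid.
func boolOr(lookup lookupFn, key string, def bool) bool {
	if raw, ok := lookup(key); ok {
		if b, err := strconv.ParseBool(raw); err == nil {
			return b
		}
	}
	return def
}

// loadRefineEnv applies the documented defaults to anything not set.
func loadRefineEnv(lookup lookupFn) refineEnv {
	return refineEnv{
		Enabled:   boolOr(lookup, "CHATCLI_QUALITY_REFINE_ENABLED", false),
		MaxPasses: intOr(lookup, "CHATCLI_QUALITY_REFINE_MAX_PASSES", 1),
		MinBytes:  intOr(lookup, "CHATCLI_QUALITY_REFINE_MIN_BYTES", 200),
		Epsilon:   intOr(lookup, "CHATCLI_QUALITY_REFINE_EPSILON", 50),
	}
}

func main() {
	fmt.Printf("%+v\n", loadRefineEnv(os.LookupEnv))
}
```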

Semantic convergence cascade

| Env var | Default | Effect |
|---|---|---|
| `CHATCLI_QUALITY_REFINE_CONVERGENCE_ENABLED` | `true` | Cascade master switch. `false` = char heuristic only |
| `CHATCLI_QUALITY_REFINE_CONVERGENCE_EMBEDDING` | `false` | Include embedding scorer; opt-in because it costs $ |
| `CHATCLI_QUALITY_REFINE_CONVERGENCE_STRICT` | `false` | Strict mode: refuse convergence without embedding |
| `CHATCLI_QUALITY_REFINE_CONVERGENCE_CHAR_HIGH` | `0.99` | Sim ≥ X on char → short-circuit CONVERGED |
| `CHATCLI_QUALITY_REFINE_CONVERGENCE_CHAR_LOW` | `0.3` | Sim < X on char → short-circuit DIVERGED |
| `CHATCLI_QUALITY_REFINE_CONVERGENCE_JACCARD_HIGH` | `0.95` | Sim ≥ X on Jaccard (confidence ≥ 0.6) → CONVERGED |
| `CHATCLI_QUALITY_REFINE_CONVERGENCE_EMBEDDING_SIM` | `0.92` | Final embedding cosine threshold |
| `CHATCLI_QUALITY_REFINE_CONVERGENCE_CACHE_SIZE` | `256` | LRU cache size |
| `CHATCLI_QUALITY_REFINE_CONVERGENCE_CACHE_TTL_MIN` | `5` | Cache TTL in minutes |
| `CHATCLI_QUALITY_REFINE_CONVERGENCE_BREAKER_THRESHOLD` | `3` | Consecutive embedder failures before breaker opens |

To enable embedding, set CHATCLI_QUALITY_REFINE_CONVERGENCE_EMBEDDING=1 and configure any embeddings provider (CHATCLI_EMBED_PROVIDER=voyage / openai / bedrock). The cascade reuses the same provider as HyDE, so no duplicate infrastructure is needed. See RAG + HyDE for per-provider details.

Refiner agent override

RefinerAgent has defaults model="" and effort="medium". Override:

# Use Haiku for cheap refine
export CHATCLI_AGENT_REFINER_MODEL="claude-haiku-4-5"
export CHATCLI_AGENT_REFINER_EFFORT="low"

# Or gpt-5 with high effort for max quality
export CHATCLI_AGENT_REFINER_MODEL="gpt-5"
export CHATCLI_AGENT_REFINER_EFFORT="high"

Example: refine on documentation response

Draft (CoderAgent output)

# HTTP Client

The HTTP client is used for making requests. You can use it with various providers.

## Usage

Just call the Do method with a request and you get a response back.

Quality judgment: vague, no example, dismissive “just”.

Critique

- "used for making requests": tautological; remove.
- No code example; docs readers need a snippet.
- "various providers": list which.
- "Just call" minimizes real complexity (error handling, context, timeouts).

Revised

# HTTP Client

The HTTP client wraps `net/http.Client` with chatcli-specific
timeouts, retry policy, and provider-aware header injection. It is
used internally by every LLM provider (OpenAI, Anthropic, Voyage)
and by the MCP transport.

## Usage

```go
resp, err := httpclient.Do(ctx, req)
if err != nil {
    return fmt.Errorf("request failed: %w", err)
}
defer resp.Body.Close()
```

The `ctx` parameter is mandatory — requests without a cancellable
context will be rejected with `ErrContextRequired`.

Quality: concrete, code-ready example, explicit ctx contract.

Direct invocation by the orchestrator

Beyond the automatic hook, the orchestrator can call the refiner via <agent_call>:

<agent_call agent="refiner" task="Polish this draft doc section: [paste draft here]" />

Useful when:

- You want to refine a specific output without enabling the global hook.
- The task itself is “improve this text”.
- Complex chains: coder writes → reviewer analyzes → refiner polishes the analysis before delivery.

Cost and latency

| Config | Extra LLM calls per turn | Typical latency |
|---|---|---|
| `MaxPasses=1` (default) | +1 | 1-3s with Haiku, 3-8s with Sonnet |
| `MaxPasses=2` | +1 to +2 (convergence may stop at 1) | up to 2x |
| `MaxPasses=3+` | Rarely converges → expensive | Avoid |

Refine shines in writing tasks: documentation, summaries, reports. Less useful in tool-heavy workflows (shell, git, file) because outputs are already mechanical.

See also

#6 CoVe

Chain-of-Verification complements refine with factual checking.

#3 Reflexion

If refine flags low-quality across multiple turns, Reflexion persists the lesson.

ReviewerAgent

Pre-pipeline agent that does code review. Refiner is complementary: one analyzes, the other rewrites.

Configuration

All CHATCLI_QUALITY_REFINE_* in one place.

