Self-Refine turns raw worker output into polished versions via a deliberate critique → rewrite cycle. The RefinerAgent is a pure-reasoning worker (zero tool access) that operates on text only, and can be invoked directly by the orchestrator or automatically by RefineHook as post-processing.
Self-Refine is opt-in. With CHATCLI_QUALITY_REFINE_ENABLED=false (default), the RefineHook is not added to the pipeline and zero overhead is introduced.

RefinerAgent protocol

The model receives TASK + DRAFT and emits two XML-ish blocks:
<critique>
- Bullet 1: identified logical gap
- Bullet 2: vagueness or lack of examples
- Bullet 3: unsupported claim
</critique>

<revised>
…rewritten version addressing each critique bullet,
ready to ship to the user…
</revised>
System prompt rules:
  • Output ONLY the two blocks — nothing before, nothing after.
  • If the draft is already excellent, <revised> repeats it verbatim and <critique> says “no material issues”.
  • Never invent new requirements beyond the TASK.
  • Keep the draft’s format (code → code, prose → prose).
The parser is tolerant: if the model violates the protocol, RefinerAgent.Execute returns the full response as output, and Reflexion picks up the violation as a low-quality signal.
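The tolerant parsing described above can be sketched as follows. This is a minimal illustration, not the actual RefinerAgent code: the `Refined` type and the `ParseRefinerResponse` and `extractBlock` names are hypothetical, and the only behavior taken from this page is "extract the two blocks, otherwise return the full response".

```go
package main

import (
	"fmt"
	"strings"
)

// Refined holds the two protocol blocks. Hypothetical type for illustration.
type Refined struct {
	Critique string
	Revised  string
	OK       bool // false when the response violated the protocol
}

// extractBlock returns the trimmed text between <tag> and </tag>.
func extractBlock(s, tag string) (string, bool) {
	openTag, closeTag := "<"+tag+">", "</"+tag+">"
	start := strings.Index(s, openTag)
	if start < 0 {
		return "", false
	}
	start += len(openTag)
	end := strings.Index(s[start:], closeTag)
	if end < 0 {
		return "", false
	}
	return strings.TrimSpace(s[start : start+end]), true
}

// ParseRefinerResponse is tolerant: if either block is missing, it returns
// the full response as Revised and sets OK=false so callers can flag it.
func ParseRefinerResponse(resp string) Refined {
	critique, okC := extractBlock(resp, "critique")
	revised, okR := extractBlock(resp, "revised")
	if !okC || !okR {
		return Refined{Revised: resp, OK: false}
	}
	return Refined{Critique: critique, Revised: revised, OK: true}
}

func main() {
	r := ParseRefinerResponse("<critique>\n- no material issues\n</critique>\n<revised>\nHello\n</revised>")
	fmt.Println(r.OK, r.Revised)
}
```

Returning the raw response instead of an error keeps the pipeline moving; the `OK` flag is one possible way to surface the violation to a downstream consumer.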

RefineHook flow

1. Worker finishes

Any agent (CoderAgent, FileAgent, …) produces result.Output.
2. RefineHook.PostRun is invoked

if result.Error != nil { return }  // errors go to Reflexion, not Refine
if !cfg.Refine.Enabled { return }
if !AppliesToAgent(agent, cfg.ExcludeAgents) { return }
if len(draft) < cfg.MinDraftBytes { return }  // skip short outputs
3. Multi-pass loop

For each pass up to MaxPasses:
  • Dispatch workers.AgentCall{Agent: refiner, Task: RefineDirective + "Task: ...\n\nDraft: ..."}
  • Receive res.Output, which holds the contents of <revised>
  • If convergedRefine(currentDraft, res.Output, EpsilonChars) → break
4. Replace result.Output

result.Output = currentDraft (only if it changed).
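The four steps above can be sketched end to end. Everything in this block is a hypothetical stand-in (the `Result`, `Config`, `RefineFunc`, and `ConvergedFunc` types are not ChatCLI's real definitions), and it assumes the latest revision is adopted even on the pass that triggers convergence.

```go
package main

import "fmt"

// Hypothetical stand-ins for the hook's real types.
type Result struct {
	Output string
	Err    error
}

type Config struct {
	Enabled       bool
	MaxPasses     int
	MinDraftBytes int
	ExcludeAgents []string
}

// RefineFunc performs one critique-and-rewrite pass.
type RefineFunc func(task, draft string) (string, error)

// ConvergedFunc reports whether two consecutive drafts are close enough to stop.
type ConvergedFunc func(prev, next string) bool

func excluded(agent string, list []string) bool {
	for _, name := range list {
		if name == agent {
			return true
		}
	}
	return false
}

// PostRun mirrors the flow: guards, multi-pass loop, convergence check, replace.
func PostRun(agent, task string, res *Result, cfg Config, refine RefineFunc, converged ConvergedFunc) {
	if res.Err != nil || !cfg.Enabled || excluded(agent, cfg.ExcludeAgents) {
		return
	}
	if len(res.Output) < cfg.MinDraftBytes {
		return // skip short outputs
	}
	current := res.Output
	for pass := 0; pass < cfg.MaxPasses; pass++ {
		next, err := refine(task, current)
		if err != nil {
			break // keep the last good draft
		}
		done := converged(current, next)
		current = next // assumption: adopt the revision before stopping
		if done {
			break
		}
	}
	if current != res.Output {
		res.Output = current
	}
}

func main() {
	cfg := Config{Enabled: true, MaxPasses: 1, MinDraftBytes: 5}
	res := &Result{Output: "a rough first draft"}
	// Stub refiner: a real one would dispatch the refiner agent.
	refine := func(task, draft string) (string, error) { return draft + " (polished)", nil }
	same := func(prev, next string) bool { return prev == next }
	PostRun("coder", "write docs", res, cfg, refine, same)
	fmt.Println(res.Output)
}
```

Passing the refiner and the convergence check as functions keeps the loop testable without a live model.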

Convergence — semantic cascade

Self-Refine uses a char → Jaccard → embedding cascade to decide when to stop. The old char-level heuristic (convergedCharHeuristic, still available as a fallback) only caught near-literal equality; the cascade also catches the same meaning expressed in different words, which is the common case for rewrites.
┌────────────────────────────────────────────────────────────┐
│  Pass N: new revision (next) vs current draft              │
└───────────────┬────────────────────────────────────────────┘

   1. Char scorer (μs)  ─────────── sim > 0.99?  ─► CONVERGED (identical)
                       │                     sim < 0.3?  ─► DIVERGED (obviously different)
                       ▼ borderline
   2. Jaccard scorer (ms) ────── sim > 0.95 + high conf? ─► CONVERGED

                       ▼ still borderline
   3. Embedding (ms + $) ──── sim > 0.92? ─► CONVERGED
                       │                  otherwise ─► KEEP REFINING

                       ▼ embedder down
   Fallback: Jaccard → Char (with DegradedFrom flag)
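The decision order in the diagram can be sketched as a single function. This is an illustration under assumptions: the `Scorer`, `Thresholds`, `Verdict`, and `Decide` names are hypothetical, the thresholds are the documented defaults, and on embedder failure the sketch simply keeps the Jaccard outcome and tags the result as degraded.

```go
package main

import (
	"errors"
	"fmt"
)

type Verdict int

const (
	KeepRefining Verdict = iota
	Converged
	Diverged
)

// ErrEmbedderDown is a hypothetical error used to simulate a provider outage.
var ErrEmbedderDown = errors.New("embedder unavailable")

// Scorer returns a similarity and a confidence, both in [0,1].
type Scorer func(a, b string) (sim, conf float64, err error)

type Thresholds struct {
	CharHigh, CharLow, JaccardHigh, JaccardMinConf, EmbeddingSim float64
}

// DefaultThresholds copies the defaults listed on this page.
func DefaultThresholds() Thresholds {
	return Thresholds{CharHigh: 0.99, CharLow: 0.3, JaccardHigh: 0.95, JaccardMinConf: 0.6, EmbeddingSim: 0.92}
}

// Decide walks char -> Jaccard -> embedding. embed may be nil when the
// embedding scorer is disabled. The second result is the degradedFrom tag.
func Decide(a, b string, char, jaccard, embed Scorer, t Thresholds) (Verdict, string) {
	if sim, _, err := char(a, b); err == nil {
		if sim >= t.CharHigh {
			return Converged, "" // effectively identical
		}
		if sim < t.CharLow {
			return Diverged, "" // obviously different
		}
	}
	jSim, jConf, jErr := jaccard(a, b)
	if jErr == nil && jSim >= t.JaccardHigh && jConf >= t.JaccardMinConf {
		return Converged, ""
	}
	if embed == nil {
		return KeepRefining, ""
	}
	eSim, _, err := embed(a, b)
	if err != nil {
		// Jaccard already declined to converge, so keep refining and flag it.
		return KeepRefining, "embedding_unavailable"
	}
	if eSim >= t.EmbeddingSim {
		return Converged, ""
	}
	return KeepRefining, ""
}

func main() {
	fixed := func(sim, conf float64) Scorer {
		return func(a, b string) (float64, float64, error) { return sim, conf, nil }
	}
	v, degraded := Decide("old", "new", fixed(0.6, 0.4), fixed(0.8, 0.7), fixed(0.95, 1), DefaultThresholds())
	fmt.Println(v == Converged, degraded)
}
```

Ordering the scorers from cheapest to most expensive means the embedding call only happens for genuinely borderline pairs.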

The three scorers

| Scorer | Cost | Catches | Confidence |
| --- | --- | --- | --- |
| Char | μs | Literal equality, length delta | High at extremes, low in the middle |
| Jaccard | ms | Normalized token sets (lowercase, EN/PT stop-words, no punctuation); captures reordering | Grows with corpus size |
| Embedding | 100-500ms + $ | Cosine similarity via embedding.Provider (Voyage/OpenAI); captures paraphrase + synonyms | High, authoritative |
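
To make the middle tier concrete, here is a minimal Jaccard scorer over normalized token sets. It is a sketch, not the project's scorer: the stop-word list is a tiny illustrative sample, the function names are hypothetical, and the confidence value mentioned in the table is left out.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stopWords is a small illustrative EN/PT sample, not the real list.
var stopWords = map[string]bool{
	"the": true, "a": true, "an": true, "of": true, "and": true, "to": true,
	"o": true, "os": true, "as": true, "de": true, "e": true, "um": true, "uma": true,
}

// tokenSet lowercases the text, splits on anything that is not a letter or
// digit (which drops punctuation), and removes stop-words.
func tokenSet(s string) map[string]bool {
	fields := strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	set := make(map[string]bool, len(fields))
	for _, f := range fields {
		if !stopWords[f] {
			set[f] = true
		}
	}
	return set
}

// Jaccard returns |A ∩ B| / |A ∪ B| over the normalized token sets.
// Two empty sets are treated as identical (1.0).
func Jaccard(a, b string) float64 {
	sa, sb := tokenSet(a), tokenSet(b)
	if len(sa) == 0 && len(sb) == 0 {
		return 1
	}
	inter := 0
	for tok := range sa {
		if sb[tok] {
			inter++
		}
	}
	union := len(sa) + len(sb) - inter
	return float64(inter) / float64(union)
}

func main() {
	// Same words, different order and punctuation.
	fmt.Println(Jaccard("The client sends a request.", "A request, the client sends!"))
}
```

Because the comparison is set-based, reordered sentences score as identical, while a paraphrase that swaps vocabulary scores low; that gap is what the embedding tier covers.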

Quality regression guard

Beyond detecting convergence, the hook compares each new revision against the original draft. If similarity drops sharply between passes (the rewrite is drifting away from the original), it reverts to the best draft seen and sets refine_rolled_back=true in the metadata, a signal that Reflexion consumes:
pass 0: original draft                         sim_best=1.0
pass 1: rewrite v1    sim vs original = 0.95  → better than original, keep
pass 2: rewrite v2    sim vs original = 0.50  → drop > 15% → REVERT to v1
                                                refine_rolled_back=true
This prevents the model from iterating toward a worse answer when the critique over-interprets the draft.
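The rollback trace can be reproduced with a small guard. This sketch rests on an assumption the page does not spell out: "drop > 15%" is read as an absolute drop of more than 0.15 relative to the last accepted revision. The `Guard` type and its methods are hypothetical.

```go
package main

import "fmt"

// Guard tracks the last accepted revision and its similarity to the original.
type Guard struct {
	MaxDrop    float64 // assumed meaning: absolute drop allowed between passes
	RolledBack bool    // corresponds to refine_rolled_back=true
	best       string
	bestSim    float64
}

// NewGuard starts from the original draft, which is 1.0 similar to itself.
func NewGuard(original string, maxDrop float64) *Guard {
	return &Guard{MaxDrop: maxDrop, best: original, bestSim: 1.0}
}

// Observe takes a revision and its similarity to the original draft.
// It returns the draft to carry forward and whether the loop should stop.
func (g *Guard) Observe(revision string, simToOriginal float64) (string, bool) {
	if g.bestSim-simToOriginal > g.MaxDrop {
		g.RolledBack = true
		return g.best, true // revert to the best draft seen
	}
	g.best, g.bestSim = revision, simToOriginal
	return revision, false
}

func main() {
	g := NewGuard("original", 0.15)
	draft, stop := g.Observe("rewrite v1", 0.95)
	fmt.Println(draft, stop)
	draft, stop = g.Observe("rewrite v2", 0.50)
	fmt.Println(draft, stop, g.RolledBack)
}
```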

LRU cache with TTL

Embedding calls cost money. The cascade caches vectors by sha256(text) in a bounded LRU (default 256 entries, 5-minute TTL). During a multi-pass loop the same string appears in consecutive comparisons, so cache hits avoid duplicate embedding calls.
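A bounded LRU with per-entry TTL, keyed by the SHA-256 of the text, might look like the sketch below. The `VectorCache` type is hypothetical, the sketch is not safe for concurrent use, and taking the TTL in minutes simply mirrors the CACHE_TTL_MIN setting.

```go
package main

import (
	"container/list"
	"crypto/sha256"
	"fmt"
	"time"
)

type entry struct {
	key     [32]byte
	vec     []float32
	expires time.Time
}

// VectorCache is a bounded LRU with per-entry TTL (hypothetical type).
type VectorCache struct {
	capacity int
	ttl      time.Duration
	order    *list.List // front = most recently used
	items    map[[32]byte]*list.Element
}

func NewVectorCache(capacity, ttlMinutes int) *VectorCache {
	return &VectorCache{
		capacity: capacity,
		ttl:      time.Duration(ttlMinutes) * time.Minute,
		order:    list.New(),
		items:    make(map[[32]byte]*list.Element),
	}
}

// Get returns the cached vector for text, or false on a miss or expiry.
func (c *VectorCache) Get(text string) ([]float32, bool) {
	key := sha256.Sum256([]byte(text))
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	e := el.Value.(*entry)
	if time.Now().After(e.expires) {
		c.order.Remove(el)
		delete(c.items, key)
		return nil, false
	}
	c.order.MoveToFront(el)
	return e.vec, true
}

// Put stores a vector and evicts the least recently used entry when full.
func (c *VectorCache) Put(text string, vec []float32) {
	key := sha256.Sum256([]byte(text))
	if el, ok := c.items[key]; ok {
		e := el.Value.(*entry)
		e.vec, e.expires = vec, time.Now().Add(c.ttl)
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, vec: vec, expires: time.Now().Add(c.ttl)})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	cache := NewVectorCache(256, 5)
	cache.Put("draft v1", []float32{0.1, 0.2})
	_, hit := cache.Get("draft v1")
	fmt.Println(hit)
}
```

Hashing the text keeps map keys at a fixed 32 bytes regardless of draft length.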

Per-scorer circuit breaker

If the embedder returns 3 consecutive errors (provider 429/503/timeout), the breaker opens for 30s and the cascade degrades gracefully to Jaccard, recording DegradedFrom="embedding_unavailable" in the Score. The refine loop never stalls waiting on the provider.
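A consecutive-failure breaker with a cooldown can be sketched in a few lines. The `Breaker` type, `NewBreaker`, and `ErrProvider` are hypothetical names; only the threshold of 3 and the 30s open window come from this page, and the sketch assumes any success resets the failure streak.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrProvider stands in for a provider 429/503/timeout (hypothetical).
var ErrProvider = errors.New("provider error")

// Breaker opens after `threshold` consecutive failures and stays open
// for `cooldown`. Not safe for concurrent use.
type Breaker struct {
	threshold int
	cooldown  time.Duration
	failures  int
	openUntil time.Time
}

func NewBreaker(threshold, cooldownSeconds int) *Breaker {
	return &Breaker{threshold: threshold, cooldown: time.Duration(cooldownSeconds) * time.Second}
}

// Allow reports whether the embedder may be called right now.
func (b *Breaker) Allow() bool {
	return !time.Now().Before(b.openUntil)
}

// Record updates the failure streak; a nil error resets it.
func (b *Breaker) Record(err error) {
	if err == nil {
		b.failures = 0
		return
	}
	b.failures++
	if b.failures >= b.threshold {
		b.openUntil = time.Now().Add(b.cooldown)
		b.failures = 0
	}
}

func main() {
	b := NewBreaker(3, 30)
	for i := 0; i < 3; i++ {
		b.Record(ErrProvider)
	}
	if !b.Allow() {
		fmt.Println("embedder skipped, degrading to Jaccard")
	}
}
```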

Strict vs permissive mode

| Mode | Embedder down | Behavior |
| --- | --- | --- |
| permissive (default) | Degrades to Jaccard | Marks DegradedFrom in the trail, continues |
| strict | Refuses convergence | Treats embedding as authoritative; without it, Converged=false |
Strict is for high-stakes workloads (legal contracts, security code) where a false-positive convergence means shipping unpolished output.
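The two modes reduce to one branch, sketched below. The `Score` fields reuse the names mentioned on this page, but the `Resolve` function and its signature are hypothetical, and tagging DegradedFrom in strict mode as well is an assumption.

```go
package main

import "fmt"

// Score carries the convergence verdict and an optional degradation tag.
type Score struct {
	Converged    bool
	DegradedFrom string
}

// Resolve applies the mode table. embedSim is nil when the embedder is down.
func Resolve(strict bool, embedSim *float64, embedThreshold float64, jaccardConverged bool) Score {
	if embedSim != nil {
		return Score{Converged: *embedSim >= embedThreshold}
	}
	if strict {
		// Embedding is authoritative; without it, never report convergence.
		return Score{Converged: false, DegradedFrom: "embedding_unavailable"}
	}
	// Permissive: fall back to the Jaccard verdict and record the degradation.
	return Score{Converged: jaccardConverged, DegradedFrom: "embedding_unavailable"}
}

func main() {
	fmt.Printf("%+v\n", Resolve(false, nil, 0.92, true))
	fmt.Printf("%+v\n", Resolve(true, nil, 0.92, true))
}
```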

Legacy fallback

When no checker is wired (or CHATCLI_QUALITY_REFINE_CONVERGENCE_ENABLED=false), the hook falls back to the original char-level heuristic:
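// convergedCharHeuristic reports whether a and b are within epsilon of each
// other: the length delta and the positional byte mismatches must both stay
// at or below epsilon. Uses the built-in min, which requires Go 1.21+.
func convergedCharHeuristic(a, b string, epsilon int) bool {
    delta := len(a) - len(b)
    if delta < 0 {
        delta = -delta // Go has no built-in abs for ints
    }
    if delta > epsilon {
        return false
    }
    mismatches := 0
    for i := 0; i < min(len(a), len(b)); i++ {
        if a[i] != b[i] {
            mismatches++
            if mismatches > epsilon {
                return false
            }
        }
    }
    return true
}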
With EpsilonChars=50 (default), the loop stops when two consecutive passes differ by at most 50 characters in length and at most 50 byte positions. Cheap but blind to paraphrase, hence the cascade as the enterprise default.

Exclude lists (anti-recursion and mechanical agents)

Default ExcludeAgents:
ExcludeAgents: []string{"formatter", "deps", "refiner", "verifier"}
| Agent | Reason |
| --- | --- |
| formatter | Mechanical output (formatted file), refine doesn’t add value |
| deps | Output is deterministic interpretation of go list, npm ls, etc. |
| refiner | Anti-recursion: refining the refiner’s output creates an infinite loop |
| verifier | Same reason: verifier already delivers polished output |
Add more via env:
export CHATCLI_QUALITY_REFINE_EXCLUDE="formatter,deps,refiner,verifier,shell,git"
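A sketch of how the CSV value could be parsed and checked follows. `AppliesToAgent` is named on this page, but its body here is a guess; `ParseExclude` is hypothetical, and the sketch assumes the env value replaces the default list (which is why the example above repeats the defaults) and that matching is case-insensitive.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

var defaultExclude = []string{"formatter", "deps", "refiner", "verifier"}

// ParseExclude splits a CSV value and trims blanks.
// An empty value yields a copy of the defaults.
func ParseExclude(csv string) []string {
	if strings.TrimSpace(csv) == "" {
		return append([]string(nil), defaultExclude...)
	}
	var out []string
	for _, part := range strings.Split(csv, ",") {
		if name := strings.ToLower(strings.TrimSpace(part)); name != "" {
			out = append(out, name)
		}
	}
	return out
}

// AppliesToAgent reports whether an agent's output should be refined.
func AppliesToAgent(agent string, exclude []string) bool {
	agent = strings.ToLower(strings.TrimSpace(agent))
	for _, name := range exclude {
		if name == agent {
			return false
		}
	}
	return true
}

func main() {
	exclude := ParseExclude(os.Getenv("CHATCLI_QUALITY_REFINE_EXCLUDE"))
	fmt.Println(AppliesToAgent("coder", exclude), AppliesToAgent("refiner", exclude))
}
```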

/refine — session toggle

Instead of editing env vars and restarting, use the slash command:
/refine on
# → self-refine ON for this session
The session override lives in cli.qualityOverrides.Refine as *bool:
  • nil → defer to /config quality
  • &true → force on
  • &false → force off
It’s applied on top of env when AgentMode.Run() builds its qualityConfig.
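The tri-state override maps naturally onto a `*bool`. In this sketch the `EffectiveRefine` and `boolPtr` helpers are hypothetical; Go cannot take the address of a literal, so `&true` above is shorthand for a pointer built by a helper like `boolPtr`.

```go
package main

import "fmt"

// EffectiveRefine resolves the session override against the configured value.
// A nil override defers to the configuration.
func EffectiveRefine(configured bool, override *bool) bool {
	if override == nil {
		return configured
	}
	return *override
}

// boolPtr returns a pointer to v.
func boolPtr(v bool) *bool { return &v }

func main() {
	fmt.Println(EffectiveRefine(false, nil))            // defer to config
	fmt.Println(EffectiveRefine(false, boolPtr(true)))  // /refine on
	fmt.Println(EffectiveRefine(true, boolPtr(false)))  // /refine off
}
```

A pointer distinguishes "never set in this session" from "explicitly off", which a plain bool cannot do.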

Environment variables

Basics

| Env var | Default | What it does |
| --- | --- | --- |
| CHATCLI_QUALITY_REFINE_ENABLED | false | Master switch |
| CHATCLI_QUALITY_REFINE_MAX_PASSES | 1 | Hard cap on passes (recommend 1-2) |
| CHATCLI_QUALITY_REFINE_MIN_BYTES | 200 | Don’t refine outputs smaller than this |
| CHATCLI_QUALITY_REFINE_EPSILON | 50 | Char-level fallback threshold |
| CHATCLI_QUALITY_REFINE_EXCLUDE | formatter,deps,refiner,verifier | CSV of agents that are NOT refined |

Semantic convergence cascade

| Env var | Default | Effect |
| --- | --- | --- |
| CHATCLI_QUALITY_REFINE_CONVERGENCE_ENABLED | true | Cascade master switch. false = char heuristic only |
| CHATCLI_QUALITY_REFINE_CONVERGENCE_EMBEDDING | false | Include embedding scorer (opt-in because it costs $) |
| CHATCLI_QUALITY_REFINE_CONVERGENCE_STRICT | false | Strict mode: refuse convergence without embedding |
| CHATCLI_QUALITY_REFINE_CONVERGENCE_CHAR_HIGH | 0.99 | Sim ≥ X on char → short-circuit CONVERGED |
| CHATCLI_QUALITY_REFINE_CONVERGENCE_CHAR_LOW | 0.3 | Sim < X on char → short-circuit DIVERGED |
| CHATCLI_QUALITY_REFINE_CONVERGENCE_JACCARD_HIGH | 0.95 | Sim ≥ X on Jaccard (confidence ≥ 0.6) → CONVERGED |
| CHATCLI_QUALITY_REFINE_CONVERGENCE_EMBEDDING_SIM | 0.92 | Final embedding cosine threshold |
| CHATCLI_QUALITY_REFINE_CONVERGENCE_CACHE_SIZE | 256 | LRU cache size |
| CHATCLI_QUALITY_REFINE_CONVERGENCE_CACHE_TTL_MIN | 5 | Cache TTL in minutes |
| CHATCLI_QUALITY_REFINE_CONVERGENCE_BREAKER_THRESHOLD | 3 | Consecutive embedder failures before breaker opens |
To enable embedding: CHATCLI_QUALITY_REFINE_CONVERGENCE_EMBEDDING=1 + any embeddings provider (CHATCLI_EMBED_PROVIDER=voyage / openai / bedrock). The cascade reuses the same provider as HyDE — zero duplicate infra. See RAG + HyDE for per-provider details.
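One way to load these settings with their documented defaults is sketched below. The `ConvergenceConfig` struct and helper names are hypothetical, and falling back to the default on an unparsable value is an assumption. `strconv.ParseBool` accepts `1` and `true` alike, which matches the `=1` usage above.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

const prefix = "CHATCLI_QUALITY_REFINE_CONVERGENCE_"

// Lookup matches the signature of os.LookupEnv so tests can inject a map.
type Lookup func(key string) (string, bool)

type ConvergenceConfig struct {
	Enabled, Embedding, Strict                   bool
	CharHigh, CharLow, JaccardHigh, EmbeddingSim float64
	CacheSize, CacheTTLMin, BreakerThreshold     int
}

func boolOr(l Lookup, name string, def bool) bool {
	if v, ok := l(prefix + name); ok {
		if b, err := strconv.ParseBool(v); err == nil {
			return b
		}
	}
	return def // unset or invalid: keep the default
}

func floatOr(l Lookup, name string, def float64) float64 {
	if v, ok := l(prefix + name); ok {
		if f, err := strconv.ParseFloat(v, 64); err == nil {
			return f
		}
	}
	return def
}

func intOr(l Lookup, name string, def int) int {
	if v, ok := l(prefix + name); ok {
		if n, err := strconv.Atoi(v); err == nil {
			return n
		}
	}
	return def
}

// LoadConvergenceConfig reads every cascade setting, using documented defaults.
func LoadConvergenceConfig(l Lookup) ConvergenceConfig {
	return ConvergenceConfig{
		Enabled:          boolOr(l, "ENABLED", true),
		Embedding:        boolOr(l, "EMBEDDING", false),
		Strict:           boolOr(l, "STRICT", false),
		CharHigh:         floatOr(l, "CHAR_HIGH", 0.99),
		CharLow:          floatOr(l, "CHAR_LOW", 0.3),
		JaccardHigh:      floatOr(l, "JACCARD_HIGH", 0.95),
		EmbeddingSim:     floatOr(l, "EMBEDDING_SIM", 0.92),
		CacheSize:        intOr(l, "CACHE_SIZE", 256),
		CacheTTLMin:      intOr(l, "CACHE_TTL_MIN", 5),
		BreakerThreshold: intOr(l, "BREAKER_THRESHOLD", 3),
	}
}

func main() {
	fmt.Printf("%+v\n", LoadConvergenceConfig(os.LookupEnv))
}
```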

Refiner agent override

RefinerAgent defaults to model="" and effort="medium". To override:
# Use Haiku for cheap refine
export CHATCLI_AGENT_REFINER_MODEL="claude-haiku-4-5"
export CHATCLI_AGENT_REFINER_EFFORT="low"

# Or gpt-5 with high effort for max quality
export CHATCLI_AGENT_REFINER_MODEL="gpt-5"
export CHATCLI_AGENT_REFINER_EFFORT="high"

Example: refine on documentation response

# HTTP Client

The HTTP client is used for making requests. You can use it with various providers.

## Usage

Just call the Do method with a request and you get a response back.
Quality judgment: vague, no example, dismissive “just”.

Direct invocation by the orchestrator

Beyond the automatic hook, the orchestrator can call the refiner via <agent_call>:
<agent_call agent="refiner" task="Polish this draft doc section: [paste draft here]" />
Useful when:
  • You want to refine a specific output without enabling the global hook.
  • The task itself is “improve this text”.
  • Complex chains: coder writes → reviewer analyzes → refiner polishes the analysis before delivery.

Cost and latency

| Config | Extra LLM calls per turn | Typical latency |
| --- | --- | --- |
| MaxPasses=1 (default) | +1 | 1-3s with Haiku, 3-8s with Sonnet |
| MaxPasses=2 | +1 to +2 (convergence may stop at 1) | up to 2x |
| MaxPasses=3+ | Rarely converges → expensive | Avoid |
Refine shines in writing tasks: documentation, summaries, reports. Less useful in tool-heavy workflows (shell, git, file) because outputs are already mechanical.

See also

#6 CoVe

Chain-of-Verification complements refine with factual checking.

#3 Reflexion

If refine flags low-quality across multiple turns, Reflexion persists the lesson.

ReviewerAgent

Pre-pipeline agent that does code review. Refiner is complementary: one analyzes, the other rewrites.

Configuration

All CHATCLI_QUALITY_REFINE_* in one place.