Chain-of-Verification (Dhuliawala et al., 2023) is the canonical technique for reducing hallucination in LLM output. The flow: produce an answer → generate verification questions about its claims → answer each question independently (without seeing the original answer) → reconcile discrepancies. ChatCLI implements it via a VerifierAgent (pure reasoning, zero tools) plus a VerifyHook (a PostHook in the quality pipeline). When a discrepancy is detected, the verified_with_discrepancy flag is recorded in result.Metadata, which activates Reflexion downstream.
CoVe is opt-in. With CHATCLI_QUALITY_VERIFY_ENABLED=false (the default) it adds zero overhead. When enabled, it adds one extra LLM call per verified response, run at high effort by default (effort="high").

VerifierAgent protocol

The model receives TASK + DRAFT and emits five blocks:
```
<status>verified-clean OR verified-with-corrections</status>

<questions>
- Q1: specific verifiable claim 1
- Q2: specific verifiable claim 2
- Q3: specific verifiable claim 3
</questions>

<answers>
- A1: independent answer to Q1 (without looking at DRAFT)
- A2: independent answer to Q2
- A3: independent answer to Q3
</answers>

<discrepancies>
- description of each discrepancy, OR "none"
</discrepancies>

<final>
…answer ready to deliver to the user (verbatim copy of DRAFT if
verified-clean, or a rewrite if verified-with-corrections)…
</final>
```

Critical rule: INDEPENDENT answers

The protocol instructs the model to answer each Q<n> without referring to the DRAFT. This is the heart of CoVe — if independent answers contradict the draft, there’s a suspect claim.
Models that just parrot the draft in the answers defeat the pattern. The system prompt emphasizes this explicitly; use models capable of “self-distance” (Claude Sonnet, GPT-4+).

VerifyHook flow

1. Worker finishes: any agent produces result.Output.

2. VerifyHook.PostRun runs its cheap early exits:

```go
if result.Error != nil { return }
if !cfg.Verify.Enabled { return }
if !AppliesToAgent(agent, cfg.ExcludeAgents) { return }
if result.Output == "" { return }
```

3. Dispatch VerifierAgent:

```go
body := VerifyDirective + fmt.Sprintf(" [NUM_QUESTIONS=%d]\n", numQ) +
        "Task:\n" + task + "\n\n" +
        "Draft:\n" + result.Output
res := dispatch(ctx, AgentCall{Agent: verifier, Task: body})
```

4. ParseVerifierOutput: extracts the five blocks. Tolerant to block order and to bullets written with - or *.

5. HasDiscrepancy check: true when Status == "verified-with-corrections" or Discrepancies != "none".

6. On discrepancy (with RewriteOnDiscrepancy):

```go
result.SetMetadata("verified_with_discrepancy", "true")
result.SetMetadata("verifier_discrepancies", parsed.Discrepancies)
if cfg.RewriteOnDiscrepancy {
    result.Output = parsed.Final
}
```

7. Reflexion consumes the flag: in the same pipeline run, ReflexionHook.PostRun sees the metadata and, if OnHallucination=true, triggers lesson generation.

Exclude list (anti-recursion + non-textual agents)

```go
ExcludeAgents: []string{"formatter", "deps", "shell", "refiner", "verifier"}
```

| Agent | Reason |
| --- | --- |
| formatter | Output is formatted code, not claims |
| deps | Output is a deterministic package list |
| shell | Output is stdout/stderr, not verifiable claims |
| refiner | Output already went through critique; verifying would be redundant |
| verifier | Anti-recursion: verifying the verifier creates a loop |

/verify — session toggle

```
/verify on
```

Environment variables

| Env var | Default | What it does |
| --- | --- | --- |
| CHATCLI_QUALITY_VERIFY_ENABLED | false | Master switch |
| CHATCLI_QUALITY_VERIFY_NUM_QUESTIONS | 3 | How many verification questions |
| CHATCLI_QUALITY_VERIFY_REWRITE | true | Rewrite output on discrepancy |
| CHATCLI_QUALITY_VERIFY_EXCLUDE | formatter,deps,shell,refiner,verifier | CSV of exclusions |
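Putting the knobs together, enabling verification for a session might look like this (the values shown are the documented defaults apart from the master switch):

```shell
# Turn CoVe on for this shell session.
export CHATCLI_QUALITY_VERIFY_ENABLED=true
export CHATCLI_QUALITY_VERIFY_NUM_QUESTIONS=3
export CHATCLI_QUALITY_VERIFY_REWRITE=true
export CHATCLI_QUALITY_VERIFY_EXCLUDE="formatter,deps,shell,refiner,verifier"
```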

Verifier override

```shell
# Use a stronger model just for verify
export CHATCLI_AGENT_VERIFIER_MODEL="claude-opus-4-7"
export CHATCLI_AGENT_VERIFIER_EFFORT="max"

# Or more questions for critical workflows
export CHATCLI_QUALITY_VERIFY_NUM_QUESTIONS=5
```

Example: catching API hallucination

Draft (hallucinated): "Go's http.Client has a DefaultTimeout field you can set globally. Just do http.DefaultClient.DefaultTimeout = 30 * time.Second in the program's init."

The verifier's independent answers catch the error: DefaultTimeout doesn't exist. The field is Timeout, and there is no "DefaultTimeout" anywhere in the stdlib.

The discrepancy is recorded as result.Metadata["verified_with_discrepancy"] = "true". If Reflexion is on (the default), a lesson is generated in the background:

```
LESSON: Citing Go stdlib API fields
MISTAKE: Invented "DefaultTimeout" field on http.Client
CORRECTION: Verify symbol exists via go doc or grep before citing
TRIGGER: hallucination
```

Interaction with Refine

When Refine + Verify are both enabled, order matters:
```go
// cli/agent/quality/builder.go
if cfg.Refine.Enabled && deps.Dispatch != nil {
    p.AddPost(NewRefineHook(...))      // first
}
if cfg.Verify.Enabled && deps.Dispatch != nil {
    p.AddPost(NewVerifyHook(...))      // after refine
}
```
Refine improves stylistic quality first; Verify checks factual accuracy over the already-refined output. Reverse order (verify → refine) would work, but a refine rewrite could introduce unverified claims.
If you only enable one of them, prefer Verify for factual workflows (technical docs, API-heavy code) and Refine for stylistic workflows (summaries, reports).

Direct invocation

```
<agent_call agent="verifier" task="[VERIFY_ANSWER] Task: X Draft: Y" />
```

Or dispatch the verifier as an explicit step when orchestrating critical chains:

```
# Chain: search → coder → verifier before the user sees the result
```

Cost and latency

| Config | Extra calls | Latency |
| --- | --- | --- |
| NumQuestions=3 (default) | +1 verifier call | 3-8 s with Sonnet |
| NumQuestions=5 | +1 (same call, more questions) | 5-12 s |
| NumQuestions=7+ | +1, and the model may break the output contract | Avoid |
CoVe with weak models generates shallow questions or “verifications” that just paraphrase the draft. Use Sonnet, Opus, or GPT-4+ as CHATCLI_AGENT_VERIFIER_MODEL.

See also

#3 Reflexion: consumes the verified_with_discrepancy signal to generate lessons about hallucination.

#5 Self-Refine: stylistic complement; Refine polishes, CoVe verifies factually.

Original paper (Dhuliawala et al.): Chain-of-Verification Reduces Hallucination in Large Language Models.

Configuration: env vars and slash commands in one place.