Reflexion closes the learning loop. When an agent fails or produces low-quality output, the pipeline does not discard the experience: it generates a structured Lesson and persists it to long-term memory. The next time a similar task comes up, RAG+HyDE surfaces that lesson.
Reflexion is the only post-hook enabled by default, because it fires only in exceptional conditions (error, discrepancy) and lesson generation never blocks the user turn.
Durable mode (default since Apr 2026): triggers flow through a WAL-backed queue with a worker pool and a dead-letter queue. Lessons survive process crashes because pending records are replayed on the next boot. See Durable Queue.

What is a Lesson

A Lesson is a small structured record:
type Lesson struct {
    Situation  string   // "When editing large Go files..."
    Mistake    string   // "Tried to rewrite the whole file at once"
    Correction string   // "Use Edit tool with specific old_string/new_string"
    Tags       []string // ["go", "edit-file", "large-file", "reflexion"]
    Trigger    string   // "error" | "hallucination" | "low_quality" | "manual"
    CreatedAt  time.Time
}
When persisted as a memory.Fact, its Content becomes four lines:
LESSON: When editing large Go files
MISTAKE: Tried to rewrite the whole file at once
CORRECTION: Use Edit tool with specific old_string/new_string
TRIGGER: error
The Fact category is lesson, and its tags are reflexion, trigger:<x>, plus the domain-specific tags. This makes precise queries possible: “show me all lessons about edit-file” becomes a regular memory search.
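The mapping from a Lesson to a persisted fact can be sketched as below. This is a minimal illustration, not the chatcli implementation: Fact is a hypothetical stand-in carrying only the fields this page mentions (category, content, tags), and lowercasing plus de-duplicating the tags is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Lesson mirrors the struct shown above.
type Lesson struct {
	Situation  string
	Mistake    string
	Correction string
	Tags       []string
	Trigger    string // "error" | "hallucination" | "low_quality" | "manual"
	CreatedAt  time.Time
}

// Fact is a hypothetical stand-in for memory.Fact with only the fields
// this page mentions.
type Fact struct {
	Category string
	Content  string
	Tags     []string
}

// ToFact renders the four-line Content and builds the tag set:
// "reflexion", "trigger:<x>", then the lesson's own tags, lowercased and
// de-duplicated (the normalization is an assumption of this sketch).
func ToFact(l Lesson) Fact {
	content := fmt.Sprintf("LESSON: %s\nMISTAKE: %s\nCORRECTION: %s\nTRIGGER: %s",
		l.Situation, l.Mistake, l.Correction, l.Trigger)

	seen := map[string]bool{}
	var tags []string
	add := func(t string) {
		t = strings.ToLower(strings.TrimSpace(t))
		if t != "" && !seen[t] {
			seen[t] = true
			tags = append(tags, t)
		}
	}
	add("reflexion")
	add("trigger:" + l.Trigger)
	for _, t := range l.Tags {
		add(t)
	}
	return Fact{Category: "lesson", Content: content, Tags: tags}
}

func main() {
	fact := ToFact(Lesson{
		Situation:  "When editing large Go files",
		Mistake:    "Tried to rewrite the whole file at once",
		Correction: "Use Edit tool with specific old_string/new_string",
		Tags:       []string{"go", "edit-file", "large-file", "reflexion"},
		Trigger:    "error",
		CreatedAt:  time.Now(),
	})
	fmt.Println(fact.Content)
	fmt.Println(fact.Tags)
}
```

Because "reflexion" is always added first, a lesson whose own tags already contain it does not end up with a duplicate.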

Four triggers

if cfg.OnError && result.Error != nil {
    return "error"
}
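error: the worker returned Error != nil (for example a timeout, an invalid tool call, or a provider crash). Default: ON.
The other three triggers follow the same pattern:
  • hallucination: the result carries the verified_with_discrepancy signal from the CoVe verifier. Default: ON (CHATCLI_QUALITY_REFLEXION_ON_HALLUCINATION).
  • low_quality: the result carries the refine_low_quality signal. Default: OFF (CHATCLI_QUALITY_REFLEXION_ON_LOW_QUALITY).
  • manual: the user ran /reflect <text>; this path skips the LLM entirely (see below).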

Flow — durable mode (default)

With CHATCLI_QUALITY_REFLEXION_QUEUE_ENABLED=true (default), triggers flow through a persistent queue. The hook never blocks the turn and the process can crash without losing the lesson:
1. PostRun inspects the trigger
   ReflexionHook.PostRun(ctx, hc, result) checks result.Metadata and result.Error. If no gate matches, it returns within microseconds.

2. WAL append (synchronous, sub-ms)
   The hook calls enqueuer.Enqueue(req). The Runner computes JobID = sha256(task|trigger|attempt)[:16], writes a record to the WAL (~/.chatcli/reflexion/wal/<id>.wal) via tmp → fsync → atomic rename → dir fsync, then pushes the job onto the in-memory queue.

3. Immediate return to the pipeline
   PostRun returns nil and the user’s turn continues without waiting. The only added latency is the fsync (typically < 1 ms).

4. Worker pool processes asynchronously
   One of N workers (default 2) dequeues the job, calls GenerateLesson with a per-job timeout (default 2 min), and persists the result to memory.Fact unless the LLM emits <skip>.

5. Outcome classification
   Success or Skipped → ACK (delete the WAL record). Transient error (timeout, 429/503) → reschedule with exponential backoff plus jitter. Permanent error (e.g. a parser error) → move to the DLQ immediately. A job that exhausts its retries also moves to the DLQ.

6. Replay on boot
   In the next session, Runner.Replay() runs asynchronously and re-queues every pending record from the WAL, discarding those older than StaleAfter (default 7 days).
Observable turn latency: a local fsync on an SSD typically takes under 1 ms. Lesson generation (the LLM call) happens after the turn has responded, so users never wait for it.

Fallback: legacy mode (detached goroutine)

If CHATCLI_QUALITY_REFLEXION_QUEUE_ENABLED=false, the hook reverts to the original behavior:
go h.runReflexion(context.Background(), req)  // fire-and-forget
Zero filesystem dependency, but in-flight lessons vanish if the process is killed. Kept for backward compatibility and for users who prefer simplicity over durability.

Durable Queue — WAL + Worker Pool + DLQ

The queue lives in cli/agent/quality/lessonq/ and provides the following durability guarantees:

WAL (Write-Ahead Log)

Each pending lesson is a .wal file in ~/.chatcli/reflexion/wal/ — one per Job ID. Binary layout:
[4B magic 'LSN1'][4B length BE][4B CRC32 payload][N bytes JSON payload][4B CRC32 trailer]
  • Double CRC detects torn writes (a crash mid-fsync). Corrupt records are discarded on replay and chatcli_lessonq_wal_corruption_total is incremented.
  • Atomic rename: write to <id>.tmp.<pid>.<seq> → fsync → rename → dir fsync. A reader never sees a partial record.
  • O(1) ACK: a single unlink removes the record, so no background compaction is needed.

Worker Pool

Queue (min-heap by NextAttemptAt)

      ├─► Worker 1 ─► GenerateLesson ─► persist ─► ACK
      ├─► Worker 2 ─► GenerateLesson ─► persist ─► ACK
      └─► Worker N ─► ...
Each worker:
  • Blocking Dequeue (waits until NextAttemptAt ≤ now).
  • Bounded per-job timeout that does not inherit the turn ctx, because reflexion outlives the turn by design.
  • Panic recovery: if the processor panics, the job goes straight to the DLQ (retrying a bug would just loop).
  • Emits chatcli_lessonq_processing_duration_seconds{outcome}.

Dead Letter Queue

Permanent failures and jobs that exhaust their retries go to ~/.chatcli/reflexion/dlq/ (same WAL format; workers never pick jobs up from it). An operator inspects them and decides:
/reflect failed              # list with last error
/reflect retry <job-id>      # re-queue (resets Attempts=0)
/reflect purge <job-id>      # remove permanently

Retry with Jitter

Transient errors (ctx timeout, provider 429/503, temporary fs error) are rescheduled with:
delay = InitialDelay × Multiplier^(attempt-1)
delay = min(delay, MaxDelay)
delay = delay × uniform(1-JitterFraction, 1+JitterFraction)
Defaults: 1s initial delay, 5 min cap, 2.0 multiplier, ±20% jitter, 5 attempts. Jitter spreads the retries of many queued jobs apart, so they don’t all hit the provider at the same moment when it recovers (the thundering-herd problem).

Idempotency

JobID is content-addressed: sha256(normalized(task) | trigger | attempt | outcome)[:16]. Re-triggering the same situation while its job is in flight is a no-op: the WAL record already exists, so the Runner skips the queue insert. Whitespace in the task is normalized so that trivial formatting differences don’t produce duplicate jobs.
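A sketch of the content-addressed ID and the dedupe check. Several details are assumptions because the text leaves them open: fields are joined with "|", [:16] means the first 16 hex characters of the digest, outcome is a string, and normalization collapses whitespace runs via strings.Fields. jobID and enqueueOnce are hypothetical helpers; the real Runner checks for an existing WAL file rather than an in-memory map.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// normalize collapses every whitespace run to one space and trims the ends.
func normalize(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

// jobID hashes the identifying fields and keeps the first 16 hex chars.
func jobID(task, trigger string, attempt int, outcome string) string {
	key := strings.Join([]string{normalize(task), trigger, strconv.Itoa(attempt), outcome}, "|")
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])[:16]
}

// enqueueOnce reports false (deduped) when the ID is already pending.
func enqueueOnce(pending map[string]bool, id string) bool {
	if pending[id] {
		return false
	}
	pending[id] = true
	return true
}

func main() {
	pending := map[string]bool{}
	id1 := jobID("refactor pkg/engine  to extract Close", "error", 1, "timeout")
	id2 := jobID("refactor pkg/engine to extract Close\n", "error", 1, "timeout")
	fmt.Println(id1 == id2, enqueueOnce(pending, id1), enqueueOnce(pending, id2))
}
```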

Drain + Graceful Shutdown

On exit (cli.cleanup()), the Runner enters DrainAndShutdown(30s):
  1. Queue closes — no new dequeues.
  2. Workers finish in-flight (or get cancelled on timeout).
  3. WAL/DLQ close.
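Jobs still queued survive in the WAL and are reprocessed on the next boot. A SIGTERM goes through this drain; a kill -9 skips it, but every job whose WAL append completed is still replayed, so no enqueued lesson is lost.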

/reflect — Commands

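/reflect                     # queue depth + DLQ size + subcommands hint
/reflect list                # pending jobs + DLQ
/reflect failed              # DLQ only, with last error
/reflect retry <job-id>      # re-queue a DLQ job (resets Attempts=0)
/reflect purge <job-id>      # remove a DLQ job permanently
/reflect <text>              # store a manual lesson (no LLM call)
All subcommands have Tab autocomplete: /reflect retry and /reflect purge complete with live DLQ IDs, showing a task preview and the last error.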

Files and layout

~/.chatcli/reflexion/
├── wal/                          # active queue (pending + in-flight)
│   ├── a3f8...bc.wal            # one file per Job ID
│   └── ...
└── dlq/                          # dead letter queue (permanent failures)
    ├── 9e2c...7a.wal
    └── ...
Path configurable via CHATCLI_QUALITY_REFLEXION_QUEUE_BASE_DIR (default: <workspace>/.chatcli/reflexion).
Operators can ls the directory for quick triage without special tools. Each record is JSON inside the binary framing, so xxd plus the lessonq protocol docs are enough for forensics.

Lesson generator protocol

The system prompt instructs the model to be general, not one-off:
Rules:
- A "lesson" must be GENERAL enough to apply next time a similar task
  comes up — not one-off and not a play-by-play.
- If there is genuinely nothing to learn (e.g. the task was trivial and
  the failure was a transient network blip), reply with exactly:
  <skip>nothing actionable</skip>
- Otherwise emit ALL of the following blocks. Keep each to ONE line.
- "tags" is a comma-separated list of 2-5 short keywords (lowercase,
  hyphenated if needed) that future similar tasks will likely contain.

OUTPUT:
<situation>brief description of when this lesson applies</situation>
<mistake>what went wrong this time</mistake>
<correction>what to do differently next time</correction>
<tags>tag1, tag2, tag3</tags>
The <skip> block exists precisely to keep memory from being polluted by “lessons” drawn from transient failures: the model can decline to generate a lesson, and nothing is persisted.

/reflect — manual path without LLM

When you already know the lesson and don’t need an LLM to distill it:
/reflect when editing large Go files use Edit, not full rewrite
Goes straight into memory.Fact:
LESSON: when editing large Go files use Edit, not full rewrite
MISTAKE: (user-supplied lesson; no automatic mistake detection)
CORRECTION: when editing large Go files use Edit, not full rewrite
TRIGGER: manual
Generated tags: ["reflexion", "trigger:manual", "user-supplied"].
The manual path does not make an LLM call — it’s cheap, synchronous, and ideal for capturing learnings during a session.

How the lesson “comes back”

Once persisted, the lesson is a regular fact in the index. It surfaces via:
  1. Hint-based retrieval: if the next task mentions keywords in Tags, the relevance-based scorer surfaces it.
  2. HyDE amplifies: with CHATCLI_QUALITY_HYDE_ENABLED=true, the hypothetical document HyDE generates covers related concepts, which increases the chance of a match.
  3. Vector search: with embeddings configured, the lesson is retrieved by cosine similarity.
The next turn’s system prompt then contains the ## Long-term Memory section with the lesson text, giving the model the cues it needs to avoid repeating the mistake.
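To illustrate hint-based retrieval, here is a deliberately naive tag-overlap scorer. It is not chatcli's relevance scorer (which this page does not specify), and it ignores HyDE and embeddings entirely; Fact, tokens and rank are hypothetical. Run against the follow-up task from the full-cycle example below, it matches on go and refactor.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// Fact is a hypothetical stand-in for memory.Fact.
type Fact struct {
	Content string
	Tags    []string
}

// tokens lowercases s and splits on every rune that is not a letter, digit,
// or hyphen, so "large-file" stays one token and "manager.go" yields "go".
func tokens(s string) map[string]bool {
	set := map[string]bool{}
	split := func(r rune) bool {
		return !(unicode.IsLetter(r) || unicode.IsDigit(r) || r == '-')
	}
	for _, t := range strings.FieldsFunc(strings.ToLower(s), split) {
		set[t] = true
	}
	return set
}

// rank keeps facts sharing at least one tag with the task, most overlap first.
func rank(task string, facts []Fact) []Fact {
	words := tokens(task)
	type scored struct {
		fact  Fact
		score int
	}
	var hits []scored
	for _, f := range facts {
		n := 0
		for _, tag := range f.Tags {
			if words[tag] {
				n++
			}
		}
		if n > 0 {
			hits = append(hits, scored{f, n})
		}
	}
	sort.SliceStable(hits, func(i, j int) bool { return hits[i].score > hits[j].score })
	out := make([]Fact, len(hits))
	for i, h := range hits {
		out[i] = h.fact
	}
	return out
}

func main() {
	facts := []Fact{
		{Content: "LESSON: Refactoring large Go files", Tags: []string{"go", "refactor", "large-file", "edit-tool"}},
		{Content: "LESSON: Pin pip versions", Tags: []string{"python", "pip"}},
	}
	for _, f := range rank("refactor pkg/auth/manager.go split into smaller files", facts) {
		fmt.Println(f.Content)
	}
}
```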

Environment variables

Gates (when to fire)

| Env var | Default | What it does |
|---|---|---|
| CHATCLI_QUALITY_REFLEXION_ENABLED | true | Master switch |
| CHATCLI_QUALITY_REFLEXION_ON_ERROR | true | Fire when the worker returns an error (result.Error != nil) |
| CHATCLI_QUALITY_REFLEXION_ON_HALLUCINATION | true | Fire on verified_with_discrepancy |
| CHATCLI_QUALITY_REFLEXION_ON_LOW_QUALITY | false | Fire on refine_low_quality |
| CHATCLI_QUALITY_REFLEXION_PERSIST | true | Write to memory.Fact (false = log-only) |

Durable queue (WAL + worker pool + DLQ)

| Env var | Default | Effect |
|---|---|---|
| CHATCLI_QUALITY_REFLEXION_QUEUE_ENABLED | true | Queue master switch. false falls back to legacy mode (detached goroutine) |
| CHATCLI_QUALITY_REFLEXION_QUEUE_WORKERS | 2 | Worker pool size. Reflexion is I/O-bound on the LLM call |
| CHATCLI_QUALITY_REFLEXION_QUEUE_CAPACITY | 1000 | Max in-memory depth before the overflow policy applies |
| CHATCLI_QUALITY_REFLEXION_QUEUE_DROP_OLDEST | false | Overflow policy: true drops the oldest job; false blocks with a timeout |
| CHATCLI_QUALITY_REFLEXION_QUEUE_BLOCK_TIMEOUT | 5s | How long Enqueue waits when the queue is full (if DROP_OLDEST=false) |
| CHATCLI_QUALITY_REFLEXION_QUEUE_MAX_ATTEMPTS | 5 | Maximum attempts before a job moves to the DLQ |
| CHATCLI_QUALITY_REFLEXION_QUEUE_INITIAL_DELAY | 1s | First retry delay |
| CHATCLI_QUALITY_REFLEXION_QUEUE_MAX_DELAY | 5m | Cap on the exponential retry delay |
| CHATCLI_QUALITY_REFLEXION_QUEUE_JITTER | 0.2 | Jitter fraction J, valid range [0, 0.5]; each delay is scaled by a uniform factor in [1-J, 1+J] |
| CHATCLI_QUALITY_REFLEXION_QUEUE_JOB_TIMEOUT | 2m | Per-processor-call timeout (LLM + persist) |
| CHATCLI_QUALITY_REFLEXION_QUEUE_STALE_AFTER | 168h | WAL records older than this are discarded on replay (7 days) |
| CHATCLI_QUALITY_REFLEXION_QUEUE_BASE_DIR | <workspace>/.chatcli/reflexion | Override of the root dir (WAL + DLQ) |

Prometheus metrics

The queue emits 10 metrics under chatcli_lessonq_*:
| Metric | Type | Labels | Meaning |
|---|---|---|---|
| enqueue_total | Counter | outcome | accepted, rejected_full, deduped, dropped_oldest |
| queue_depth | Gauge | (none) | In-memory pending jobs |
| processing_duration_seconds | Histogram | outcome | Dequeue→outcome time |
| attempts_total | Counter | outcome | success, skipped, transient, permanent |
| retry_total | Counter | attempt | Retries bucketed by attempt number |
| dlq_size | Gauge | (none) | Jobs in DLQ |
| wal_segments | Gauge | (none) | Active .wal files |
| wal_corruption_total | Counter | (none) | Records rejected for CRC mismatch/torn write |
| stale_discarded_total | Counter | (none) | Records dropped at replay due to age |
| persist_failures_total | Counter | (none) | memory.Fact callback failures |

Full cycle example

1. User asks for a task that fails
   /coder refactor pkg/engine to extract Close method

2. CoderAgent tries a full rewrite
   The file has 2000 lines, and the provider responds with a timeout.

3. PostRun detects result.Error != nil
   The OnError trigger matches.

4. Worker pool: GenerateLesson (a detached goroutine in legacy mode)
   The model emits:
   <situation>Refactoring large Go files (>1000 lines)</situation>
   <mistake>Attempted full rewrite via @coder write</mistake>
   <correction>Use @coder patch or Edit tool for surgical changes</correction>
   <tags>go, refactor, large-file, edit-tool</tags>

5. Persisted in memory.Fact
   Category=lesson, workspace=current project.

6. Next week, the user asks for a similar refactor
   /coder refactor pkg/auth/manager.go split into smaller files

7. RAG+HyDE retrieves the lesson
   The tags refactor and large-file match, and the lesson appears in the system prompt.

8. Coder picks the right approach from the start
   It emits several @coder patch calls instead of a single write, and the task completes without a timeout.

Inspect stored lessons

# Lessons already persisted (materialized to memory.Fact)
/memory longterm | grep -A3 "^LESSON:"
cat ~/.chatcli/memory/memory_index.json | jq '.[] | select(.category=="lesson")'

# Durable queue — live pending + DLQ
/reflect list               # pending + DLQ
/reflect failed             # DLQ only (triage)
/config quality             # hook state + queue depth + dlq size

Useful Prometheus snapshots

# General queue health
chatcli_lessonq_queue_depth
chatcli_lessonq_dlq_size
sum(rate(chatcli_lessonq_attempts_total[5m])) by (outcome)

# Jobs failing permanently (each one is moved to the DLQ)
rate(chatcli_lessonq_attempts_total{outcome="permanent"}[5m]) > 0

# Alert on WAL corruption (signal of unstable fs)
increase(chatcli_lessonq_wal_corruption_total[1h]) > 0

# p95 processing latency, aggregated across outcomes
histogram_quantile(0.95,
  sum(rate(chatcli_lessonq_processing_duration_seconds_bucket[5m])) by (le))

Legacy inspection (pre-queue)

# Via /config
/config quality
# → shows total registered post-hooks (reflexion appears if Enabled=true)

See also

#4 RAG + HyDE

How lessons are retrieved in future tasks via semantic retrieval.

#6 CoVe

The verifier generates the verified_with_discrepancy signal that Reflexion consumes.

Bootstrap Memory

The layer underneath: how memory.Fact is populated and maintained.

Memory Commands

/memory load, /memory show, /memory longterm.