Reflexion is the only post-hook enabled by default, because it fires only in exceptional conditions (an error or a verification discrepancy) and lesson generation never blocks the user's turn.
What is a Lesson
A Lesson is a four-line record:
When it is persisted as a memory.Fact, its Content holds the rendered lesson and its Tags include reflexion, trigger:<x>, and domain-specific tags. This enables precise queries: “show me all lessons about edit-file” becomes a regular memory search.
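The tagging scheme and the tag-based query can be sketched as follows. This is a minimal sketch under stated assumptions, not chatcli's API. Fact is a hypothetical stand-in for memory.Fact that models only Content and Tags. The trigger value "error" and the helpers lessonTags, hasTag and lessonsAbout are illustrative names.

```go
package main

import (
	"fmt"
	"strings"
)

// Fact is a hypothetical stand-in for memory.Fact; only the two fields
// the text mentions (Content and Tags) are modelled.
type Fact struct {
	Content string
	Tags    []string
}

// lessonTags builds the tag set described above: "reflexion",
// "trigger:<x>", plus any domain-specific tags.
func lessonTags(trigger string, domain ...string) []string {
	tags := []string{"reflexion", "trigger:" + trigger}
	return append(tags, domain...)
}

// hasTag reports whether f carries tag t (case-insensitive).
func hasTag(f Fact, t string) bool {
	for _, tag := range f.Tags {
		if strings.EqualFold(tag, t) {
			return true
		}
	}
	return false
}

// lessonsAbout answers "show me all lessons about <domain>" as a plain
// tag filter: the fact must be a reflexion lesson and carry the domain tag.
func lessonsAbout(facts []Fact, domain string) []Fact {
	var out []Fact
	for _, f := range facts {
		if hasTag(f, "reflexion") && hasTag(f, domain) {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	facts := []Fact{
		{Content: "Re-read a file before editing it.", Tags: lessonTags("error", "edit-file")},
		{Content: "User prefers tabs.", Tags: []string{"preference"}},
	}
	for _, f := range lessonsAbout(facts, "edit-file") {
		fmt.Println(f.Content, f.Tags)
	}
}
```

Because lessons are ordinary facts, no separate lesson store is needed. The same tag filter works on every fact in memory.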
Four triggers
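- OnError: fires when Error != nil. Examples: timeout, invalid tool call, provider crash. Default: ON.
- OnHallucination: fires on verified_with_discrepancy. Default: ON.
- OnLowQuality: fires on refine_low_quality. Default: OFF.
- Manual via /reflect: you supply the lesson yourself, with no LLM distillation.
Flow: durable mode (default)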
With CHATCLI_QUALITY_REFLEXION_QUEUE_ENABLED=true (the default), triggers flow through a persistent queue. The hook never blocks the turn, and the process can crash without losing the lesson:
1. PostRun inspects the trigger. ReflexionHook.PostRun(ctx, hc, result) looks at result.Metadata and result.Error. If no gate matches, it returns in μs.
2. WAL append (synchronous, sub-ms). The hook calls enqueuer.Enqueue(req). The Runner computes JobID = sha256(task|trigger|attempt)[:16] and writes a record to the WAL (~/.chatcli/reflexion/wal/<id>.wal) via tmp → fsync → atomic rename → dir fsync. It then pushes the job onto the in-memory queue.
3. Immediate return to the pipeline. PostRun returns nil, and the user's turn continues without waiting. The only added latency is the fsync (typically < 1 ms).
4. Worker pool processes asynchronously. One of N workers (default 2) dequeues the job and calls GenerateLesson with a per-job timeout (default 2 min). It persists the result to memory.Fact unless the LLM emits <skip>.
5. Outcome classification. Success or Skipped → ACK (delete the WAL record). Transient error (timeout, 429/503) → reschedule with exponential backoff + jitter. Permanent error (parser error) → move to the DLQ immediately.
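The outcome classification above, together with the reschedule delay, can be sketched as follows. This is a sketch under explicit assumptions, not the lessonq implementation:

- The sentinel errors (ErrSkip, ErrProviderBusy, ErrParse) and the helpers decide and backoff are hypothetical.
- Any error not recognised as transient is treated as permanent.
- The jitter is fractional: the delay is scaled by a random factor in [1-jitter, 1+jitter]. The defaults (1s initial, 5m cap, 0.2 jitter, 5 attempts) come from the environment-variable table.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// Hypothetical sentinel errors standing in for the real failure modes.
var (
	ErrSkip         = errors.New("llm emitted <skip>")
	ErrProviderBusy = errors.New("provider returned 429/503")
	ErrParse        = errors.New("could not parse lesson")
)

// Action is what the Runner does with a job after one processing attempt.
type Action string

const (
	Ack        Action = "ack"        // delete the WAL record
	Reschedule Action = "reschedule" // keep the WAL record, retry later
	MoveToDLQ  Action = "dlq"        // move the record to the dead letter queue
)

// decide maps a processing error to an Action. Assumption: errors not
// recognised as transient are permanent.
func decide(err error, attempt, maxAttempts int) Action {
	switch {
	case err == nil, errors.Is(err, ErrSkip):
		return Ack
	case errors.Is(err, context.DeadlineExceeded), errors.Is(err, ErrProviderBusy):
		if attempt >= maxAttempts {
			return MoveToDLQ // retries exhausted
		}
		return Reschedule
	default:
		return MoveToDLQ // e.g. ErrParse: retrying will not help
	}
}

// backoff returns the delay before retry number attempt (1-based):
// initial * 2^(attempt-1), capped at maxDelay, scaled by a random factor
// in [1-jitter, 1+jitter], and clamped to maxDelay again.
func backoff(attempt int, initial, maxDelay time.Duration, jitter float64, rnd func() float64) time.Duration {
	d := initial
	for i := 1; i < attempt && d < maxDelay; i++ {
		d *= 2
	}
	if d > maxDelay {
		d = maxDelay
	}
	j := time.Duration(float64(d) * (1 + jitter*(2*rnd()-1)))
	if j > maxDelay {
		j = maxDelay
	}
	return j
}

func main() {
	err := fmt.Errorf("generate lesson: %w", context.DeadlineExceeded)
	for attempt := 1; attempt <= 5; attempt++ {
		if a := decide(err, attempt, 5); a != Reschedule {
			fmt.Println("attempt", attempt, "->", a)
			break
		}
		fmt.Println("attempt", attempt, "-> retry in", backoff(attempt, time.Second, 5*time.Minute, 0.2, rand.Float64))
	}
}
```

Taking rnd as a parameter keeps the schedule deterministic in tests while production code passes rand.Float64.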
Fallback: legacy mode (detached goroutine)
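If CHATCLI_QUALITY_REFLEXION_QUEUE_ENABLED=false, the hook reverts to the original behavior: lesson generation runs in a detached goroutine, without the WAL, retries, or DLQ of the durable queue: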
Durable Queue — WAL + Worker Pool + DLQ
The queue is implemented in cli/agent/quality/lessonq/ with the following durability guarantees:
WAL (Write-Ahead Log)
Each pending lesson is a .wal file in ~/.chatcli/reflexion/wal/, one per Job ID. Binary layout:
- Double CRC detects torn writes (a crash mid-fsync). Corrupt records are discarded on replay, and chatcli_lessonq_wal_corruption_total is incremented.
- Atomic rename: write to <id>.tmp.<pid>.<seq> → fsync → rename → dir fsync. A reader never sees a partial record.
- O(1) ACK: a single unlink removes the record. No background compaction.
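The three properties above can be sketched as follows. The real binary layout is not reproduced on this page, so the record format here is a hypothetical one with two CRCs, one over the length header and one over the payload. The helpers encodeRecord, decodeRecord, writeRecord and ack are illustrative names, not lessonq's API. The directory fsync assumes POSIX semantics.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
	"os"
	"path/filepath"
)

// ErrCorrupt marks a torn or bit-flipped record; on replay it is discarded.
var ErrCorrupt = errors.New("wal: corrupt record")

// encodeRecord uses a hypothetical little-endian layout:
// [len uint32][crc(len) uint32][payload][crc(payload) uint32].
func encodeRecord(payload []byte) []byte {
	rec := make([]byte, 0, 12+len(payload))
	rec = binary.LittleEndian.AppendUint32(rec, uint32(len(payload)))
	rec = binary.LittleEndian.AppendUint32(rec, crc32.ChecksumIEEE(rec[0:4]))
	rec = append(rec, payload...)
	return binary.LittleEndian.AppendUint32(rec, crc32.ChecksumIEEE(payload))
}

// decodeRecord validates both CRCs and the length before returning the payload.
func decodeRecord(b []byte) ([]byte, error) {
	if len(b) < 12 {
		return nil, ErrCorrupt // torn: not even a full header and trailer
	}
	if crc32.ChecksumIEEE(b[0:4]) != binary.LittleEndian.Uint32(b[4:8]) {
		return nil, ErrCorrupt // header CRC mismatch
	}
	n := int(binary.LittleEndian.Uint32(b[0:4]))
	if len(b) != 12+n {
		return nil, ErrCorrupt // length disagrees with header
	}
	payload := b[8 : 8+n]
	if crc32.ChecksumIEEE(payload) != binary.LittleEndian.Uint32(b[8+n:]) {
		return nil, ErrCorrupt // payload CRC mismatch
	}
	return payload, nil
}

// writeRecord publishes <dir>/<id>.wal atomically:
// write tmp -> fsync -> rename -> fsync the directory.
func writeRecord(dir, id string, seq int, payload []byte) (err error) {
	tmp := filepath.Join(dir, fmt.Sprintf("%s.tmp.%d.%d", id, os.Getpid(), seq))
	f, err := os.OpenFile(tmp, os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0o600)
	if err != nil {
		return err
	}
	defer func() {
		if err != nil {
			_ = os.Remove(tmp) // best-effort cleanup; no-op after a successful rename
		}
	}()
	if _, err = f.Write(encodeRecord(payload)); err == nil {
		err = f.Sync()
	}
	if cerr := f.Close(); err == nil {
		err = cerr
	}
	if err != nil {
		return err
	}
	if err = os.Rename(tmp, filepath.Join(dir, id+".wal")); err != nil {
		return err
	}
	d, err := os.Open(dir)
	if err != nil {
		return err
	}
	defer d.Close()
	return d.Sync() // make the rename itself durable
}

// ack deletes a processed record: a single unlink, no compaction.
func ack(dir, id string) error { return os.Remove(filepath.Join(dir, id+".wal")) }

func main() {
	dir, err := os.MkdirTemp("", "wal")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	const id = "3f2a9c1d0b7e4a55" // hypothetical JobID
	if err := writeRecord(dir, id, 1, []byte(`{"task":"demo"}`)); err != nil {
		panic(err)
	}
	raw, err := os.ReadFile(filepath.Join(dir, id+".wal"))
	if err != nil {
		panic(err)
	}
	payload, err := decodeRecord(raw)
	fmt.Println(string(payload), err)

	raw[len(raw)-1] ^= 0xFF // flip bits in the payload CRC to simulate corruption
	_, err = decodeRecord(raw)
	fmt.Println(err)
	fmt.Println(ack(dir, id))
}
```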
Worker Pool
- Blocking Dequeue (waits until NextAttemptAt ≤ now).
- Bounded per-job timeout (it does not inherit the turn's ctx; reflexion outlives the turn by design).
- Panic recovery: if the processor panics, the job goes straight to the DLQ (retrying a bug would only loop).
- Emits chatcli_lessonq_processing_duration_seconds{outcome}.
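The per-job timeout and panic recovery can be sketched as follows. This is a minimal sketch: Job, runJob, goesToDLQ and the demo processors are hypothetical names. It derives the deadline from context.Background() to show that the turn's context is not inherited. A real pool would more likely derive it from its own lifecycle context so that shutdown can cancel in-flight jobs.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Job is a hypothetical, minimal queued reflexion job.
type Job struct{ ID string }

// errPanicked marks a processor panic; such jobs go straight to the DLQ.
var errPanicked = errors.New("processor panicked")

// runJob gives each job its own deadline (not the user's turn context) and
// turns a processor panic into errPanicked instead of crashing the worker.
func runJob(job Job, timeout time.Duration, process func(context.Context, Job) error) (err error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%w: %v", errPanicked, r)
		}
	}()
	return process(ctx, job)
}

// goesToDLQ reports whether the job must skip retries entirely.
func goesToDLQ(err error) bool { return errors.Is(err, errPanicked) }

// panicky is a demo processor with a bug.
func panicky(context.Context, Job) error { panic("nil map write") }

// slow is a demo processor that only finishes when its deadline fires.
func slow(ctx context.Context, _ Job) error {
	select {
	case <-time.After(time.Minute):
		return nil
	case <-ctx.Done():
		return ctx.Err() // context.DeadlineExceeded once the job timeout fires
	}
}

func main() {
	start := time.Now()
	err := runJob(Job{ID: "a"}, 50*time.Millisecond, slow)
	fmt.Println(err, goesToDLQ(err), time.Since(start) < time.Second)

	err = runJob(Job{ID: "b"}, time.Second, panicky)
	fmt.Println(err, goesToDLQ(err))
}
```

A timeout is an ordinary transient error and can be retried. A panic is marked distinctly so the Runner can bypass the retry path.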
Dead Letter Queue
Permanent failures and exhausted retries go to ~/.chatcli/reflexion/dlq/ (same WAL format, read-only to the process). An operator inspects them and decides:
Retry with Jitter
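Transient errors (ctx timeout, provider 429/503, temporary filesystem error) are rescheduled with exponential backoff and jitter. The first delay is CHATCLI_QUALITY_REFLEXION_QUEUE_INITIAL_DELAY, growth is capped at CHATCLI_QUALITY_REFLEXION_QUEUE_MAX_DELAY, and after CHATCLI_QUALITY_REFLEXION_QUEUE_MAX_ATTEMPTS the job moves to the DLQ.
Idempotency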
JobID is content-addressed: sha256(normalized(task) | trigger | attempt | outcome)[:16]. Re-triggering the same situation while the job is in flight is a no-op: the WAL record already exists, so the Runner skips the queue insert. Whitespace in the task is normalized so that trivial formatting churn does not create distinct jobs.
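A content-addressed ID and the in-flight check can be sketched as follows. This sketch follows the formula above. It assumes "[:16]" means the first 16 hex characters of the digest, and that normalization only collapses whitespace runs. The trigger and outcome strings ("trigger:error", "failed") and the helpers jobID, normalize and inFlight are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// normalize collapses runs of whitespace so trivial edits to the task
// text map to the same JobID.
func normalize(task string) string { return strings.Join(strings.Fields(task), " ") }

// jobID hashes normalized(task) | trigger | attempt | outcome and keeps
// the first 16 hex characters (assumed reading of "[:16]").
func jobID(task, trigger string, attempt int, outcome string) string {
	key := fmt.Sprintf("%s|%s|%d|%s", normalize(task), trigger, attempt, outcome)
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])[:16]
}

// inFlight reports whether <walDir>/<id>.wal already exists; if it does,
// re-triggering the same situation is a no-op.
func inFlight(walDir, id string) bool {
	_, err := os.Stat(filepath.Join(walDir, id+".wal"))
	return err == nil
}

func main() {
	a := jobID("refactor  pkg/auth/manager.go\n", "trigger:error", 1, "failed")
	b := jobID("refactor pkg/auth/manager.go", "trigger:error", 1, "failed")
	fmt.Println(a, a == b) // a == b is true: whitespace was normalized

	dir, err := os.MkdirTemp("", "wal")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	fmt.Println(inFlight(dir, a)) // false: no record yet
	if err := os.WriteFile(filepath.Join(dir, a+".wal"), nil, 0o600); err != nil {
		panic(err)
	}
	fmt.Println(inFlight(dir, a)) // true: re-triggering is now a no-op
}
```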
Drain + Graceful Shutdown
On exit (cli.cleanup()), the Runner enters DrainAndShutdown(30s):
- Queue closes — no new dequeues.
- Workers finish in-flight (or get cancelled on timeout).
- WAL/DLQ close.
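The three shutdown steps can be sketched as follows. This is a hypothetical, stripped-down Runner with no WAL or DLQ; NewRunner, Enqueue, work and sleepy are illustrative names. DrainAndShutdown mirrors the sequence above: stop dequeuing, wait for in-flight work up to the timeout, then cancel. Jobs still queued are not processed. In the durable design their WAL records presumably remain on disk for replay on the next start.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Runner is a hypothetical, stripped-down worker pool (no WAL or DLQ).
type Runner struct {
	jobs   chan string
	stop   chan struct{}
	ctx    context.Context
	cancel context.CancelFunc
	wg     sync.WaitGroup
}

func NewRunner(workers, capacity int, process func(context.Context, string)) *Runner {
	ctx, cancel := context.WithCancel(context.Background())
	r := &Runner{jobs: make(chan string, capacity), stop: make(chan struct{}), ctx: ctx, cancel: cancel}
	for i := 0; i < workers; i++ {
		r.wg.Add(1)
		go r.work(process)
	}
	return r
}

func (r *Runner) work(process func(context.Context, string)) {
	defer r.wg.Done()
	for {
		select { // check stop first so a closed queue wins over pending jobs
		case <-r.stop:
			return
		default:
		}
		select {
		case <-r.stop:
			return // queue closed: no new dequeues
		case id := <-r.jobs:
			process(r.ctx, id)
		}
	}
}

func (r *Runner) Enqueue(id string) { r.jobs <- id }

// DrainAndShutdown stops dequeuing, waits up to timeout for in-flight jobs,
// then cancels whatever is still running. It reports whether the drain
// finished within the timeout.
func (r *Runner) DrainAndShutdown(timeout time.Duration) bool {
	close(r.stop)
	done := make(chan struct{})
	go func() { r.wg.Wait(); close(done) }()
	clean := true
	select {
	case <-done:
	case <-time.After(timeout):
		clean = false
		r.cancel() // cancel in-flight jobs, then wait for workers to exit
		<-done
	}
	r.cancel()
	// Closing the WAL and DLQ handles would happen here.
	return clean
}

// sleepy returns a demo processor that works for d unless cancelled.
func sleepy(d time.Duration) func(context.Context, string) {
	return func(ctx context.Context, id string) {
		select {
		case <-time.After(d):
			fmt.Println("finished", id)
		case <-ctx.Done():
			fmt.Println("cancelled", id)
		}
	}
}

func main() {
	r := NewRunner(2, 8, sleepy(100*time.Millisecond))
	r.Enqueue("job-1")
	time.Sleep(10 * time.Millisecond) // let a worker pick it up
	fmt.Println("clean:", r.DrainAndShutdown(30*time.Second))
}
```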
/reflect — Commands
All subcommands have Tab autocomplete.
For /reflect retry and /reflect purge, autocomplete lists the live DLQ IDs with a task preview and the last error.
Files and layout
The root directory is set by CHATCLI_QUALITY_REFLEXION_QUEUE_BASE_DIR (default: <workspace>/.chatcli/reflexion).
Lesson generator protocol
The system prompt instructs the model to write general lessons, not one-off fixes:
/reflect: manual path without LLM
When you already know the lesson and don't need an LLM to distill it:
The lesson is stored directly as a memory.Fact with the tags ["reflexion", "trigger:manual", "user-supplied"].
How the lesson “comes back”
Once persisted, the lesson is a regular fact in the index. It surfaces via:
- Hint-based retrieval: if the next task mentions keywords in its Tags, the relevance-based scorer surfaces it.
- HyDE amplification: with CHATCLI_QUALITY_HYDE_ENABLED=true, the generated hypothesis covers similar concepts, increasing the chance of a match.
- Vector search: with embeddings configured, the lesson is retrieved by cosine proximity.
The retrieved lesson is then injected into the prompt's ## Long-term Memory section, so the model has every cue it needs not to repeat the mistake.
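The vector-search path ranks lessons by cosine proximity. That scoring step can be sketched as follows. This is a minimal sketch of the math only. The embeddings are toy 3-dimensional vectors, whereas real ones come from the configured embedding provider. The helpers cosine and rank are hypothetical names, not chatcli's retrieval API.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosine returns the cosine similarity of a and b, or 0 when the lengths
// differ or either vector is zero.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

type scored struct {
	Lesson string
	Score  float64
}

// rank orders lessons by cosine similarity to the query embedding, highest first.
func rank(query []float64, lessons map[string][]float64) []scored {
	out := make([]scored, 0, len(lessons))
	for text, vec := range lessons {
		out = append(out, scored{text, cosine(query, vec)})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}

func main() {
	// Toy embeddings; real ones come from the configured provider.
	lessons := map[string][]float64{
		"Split large Go files by responsibility": {0.9, 0.1, 0.0},
		"Re-read a file before editing it":       {0.2, 0.9, 0.1},
	}
	query := []float64{0.8, 0.2, 0.1} // e.g. "refactor manager.go into smaller files"
	for _, s := range rank(query, lessons) {
		fmt.Printf("%.3f  %s\n", s.Score, s.Lesson)
	}
}
```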
Environment variables
Gates (when to fire)
| Env var | Default | What it does |
|---|---|---|
| CHATCLI_QUALITY_REFLEXION_ENABLED | true | Master switch |
| CHATCLI_QUALITY_REFLEXION_ON_ERROR | true | Fire on tool error |
| CHATCLI_QUALITY_REFLEXION_ON_HALLUCINATION | true | Fire on verified_with_discrepancy |
| CHATCLI_QUALITY_REFLEXION_ON_LOW_QUALITY | false | Fire on refine_low_quality |
| CHATCLI_QUALITY_REFLEXION_PERSIST | true | Write to memory.Fact (false = log-only) |
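Reading these gates and checking them against a finished turn can be sketched as follows. This is a hypothetical illustration of the cheap early return in PostRun, not chatcli's code. Gates, envBool, LoadGates and Trigger are illustrative names. The signal argument stands in for whatever result.Metadata actually carries, and the returned trigger names are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Gates mirrors the five variables in the table above.
type Gates struct {
	Enabled, OnError, OnHallucination, OnLowQuality, Persist bool
}

// envBool reads a boolean env var, falling back to def when unset or invalid.
func envBool(name string, def bool) bool {
	v, ok := os.LookupEnv(name)
	if !ok {
		return def
	}
	b, err := strconv.ParseBool(v)
	if err != nil {
		return def
	}
	return b
}

// LoadGates applies the documented defaults.
func LoadGates() Gates {
	return Gates{
		Enabled:         envBool("CHATCLI_QUALITY_REFLEXION_ENABLED", true),
		OnError:         envBool("CHATCLI_QUALITY_REFLEXION_ON_ERROR", true),
		OnHallucination: envBool("CHATCLI_QUALITY_REFLEXION_ON_HALLUCINATION", true),
		OnLowQuality:    envBool("CHATCLI_QUALITY_REFLEXION_ON_LOW_QUALITY", false),
		Persist:         envBool("CHATCLI_QUALITY_REFLEXION_PERSIST", true),
	}
}

// Trigger returns which trigger fires for a finished turn, or "" when no
// gate matches. The signal values come from the table; trigger names are hypothetical.
func (g Gates) Trigger(hasError bool, signal string) string {
	switch {
	case !g.Enabled:
		return ""
	case hasError && g.OnError:
		return "error"
	case signal == "verified_with_discrepancy" && g.OnHallucination:
		return "hallucination"
	case signal == "refine_low_quality" && g.OnLowQuality:
		return "low_quality"
	}
	return ""
}

func main() {
	g := LoadGates()
	fmt.Printf("%+v\n", g)
	fmt.Printf("%q\n", g.Trigger(false, "refine_low_quality"))
}
```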
Durable queue (WAL + worker pool + DLQ)
| Env var | Default | Effect |
|---|---|---|
| CHATCLI_QUALITY_REFLEXION_QUEUE_ENABLED | true | Queue master switch. false falls back to legacy (detached goroutine) |
| CHATCLI_QUALITY_REFLEXION_QUEUE_WORKERS | 2 | Worker pool size. Reflexion is I/O-bound on the LLM call |
| CHATCLI_QUALITY_REFLEXION_QUEUE_CAPACITY | 1000 | Max in-memory depth before overflow policy kicks in |
| CHATCLI_QUALITY_REFLEXION_QUEUE_DROP_OLDEST | false | Overflow: true drop oldest; false block with timeout |
| CHATCLI_QUALITY_REFLEXION_QUEUE_BLOCK_TIMEOUT | 5s | How long Enqueue waits when full (if DROP_OLDEST=false) |
| CHATCLI_QUALITY_REFLEXION_QUEUE_MAX_ATTEMPTS | 5 | Total retries before moving to DLQ |
| CHATCLI_QUALITY_REFLEXION_QUEUE_INITIAL_DELAY | 1s | First retry delay |
| CHATCLI_QUALITY_REFLEXION_QUEUE_MAX_DELAY | 5m | Cap on exponential retry |
| CHATCLI_QUALITY_REFLEXION_QUEUE_JITTER | 0.2 | Fractional jitter ([0, 0.5]); AWS-style full jitter |
| CHATCLI_QUALITY_REFLEXION_QUEUE_JOB_TIMEOUT | 2m | Per-processor-call timeout (LLM + persist) |
| CHATCLI_QUALITY_REFLEXION_QUEUE_STALE_AFTER | 168h | WAL records older than this are discarded on replay (7 days) |
| CHATCLI_QUALITY_REFLEXION_QUEUE_BASE_DIR | <workspace>/.chatcli/reflexion | Override of the root dir (WAL + DLQ) |
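The two overflow policies (DROP_OLDEST versus blocking up to BLOCK_TIMEOUT) can be sketched with a buffered channel. BoundedQueue, NewBoundedQueue and Enqueue are hypothetical names, and this is not lessonq's implementation. The returned strings reuse the outcome labels of chatcli_lessonq_enqueue_total. What happens to an evicted job's WAL record is not specified on this page, so the sketch only touches the in-memory queue.

```go
package main

import (
	"fmt"
	"time"
)

// BoundedQueue is a hypothetical in-memory queue with the two overflow
// policies from the table. Capacity must be > 0.
type BoundedQueue struct {
	ch           chan string
	dropOldest   bool
	blockTimeout time.Duration
}

func NewBoundedQueue(capacity int, dropOldest bool, blockTimeout time.Duration) *BoundedQueue {
	return &BoundedQueue{ch: make(chan string, capacity), dropOldest: dropOldest, blockTimeout: blockTimeout}
}

// Enqueue returns an enqueue_total outcome label.
func (q *BoundedQueue) Enqueue(id string) string {
	select {
	case q.ch <- id:
		return "accepted"
	default: // full: apply the overflow policy
	}
	if q.dropOldest {
		for {
			select {
			case <-q.ch: // evict the oldest pending job
			default:
			}
			select {
			case q.ch <- id:
				return "dropped_oldest"
			default: // a concurrent producer took the slot; evict again
			}
		}
	}
	timer := time.NewTimer(q.blockTimeout)
	defer timer.Stop()
	select {
	case q.ch <- id:
		return "accepted"
	case <-timer.C:
		return "rejected_full"
	}
}

func main() {
	q := NewBoundedQueue(2, true, 0)
	for _, id := range []string{"a", "b", "c"} {
		fmt.Println(id, q.Enqueue(id))
	}
	fmt.Println(<-q.ch, <-q.ch)
}
```

Dropping the oldest job keeps Enqueue non-blocking at the cost of losing stale work. Blocking preserves every job but can add up to BLOCK_TIMEOUT to the hook.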
Prometheus metrics
The queue emits 10 metrics under chatcli_lessonq_*:
| Metric | Type | Labels | Meaning |
|---|---|---|---|
| enqueue_total | Counter | outcome | accepted, rejected_full, deduped, dropped_oldest |
| queue_depth | Gauge | — | In-memory pending jobs |
| processing_duration_seconds | Histogram | outcome | dequeue→outcome time |
| attempts_total | Counter | outcome | success, skipped, transient, permanent |
| retry_total | Counter | attempt | retries bucketed by attempt number |
| dlq_size | Gauge | — | Jobs in DLQ |
| wal_segments | Gauge | — | Active .wal files |
| wal_corruption_total | Counter | — | Records rejected for CRC mismatch/torn write |
| stale_discarded_total | Counter | — | Records dropped at replay due to age |
| persist_failures_total | Counter | — | memory.Fact callback failures |
Full cycle example
Next week, the user asks for a similar refactor:
/coder refactor pkg/auth/manager.go split into smaller files
Inspect stored lessons
Useful Prometheus snapshots
Legacy inspection (pre-queue)
See also
#4 RAG + HyDE
How lessons are retrieved in future tasks via semantic retrieval.
#6 CoVe
The verifier generates the verified_with_discrepancy signal that Reflexion consumes.
Bootstrap Memory
The layer underneath: how memory.Fact is populated and maintained.
Memory Commands
/memory load, /memory show, /memory longterm.