Complete reference for the seven patterns: every CHATCLI_QUALITY_* and CHATCLI_EMBED_* env var, the five slashes (/thinking, /plan, /refine, /verify, /reflect), and what /config quality surfaces.
All the quality-pipeline configuration in one place. Three channels: env vars (persistent), slashes (session), /config quality (inspection).
/thinking                        # show current state
/thinking auto                   # clear override
/thinking off                    # force no-thinking for next turn
/thinking on                     # alias for /thinking high
/thinking low|medium|high|max
/thinking budget=N               # nearest tier to N tokens
/plan                    # arm flag; next /agent or /coder uses Plan-First
/plan <free task>        # arm + enter agent mode and execute
/plan agent <task>       # explicit equivalent of the previous form
/plan coder <task>       # enter coder mode (software engineer) and execute
/plan preview <task>     # dry-run: generate and render the plan WITHOUT executing
/plan dry <task>         # alias of preview
Recommended flow for large changes: /plan preview <task> → review the plan → /plan coder <same task> to execute.
/refine                # current state
/refine on             # enable for session
/refine off            # disable
/refine once|next      # enable (today identical to on)
/refine auto|clear     # clear override → use /config
/reflect <free text of the lesson>
# Example: /reflect when editing large Go files use Edit, not full rewrite
/reflect <free text> writes directly to memory.Fact (category=lesson, trigger=manual) without an LLM call. The queue subcommands (list/failed/retry/purge/drain) show and operate on automatic triggers (error, hallucination, low quality) buffered in the WAL awaiting async processing.
/reflect retry and /reflect purge support dynamic autocomplete: pressing Tab pulls live IDs from the DLQ (dead-letter queue), each shown with a task preview and its last error.
Required for provider=bedrock. Reuses the same credential chain as the AWS Bedrock feature (IAM role, SSO, assume-role, profile).
Bedrock embeddings support Titan (single text per call, parallelized internally with an 8-worker pool) and Cohere v3 (native batch). Family dispatch is automatic from the model id prefix. See RAG + HyDE for the full architecture.
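To make the prefix-based family dispatch concrete, here is a minimal shell sketch. The helper name `embed_family` and the exact prefixes it matches are assumptions for illustration. chatcli performs this dispatch internally, and its real matching rules may differ.

```shell
# Hypothetical helper (not part of chatcli): guess the Bedrock embedding
# family from the model id prefix. The matched prefixes are assumptions.
embed_family() {
  case "$1" in
    amazon.titan-embed*) echo "titan" ;;    # one text per call
    cohere.embed*)       echo "cohere" ;;   # native batch
    *)                   echo "unknown"; return 1 ;;
  esac
}

embed_family "amazon.titan-embed-text-v2:0"   # prints: titan
```

A quick check like this can help predict, before setting CHATCLI_EMBED_MODEL, whether a model id would likely take the per-text path or the batch path.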
Extra cost: +3-4 calls per worker with non-mechanical output. Use in critical PR reviews.
export CHATCLI_QUALITY_REFINE_ENABLED=true
export CHATCLI_QUALITY_VERIFY_ENABLED=true
export CHATCLI_QUALITY_HYDE_ENABLED=true
# Optional: HyDE vectors for max recall
export CHATCLI_QUALITY_HYDE_USE_VECTORS=true
# Pick an embedding provider (any of these works):
# - Voyage (Anthropic-recommended)
export CHATCLI_EMBED_PROVIDER=voyage
export VOYAGE_API_KEY=pa-...
# - OpenAI (if OPENAI_API_KEY is already in the environment)
# export CHATCLI_EMBED_PROVIDER=openai
# - Bedrock (same AWS chain as chat, no extra API key)
# export CHATCLI_EMBED_PROVIDER=bedrock
# export CHATCLI_EMBED_MODEL=amazon.titan-embed-text-v2:0
# export BEDROCK_REGION=us-east-1
Extra cost: +1 HyDE hypothesis + refine + verify per doc generation. Polished and factually-checked output.
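Because each embedding provider expects different credentials, a small pre-flight check can catch a missing key before a session starts. The sketch below is an assumption-laden illustration: `check_embed_env` is a hypothetical helper, not a chatcli command. It covers only the three providers named above, and it treats an unset CHATCLI_EMBED_PROVIDER as an error, which may not match chatcli's own defaults.

```shell
# Hypothetical pre-flight check (not part of chatcli). Returns 0 when the
# environment looks consistent with the chosen embedding provider, else 1.
check_embed_env() {
  case "${CHATCLI_EMBED_PROVIDER:-}" in
    voyage)
      [ -n "${VOYAGE_API_KEY:-}" ] || {
        echo "VOYAGE_API_KEY is not set" >&2
        return 1
      }
      ;;
    openai)
      [ -n "${OPENAI_API_KEY:-}" ] || {
        echo "OPENAI_API_KEY is not set" >&2
        return 1
      }
      ;;
    bedrock)
      # Uses the AWS credential chain; no API key to check here.
      :
      ;;
    "")
      echo "CHATCLI_EMBED_PROVIDER is not set" >&2
      return 1
      ;;
    *)
      echo "provider '${CHATCLI_EMBED_PROVIDER}' is not covered by this sketch" >&2
      return 1
      ;;
  esac
}
```

After the exports, run `check_embed_env && echo "embedding env looks OK"`. A non-zero exit status means a variable this sketch expects is missing.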
If something breaks in production and you need to return to pre-pipeline behavior instantly:
export CHATCLI_QUALITY_ENABLED=false
This makes Pipeline.Run degenerate to return agent.Execute(...), byte-identical to pre-PR chatcli. Zero hooks, zero reasoning auto-attach, zero HyDE.
The master switch is the emergency exit. Individual toggles are for tuning scenarios; the master is for rollback.
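For a rollback that is easy to undo, the master switch can be wrapped in two small shell functions. This is a minimal sketch: `quality_rollback` and `quality_restore` are hypothetical helpers, not part of chatcli. They only manipulate CHATCLI_QUALITY_ENABLED in the current shell, so any chatcli process started afterwards inherits the value.

```shell
# Hypothetical helpers (not part of chatcli): flip the master switch off and
# later restore whatever value (or absence of value) was there before.
quality_rollback() {
  # Remember whether the variable was set, and to what, before overriding it.
  if [ -n "${CHATCLI_QUALITY_ENABLED+x}" ]; then
    _QUALITY_PREV="$CHATCLI_QUALITY_ENABLED"
    _QUALITY_PREV_SET=1
  else
    _QUALITY_PREV=""
    _QUALITY_PREV_SET=0
  fi
  export CHATCLI_QUALITY_ENABLED=false
}

quality_restore() {
  # Put back the saved value, or unset the variable if it was unset before.
  if [ "${_QUALITY_PREV_SET:-0}" = "1" ]; then
    export CHATCLI_QUALITY_ENABLED="$_QUALITY_PREV"
  else
    unset CHATCLI_QUALITY_ENABLED
  fi
  unset _QUALITY_PREV _QUALITY_PREV_SET
}
```

Call `quality_rollback` during the incident and `quality_restore` once it is resolved. Calling `quality_rollback` twice in a row overwrites the saved value with `false`, so pair each rollback with one restore.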