HyDE is opt-in (
CHATCLI_QUALITY_HYDE_ENABLED=true) to keep the steady-state with no additional cost. Phase 3a costs +1 cheap LLM call; Phase 3b requires configuring an embedding provider.The problem HyDE solves
The pre-pipelinememory.Fact retrieval was keyword-only: the scorer matches tokens extracted from recent messages against tags and content of stored facts. Works well when vocabulary matches exactly — fails when the user uses synonyms or asks abstract questions.
Gap example:
- Without HyDE
- With HyDE 3a
- With HyDE 3b
User:
Extracted keywords:
Stored fact:
Match: ❌ — “do” and “go” don’t literally appear in the fact.
how to do X in Go?Extracted keywords:
[do, go]Stored fact:
"use goroutines for concurrency in X pipelines"Match: ❌ — “do” and “go” don’t literally appear in the fact.
Phase 3a — Hypothesis-based keyword expansion
LLM generates short hypothesis
Prompt: “Write a 2-4 sentence plausible answer that uses the technical nouns that would appear in any matching note. Bilingual if the query mixes languages.”
ExtractKeywords from the hypothesis
The same extractor already used in chat mode (en+pt stop words, min 3 chars).
Merge unique + lower-case
Original keywords + top-N from hypothesis, cap configurable via
CHATCLI_QUALITY_HYDE_NUM_KEYWORDS (default 5).Phase 3b — Vector embeddings + blended ranking
Adds cosine similarity search over fact embeddings — and, since the ranking refactor, fuses the cosine score with the lexical and temporal signals into a single ranking.What changed (PR #1027): previously the vector hits were dissolved back into keywords and the cosine score was discarded — you paid for an embedding call and threw away the very semantic signal. Now the cosine flows straight into the
SearchBlended ranker, so a fact matched only by paraphrase (zero lexical overlap) can still rank.Architecture
Blended ranking (SearchBlended)
Three independent, complementary signals, each min-max normalized across the candidate set before the weighted sum:
| Signal | Source | Captures |
|---|---|---|
| semantic | cosine from the vector store | synonymy, paraphrase |
| lexical | keyword/tag overlap | exact terms, identifiers, file names |
| temporal | recency decay × access frequency | what the user actually uses |
DefaultRankWeights): semantic 0.55 · lexical 0.30 · temporal 0.15 — semantic-first, because the embedding call was already paid for. Fusion is additive (not multiplicative) on purpose: a fact with high cosine and zero keyword overlap still ranks — a product would zero it out. Min-max normalization is what makes the weights provider-agnostic: Voyage 1024-d and OpenAI 1536-d cosine both land in [0,1] after normalizing.
Supported providers
- Voyage (recommended)
- OpenAI
- Bedrock (Titan + Cohere)
- Null (default)
voyage-3, 1024-dim) is the general-purpose sweet spot.Pure-Go vector store — generic vindex primitive
No CGO, no SQLite-vec, no external deps. Just
float32[] + cosine + JSON persistence in ~/.chatcli/memory/vector_index.json.llm/embedding/vindex, extracted once a second consumer (the semantic /context retrieval) appeared. memory is just a thin adapter over it — no vector machinery duplicated per package:
k, not corpus size — the scale ceiling is no longer “hundreds of facts”. For the typical chatcli case linear search completes in microseconds; no HNSW or IVFFlat.
Provider/dimension auto-migration
Switching provider or dimension (Voyage 1024 → OpenAI 1536, or voyage→cohere at the same 1024) no longer needs a manualrm. On load, the index detects the mismatch — of dimension (cosine between different arities is undefined) or of provider (two embedding spaces aren’t comparable, even at the same dimension) — and auto-clears the cache, removing the file so backfill repopulates:
Lazy backfill
When retrieving a fact, if it has no vector (fact predates embeddings activation), the index spawns a detached goroutine to embed the top-500 visible facts:Evaluation — proving retrieval (not assuming it)
There used to be no way to measure whether retrieval was any good. Now there is a dependency-free evaluation harness incli/workspace/memory/eval — standard macro-averaged IR metrics:
| Metric | What it measures |
|---|---|
| recall@k | fraction of relevant facts retrieved in the top-k |
| precision@k | fraction of the top-k that was relevant |
| MRR | mean reciprocal rank of the first hit |
| nDCG@k | normalized discounted cumulative gain |
ranking_test.go, with a deterministic embedding provider) compares keyword-only vs. blended ranking on queries phrased with synonyms absent from the fact text:
The harness runs the same way in CI (deterministic provider, reproducible) or against a real backend — it never imports a provider, so it stays neutral across all 14 supported ones. It is what turns “looks good” into “is good, measured”, and guards against regressions.
Ranking tunables (no new env vars)
The blended-ranking parameters live asConfig fields with strong defaults — no new env vars were introduced (and the pre-existing CHATCLI_MEMORY_* vars, previously ignored on the structured path, are now applied via ConfigFromEnv + clamping):
Config field | Default | Controls |
|---|---|---|
RankWeights | {0.55, 0.30, 0.15} | semantic / lexical / temporal weights |
MinCosineScore | 0.25 | cosine floor in the top-K |
VectorTopK | 12 | vector candidates per query |
BackfillBatchMax | 500 | cap of facts embedded per retrieve |
Full configuration
| Env var | Default | Effect |
|---|---|---|
CHATCLI_QUALITY_HYDE_ENABLED | false | Master switch (phase 3a) |
CHATCLI_QUALITY_HYDE_USE_VECTORS | false | Enable phase 3b (requires provider) |
CHATCLI_QUALITY_HYDE_NUM_KEYWORDS | 5 | Hypothesis keyword cap in phase 3a |
CHATCLI_EMBED_PROVIDER | — | voyage / openai / bedrock / null — single source of truth for provider selection |
CHATCLI_EMBED_MODEL | provider default | Voyage: voyage-3. OpenAI: text-embedding-3-small / -large. Bedrock: amazon.titan-embed-text-v2:0 (default), amazon.titan-embed-text-v1, cohere.embed-english-v3, cohere.embed-multilingual-v3. |
CHATCLI_EMBED_DIMENSIONS | model native | OpenAI: truncate via Matryoshka. Bedrock Titan v2: 256 / 512 / 1024 (rejects others). Bedrock Titan v1 / Cohere v3: fixed dimension, ignored. |
BEDROCK_REGION / AWS_REGION | us-east-1 | AWS region — only used when CHATCLI_EMBED_PROVIDER=bedrock. |
AWS_PROFILE | — | AWS profile — only used when CHATCLI_EMBED_PROVIDER=bedrock. |
/config quality surfaces state
Integration with Reflexion
HyDE amplifies Reflexion’s value: lessons persisted by #3 are retrieved with much higher recall when the next task doesn’t use the exact same keywords. Workflow:Turn 1: auth.go refactor fails (timeout)
Reflexion persists lesson:
"use Edit tool for large files", tags [go, refactor, edit-tool].Turn 5 (days later): 'help me split pkg/engine'
Query doesn’t contain
refactor or edit. Keyword-only would miss the lesson.HyDE 3a generates hypothesis
"To split a Go package, identify logical groupings and use refactor patterns with Edit tool for surgical changes..."Extracted keywords:
[split, package, refactor, edit, patterns, …]Caveats and tuning
Graceful fallback: if the LLM fails or the embedding provider returns an error, retrieval falls back to keyword-only silently. No turn is aborted by HyDE failure.
See also
#3 Reflexion
The lessons that HyDE retrieves with higher recall.
Bootstrap Memory
The layer underneath: how memory.Fact is populated and maintained.
Persistent Context
/context attach for explicit file contexts.Full configuration
All envs and slashes.