HyDE (Hypothetical Document Embeddings) is the classic technique of generating a hypothetical answer to the user’s question and using that answer as an additional retrieval signal. In ChatCLI, HyDE operates in two complementary phases: phase 3a expands keywords via an LLM-generated hypothesis, and phase 3b adds vector cosine search.
HyDE is opt-in (CHATCLI_QUALITY_HYDE_ENABLED=true) so the steady state incurs no additional cost. Phase 3a costs one extra cheap LLM call; phase 3b additionally requires configuring an embedding provider.

The problem HyDE solves

The pre-pipeline memory.Fact retrieval was keyword-only: the scorer matches tokens extracted from recent messages against the tags and content of stored facts. This works well when vocabulary matches exactly, but fails when the user uses synonyms or asks abstract questions. Example of the gap:
User: how to do X in Go?
Extracted keywords: [do, go]
Stored fact: "use goroutines for concurrency in X pipelines"
Match: ❌ — “do” and “go” don’t literally appear in the fact.

Phase 3a — Hypothesis-based keyword expansion

1. User types query. The query enters cli_llm.go or agent_mode.go.

2. HyDEAugmenter.Augment is invoked:

   augmenter := memory.NewHyDEAugmenter(cfg, llmCallback, logger)
   expanded := augmenter.Augment(ctx, query, originalHints)

3. The LLM generates a short hypothesis. Prompt: “Write a 2-4 sentence plausible answer that uses the technical nouns that would appear in any matching note. Bilingual if the query mixes languages.”

4. ExtractKeywords runs over the hypothesis. This is the same extractor already used in chat mode (en+pt stop words, min 3 chars).

5. Keywords are merged, deduplicated, and lower-cased. Original keywords plus the top-N from the hypothesis; the cap is configurable via CHATCLI_QUALITY_HYDE_NUM_KEYWORDS (default 5).

6. FactIndex.Search uses the expanded set. The existing keyword-based scorer operates over richer hints, yielding much higher recall.
Phase 3a works without configuring an embedding provider. It’s the recommended default if the cost of +1 light LLM call is acceptable.
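The merge-and-cap step above can be sketched as follows. The function name and signature are illustrative, not the real ChatCLI API; only the behavior (dedupe, lower-case, cap the hypothesis contribution) reflects the description.

```go
package main

import (
	"fmt"
	"strings"
)

// mergeHints keeps the original keywords, then appends lower-cased keywords
// extracted from the hypothesis until the cap is reached, skipping duplicates.
// Hypothetical helper for illustration.
func mergeHints(original, fromHypothesis []string, capN int) []string {
	seen := make(map[string]bool)
	out := make([]string, 0, len(original)+capN)
	for _, k := range original {
		k = strings.ToLower(k)
		if !seen[k] {
			seen[k] = true
			out = append(out, k)
		}
	}
	added := 0
	for _, k := range fromHypothesis {
		if added >= capN {
			break // cap: CHATCLI_QUALITY_HYDE_NUM_KEYWORDS
		}
		k = strings.ToLower(k)
		if !seen[k] {
			seen[k] = true
			out = append(out, k)
			added++
		}
	}
	return out
}

func main() {
	// The gap example from above: the hypothesis supplies the missing vocabulary.
	fmt.Println(mergeHints(
		[]string{"do", "go"},
		[]string{"goroutines", "concurrency", "pipelines"}, 3))
}
```

With the expanded set, the stored fact "use goroutines for concurrency in X pipelines" now matches on three tokens instead of zero.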

Phase 3b — Vector embeddings

Adds cosine similarity search over fact embeddings.

Architecture

┌──────────────────┐
│ User query       │
└────────┬─────────┘
         ▼
┌─────────────────────────┐
│ EmbeddingProvider.Embed │  (Voyage AI / OpenAI / Null)
└────────┬────────────────┘
         ▼  vector float32[1024] or [1536]
┌─────────────────────────┐
│ VectorIndex.SimilarFacts│  (pure-Go cosine)
└────────┬────────────────┘
         ▼  top-K fact IDs
┌─────────────────────────┐
│ FactIndex.GetByID       │
│ ExtractKeywords(content)│
└────────┬────────────────┘
         ▼
    Hints expanded via Phase 3a AND 3b

Supported providers

Voyage AI (e.g. voyage-3, 1024-dim vectors), OpenAI (e.g. text-embedding-3-small, 1536-dim), and a Null provider (no-op when no embedding backend is configured).

Pure-Go vector store

No CGO, no SQLite-vec, no external deps. Just float32[] + cosine + JSON persistence in ~/.chatcli/memory/vector_index.json.
// cli/workspace/memory/vector_store.go
type VectorEntry struct {
    FactID    string    `json:"fact_id"`
    Vector    []float32 `json:"vector"`
    Dimension int       `json:"dim"`
    Provider  string    `json:"provider"`
}
For N < 1000 facts (typical chatcli case), linear in-memory search completes in microseconds. No need for HNSW or IVFFlat indexing.
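The "pure-Go cosine" the store relies on is just the textbook formula dot(a,b) / (|a|·|b|). A minimal sketch (hypothetical function, not the actual vector_store.go code):

```go
package main

import (
	"fmt"
	"math"
)

// cosine computes cosine similarity between two float32 vectors,
// returning 0 for mismatched dimensions or zero-norm inputs.
func cosine(a, b []float32) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	fmt.Println(cosine([]float32{1, 0}, []float32{1, 0})) // identical: 1
	fmt.Println(cosine([]float32{1, 0}, []float32{0, 1})) // orthogonal: 0
}
```

A linear scan of this function over ~1000 entries of dimension 1024-1536 is a few million multiply-adds, which is why no ANN index is needed at this scale.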

Dimension lock

Switching provider (Voyage 1024 → OpenAI 1536) is not automatic: the store rejects with an explanatory error. Reason: cosine between vectors of different dimensions is mathematically invalid.
# To migrate, clear the file
rm ~/.chatcli/memory/vector_index.json
# Change the provider and restart — lazy backfill repopulates
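The lock itself amounts to a simple check on insert. The following is a hypothetical sketch of that contract, not the actual store code:

```go
package main

import "fmt"

// checkDimension sketches the dimension lock: the first stored vector fixes
// the index dimension; any later vector of a different length is rejected
// with an explanatory error instead of being silently mixed in.
func checkDimension(indexDim int, vec []float32) (int, error) {
	if indexDim == 0 {
		return len(vec), nil // first insert locks the dimension
	}
	if len(vec) != indexDim {
		return indexDim, fmt.Errorf(
			"dimension mismatch: index holds %d-dim vectors, got %d-dim; "+
				"remove the vector index file to switch providers",
			indexDim, len(vec))
	}
	return indexDim, nil
}

func main() {
	dim, _ := checkDimension(0, make([]float32, 1024))   // Voyage locks 1024
	_, err := checkDimension(dim, make([]float32, 1536)) // OpenAI vector rejected
	fmt.Println(dim, err != nil)
}
```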

Lazy backfill

When retrieving a fact, if it has no vector (fact predates embeddings activation), the index spawns a detached goroutine to embed the top-25 visible facts:
// cli/workspace/memory/store.go:120
go func(items map[string]string) { //#nosec G118 -- detached on purpose
    if err := m.vectors.BackfillFacts(context.Background(), items); err != nil {
        m.logger.Warn("vector backfill failed", zap.Error(err))
    }
}(items)
Backfill is bounded: at most 25 facts per retrieve invocation. In a normal session, most of the index is embedded within the first dozens of interactions.
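The bounded selection feeding that goroutine can be sketched like this (hypothetical helper; the real logic lives inside the store):

```go
package main

import "fmt"

// missingVectors picks at most `limit` visible facts that have no embedding
// yet, so each retrieve call embeds a bounded batch rather than the whole index.
func missingVectors(visible map[string]string, hasVector func(id string) bool, limit int) map[string]string {
	out := make(map[string]string, limit)
	for id, content := range visible {
		if len(out) >= limit {
			break // bounded: never embed more than `limit` facts per call
		}
		if !hasVector(id) {
			out[id] = content
		}
	}
	return out
}

func main() {
	visible := map[string]string{"f1": "a", "f2": "b", "f3": "c"}
	embedded := map[string]bool{"f2": true}
	todo := missingVectors(visible, func(id string) bool { return embedded[id] }, 25)
	fmt.Println(len(todo)) // f1 and f3 still need vectors
}
```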

Full configuration

Env var                            Default           Effect
CHATCLI_QUALITY_HYDE_ENABLED       false             Master switch (phase 3a)
CHATCLI_QUALITY_HYDE_USE_VECTORS   false             Enable phase 3b (requires provider)
CHATCLI_QUALITY_HYDE_PROVIDER      —                 Display-only provider name
CHATCLI_QUALITY_HYDE_NUM_KEYWORDS  5                 Hypothesis keyword cap in phase 3a
CHATCLI_EMBED_PROVIDER             —                 voyage | openai | null
CHATCLI_EMBED_MODEL                provider default  E.g. voyage-3, text-embedding-3-small
CHATCLI_EMBED_DIMENSIONS           provider default  OpenAI only
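Putting the table together, a typical shell setup enabling both phases with the Voyage provider (values taken from the table above) looks like:

```shell
# Phase 3a only: keyword expansion via a cheap LLM hypothesis
export CHATCLI_QUALITY_HYDE_ENABLED=true
export CHATCLI_QUALITY_HYDE_NUM_KEYWORDS=5

# Phase 3b: add vector search (requires an embedding provider)
export CHATCLI_QUALITY_HYDE_USE_VECTORS=true
export CHATCLI_EMBED_PROVIDER=voyage
export CHATCLI_EMBED_MODEL=voyage-3
```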

/config quality surfaces state

── RAG + HyDE (#4)
  CHATCLI_QUALITY_HYDE_ENABLED    : enabled
  CHATCLI_QUALITY_HYDE_USE_VECTORS: enabled
  CHATCLI_QUALITY_HYDE_PROVIDER   : voyage
  CHATCLI_EMBED_PROVIDER          : voyage
  CHATCLI_EMBED_MODEL             : voyage-3
  CHATCLI_QUALITY_HYDE_NUM_KEYWORDS: 5
  Vector provider                : voyage:voyage-3
  Vector entries                 : 127

Integration with Reflexion

HyDE amplifies Reflexion’s value: lessons persisted by #3 are retrieved with much higher recall when the next task doesn’t use the exact same keywords. Workflow:
1. Turn 1: an auth.go refactor fails (timeout). Reflexion persists the lesson "use Edit tool for large files" with tags [go, refactor, edit-tool].

2. Turn 5 (days later): "help me split pkg/engine". The query contains neither refactor nor edit, so keyword-only retrieval would miss the lesson.

3. HyDE 3a generates a hypothesis: "To split a Go package, identify logical groupings and use refactor patterns with Edit tool for surgical changes..." Extracted keywords: [split, package, refactor, edit, patterns, …]

4. Match! The lesson appears in the system prompt, and the Coder picks Edit over write from the start.

Caveats and tuning

Token cost of phase 3a: ~200 tokens per retrieval turn. In workflows with many read turns, the cost compounds; set CHATCLI_QUALITY_HYDE_NUM_KEYWORDS=3 for a tighter budget.
Privacy: the user query is sent to the embedding provider. For sensitive workloads, consider self-hosting an embedding model (roadmap: Ollama-embedding provider).
Graceful fallback: if the LLM fails or the embedding provider returns an error, retrieval falls back to keyword-only silently. No turn is aborted by HyDE failure.

See also

#3 Reflexion

The lessons that HyDE retrieves with higher recall.

Bootstrap Memory

The layer underneath: how memory.Fact is populated and maintained.

Persistent Context

/context attach for explicit file contexts.

Full configuration

All env vars and slash commands.