Knowledge Base (keyless RAG)

A 6MB documentation corpus becomes ~1.5M tokens if attached whole — it blows any model’s window on the first turn. The knowledge mode of /context solves it with the same pull-first pattern as persistent memory: the conversation receives only an index card (what the base covers), and content is retrieved on demand — automatically each turn and, in the agent, iteratively via the @knowledge tool.

# 1. Flatten the docs (e.g. the @docs-flatten plugin) → JSONL
@docs-flatten --repo https://github.com/org/docs.git --format jsonl --output docs.jsonl

# 2. Index as a knowledge base (native JSONL; directories work too)
/context create my-docs docs.jsonl --mode knowledge

# 3. Attach — fixed ~900 tokens/turn, at 6MB or 60MB
/context attach my-docs

	Traditional attach (`full`)	`--mode knowledge`
Prompt cost	whole corpus (~1.5M tokens for 6MB)	index card (~900 tokens, fixed)
Content	pushed all at once	pulled by relevance, every turn
API key	—	none (pure-Go BM25; embeddings = optional boost)
Knowledge	truncated/overflowing	intact, searchable and citable by `source`

How it works

Native JSONL ingestion (docs-flatten)

Each JSONL line becomes a virtual document preserving source, title and provenance (repoUrl/commit) — instead of arriving as one opaque text blob. Malformed lines are counted and skipped, never fatal. Plain directories also become knowledge bases (normal scanner, up to 100MB).

The index card (what enters the prompt)

Attaching injects only a deterministic, budget-bounded TOC — name, scale, origin and the document list — living in the cached prompt prefix (byte-stable across turns). The model knows what exists without paying for the content:

📚 KNOWLEDGE BASE: my-docs
Origin: https://github.com/org/docs.git @ abc123def456
Scale: 87 document(s), 412 passage(s), ~1.5M tokens of source material (NOT in context)
Table of contents:
- guide/install.md (4 passages) — Install
- guide/deploy.md (12 passages) — Deploying to production
…

Hybrid retrieval (keyless-first)

Each turn, the passages relevant to the question are injected into a volatile block (outside the cached prefix):

Pure-Go BM25 — always available, no API key, pt/English neutral. That’s the floor.
Embeddings (Voyage/OpenAI/Bedrock, when configured) — semantic boost, fused by normalized ranking (0.55/0.45). An embedding failure degrades to lexical with a warning; it never breaks the turn.

The `@knowledge` tool — the agent investigates the base

In agent and coder modes, the index cards enter the system prompt and the @knowledge tool enables iterative investigation — search, read whole documents in pages, walk the structure:

Subcommand	What it does
`search {query, top_k?, kb?}`	Ranked passages (hybrid) with `source` citations
`get {source, offset?, kb?}`	Reads a whole document, in ~3K-token pages with a continuation offset
`toc {prefix?, kb?}`	Lists the base’s documents (path-prefix filter)
`list`	Knowledge bases attached to the session and their scale

The use case that closes the loop — authoring skills from the docs with the @skill tool:

/agent create a deploy skill based on the production section of the docs
  → @knowledge search "deploy production"
  → @knowledge get "guide/deploy.md"
  → @skill create deploy-prod …

In chat too (read-only exception)

Chat stays tool-less by design — but querying the knowledge base is the second sanctioned exception (next to ask_user), for the same reason: it executes nothing, only reads what you attached. Attach the base and talk normally; when the auto-retrieved passages aren’t enough, the model pulls more on its own (up to 4 pulls per turn: search → get → next page) before answering.

/config chat knowledge off      # disable the exception (CHATCLI_CHAT_KNOWLEDGE=false)
/config chat knowledge on       # re-enable (default: on)

Works on the native tool path (API key) and the XML transport (OAuth providers) — like everything else, agnostic across the 14 providers.

Quick reference

Surface	Value
Create	`/context create <name> <corpus.jsonl\|dir> --mode knowledge`
Attach / detach	`/context attach <name>` / `/context detach <name>`
Cost per turn	index card (~900 tokens, cached) + volatile top-K
Chat toggle	`/config chat knowledge on\|off\|toggle` (`CHATCLI_CHAT_KNOWLEDGE`, default `on`)
Embeddings (optional)	`CHATCLI_EMBED_PROVIDER=voyage\|openai\|bedrock` — without a provider, BM25 covers everything
Limits	100MB / 5,000 documents per base; `get` paged at ~12K chars

Knowledge vs --rag: the existing /context attach --rag K is vector-only (requires an embedding provider) and only pushes per turn. knowledge mode works with no key at all, gives the model the corpus index and adds the pull side (@knowledge) — for documentation corpora, prefer --mode knowledge.

Knowledge Base (keyless RAG)

How it works

Native JSONL ingestion (docs-flatten)

The index card (what enters the prompt)

Hybrid retrieval (keyless-first)

The `@knowledge` tool — the agent investigates the base

In chat too (read-only exception)

Quick reference

Next steps

Persistent Contexts

RAG + HyDE

Skill Authoring

Bootstrap & Memory

​How it works

​Native JSONL ingestion (docs-flatten)

​The index card (what enters the prompt)

​Hybrid retrieval (keyless-first)

​The @knowledge tool — the agent investigates the base

​In chat too (read-only exception)

​Quick reference

​Next steps

Persistent Contexts

RAG + HyDE

Skill Authoring

Bootstrap & Memory

How it works

Native JSONL ingestion (docs-flatten)

The index card (what enters the prompt)

Hybrid retrieval (keyless-first)

The `@knowledge` tool — the agent investigates the base

In chat too (read-only exception)

Quick reference

Next steps