A 6MB documentation corpus becomes ~1.5M tokens if attached whole β it blows any modelβs window on the first turn. The knowledge mode of /context solves it with the same pull-first pattern as persistent memory: the conversation receives only an index card (what the base covers), and content is retrieved on demand β automatically each turn and, in the agent, iteratively via the @knowledge tool.
# 1. Flatten the docs (e.g. the @docs-flatten plugin) β JSONL
@docs-flatten --repo https://github.com/org/docs.git --format jsonl --output docs.jsonl
# 2. Index as a knowledge base (native JSONL; directories work too)
/context create my-docs docs.jsonl --mode knowledge
# 3. Attach β fixed ~900 tokens/turn, at 6MB or 60MB
/context attach my-docs
| Traditional attach (full) | --mode knowledge |
|---|
| Prompt cost | whole corpus (~1.5M tokens for 6MB) | index card (~900 tokens, fixed) |
| Content | pushed all at once | pulled by relevance, every turn |
| API key | β | none (pure-Go BM25; embeddings = optional boost) |
| Knowledge | truncated/overflowing | intact, searchable and citable by source |
How it works
Native JSONL ingestion (docs-flatten)
Each JSONL line becomes a virtual document preserving source, title and provenance (repoUrl/commit) β instead of arriving as one opaque text blob. Malformed lines are counted and skipped, never fatal. Plain directories also become knowledge bases (normal scanner, up to 100MB).
The index card (what enters the prompt)
Attaching injects only a deterministic, budget-bounded TOC β name, scale, origin and the document list β living in the cached prompt prefix (byte-stable across turns). The model knows what exists without paying for the content:
π KNOWLEDGE BASE: my-docs
Origin: https://github.com/org/docs.git @ abc123def456
Scale: 87 document(s), 412 passage(s), ~1.5M tokens of source material (NOT in context)
Table of contents:
- guide/install.md (4 passages) β Install
- guide/deploy.md (12 passages) β Deploying to production
β¦
Hybrid retrieval (keyless-first)
Each turn, the passages relevant to the question are injected into a volatile block (outside the cached prefix):
- Pure-Go BM25 β always available, no API key, pt/English neutral. Thatβs the floor.
- Embeddings (Voyage/OpenAI/Bedrock, when configured) β semantic boost, fused by normalized ranking (0.55/0.45). An embedding failure degrades to lexical with a warning; it never breaks the turn.
In agent and coder modes, the index cards enter the system prompt and the @knowledge tool enables iterative investigation β search, read whole documents in pages, walk the structure:
| Subcommand | What it does |
|---|
search {query, top_k?, kb?} | Ranked passages (hybrid) with source citations |
get {source, offset?, kb?} | Reads a whole document, in ~3K-token pages with a continuation offset |
toc {prefix?, kb?} | Lists the baseβs documents (path-prefix filter) |
list | Knowledge bases attached to the session and their scale |
The use case that closes the loop β authoring skills from the docs with the @skill tool:
/agent create a deploy skill based on the production section of the docs
β @knowledge search "deploy production"
β @knowledge get "guide/deploy.md"
β @skill create deploy-prod β¦
In chat too (read-only exception)
Chat stays tool-less by design β but querying the knowledge base is the second sanctioned exception (next to ask_user), for the same reason: it executes nothing, only reads what you attached. Attach the base and talk normally; when the auto-retrieved passages arenβt enough, the model pulls more on its own (up to 4 pulls per turn: search β get β next page) before answering.
/config chat knowledge off # disable the exception (CHATCLI_CHAT_KNOWLEDGE=false)
/config chat knowledge on # re-enable (default: on)
Works on the native tool path (API key) and the XML transport (OAuth providers) β like everything else, agnostic across the 14 providers.
Quick reference
| Surface | Value |
|---|
| Create | /context create <name> <corpus.jsonl|dir> --mode knowledge |
| Attach / detach | /context attach <name> / /context detach <name> |
| Cost per turn | index card (~900 tokens, cached) + volatile top-K |
| Chat toggle | /config chat knowledge on|off|toggle (CHATCLI_CHAT_KNOWLEDGE, default on) |
| Embeddings (optional) | CHATCLI_EMBED_PROVIDER=voyage|openai|bedrock β without a provider, BM25 covers everything |
| Limits | 100MB / 5,000 documents per base; get paged at ~12K chars |
Knowledge vs --rag: the existing /context attach --rag K is vector-only (requires an embedding provider) and only pushes per turn. knowledge mode works with no key at all, gives the model the corpus index and adds the pull side (@knowledge) β for documentation corpora, prefer --mode knowledge.
Next steps