Skip to main content
A 6MB documentation corpus becomes ~1.5M tokens if attached whole β€” it blows any model’s window on the first turn. The knowledge mode of /context solves it with the same pull-first pattern as persistent memory: the conversation receives only an index card (what the base covers), and content is retrieved on demand β€” automatically each turn and, in the agent, iteratively via the @knowledge tool.
# 1. Flatten the docs (e.g. the @docs-flatten plugin) β†’ JSONL
@docs-flatten --repo https://github.com/org/docs.git --format jsonl --output docs.jsonl

# 2. Index as a knowledge base (native JSONL; directories work too)
/context create my-docs docs.jsonl --mode knowledge

# 3. Attach β€” fixed ~900 tokens/turn, at 6MB or 60MB
/context attach my-docs
Traditional attach (full)--mode knowledge
Prompt costwhole corpus (~1.5M tokens for 6MB)index card (~900 tokens, fixed)
Contentpushed all at oncepulled by relevance, every turn
API keyβ€”none (pure-Go BM25; embeddings = optional boost)
Knowledgetruncated/overflowingintact, searchable and citable by source

How it works

Native JSONL ingestion (docs-flatten)

Each JSONL line becomes a virtual document preserving source, title and provenance (repoUrl/commit) β€” instead of arriving as one opaque text blob. Malformed lines are counted and skipped, never fatal. Plain directories also become knowledge bases (normal scanner, up to 100MB).

The index card (what enters the prompt)

Attaching injects only a deterministic, budget-bounded TOC β€” name, scale, origin and the document list β€” living in the cached prompt prefix (byte-stable across turns). The model knows what exists without paying for the content:
πŸ“š KNOWLEDGE BASE: my-docs
Origin: https://github.com/org/docs.git @ abc123def456
Scale: 87 document(s), 412 passage(s), ~1.5M tokens of source material (NOT in context)
Table of contents:
- guide/install.md (4 passages) β€” Install
- guide/deploy.md (12 passages) β€” Deploying to production
…

Hybrid retrieval (keyless-first)

Each turn, the passages relevant to the question are injected into a volatile block (outside the cached prefix):
  • Pure-Go BM25 β€” always available, no API key, pt/English neutral. That’s the floor.
  • Embeddings (Voyage/OpenAI/Bedrock, when configured) β€” semantic boost, fused by normalized ranking (0.55/0.45). An embedding failure degrades to lexical with a warning; it never breaks the turn.

The @knowledge tool β€” the agent investigates the base

In agent and coder modes, the index cards enter the system prompt and the @knowledge tool enables iterative investigation β€” search, read whole documents in pages, walk the structure:
SubcommandWhat it does
search {query, top_k?, kb?}Ranked passages (hybrid) with source citations
get {source, offset?, kb?}Reads a whole document, in ~3K-token pages with a continuation offset
toc {prefix?, kb?}Lists the base’s documents (path-prefix filter)
listKnowledge bases attached to the session and their scale
The use case that closes the loop β€” authoring skills from the docs with the @skill tool:
/agent create a deploy skill based on the production section of the docs
  β†’ @knowledge search "deploy production"
  β†’ @knowledge get "guide/deploy.md"
  β†’ @skill create deploy-prod …

In chat too (read-only exception)

Chat stays tool-less by design β€” but querying the knowledge base is the second sanctioned exception (next to ask_user), for the same reason: it executes nothing, only reads what you attached. Attach the base and talk normally; when the auto-retrieved passages aren’t enough, the model pulls more on its own (up to 4 pulls per turn: search β†’ get β†’ next page) before answering.
/config chat knowledge off      # disable the exception (CHATCLI_CHAT_KNOWLEDGE=false)
/config chat knowledge on       # re-enable (default: on)
Works on the native tool path (API key) and the XML transport (OAuth providers) β€” like everything else, agnostic across the 14 providers.

Quick reference

SurfaceValue
Create/context create <name> <corpus.jsonl|dir> --mode knowledge
Attach / detach/context attach <name> / /context detach <name>
Cost per turnindex card (~900 tokens, cached) + volatile top-K
Chat toggle/config chat knowledge on|off|toggle (CHATCLI_CHAT_KNOWLEDGE, default on)
Embeddings (optional)CHATCLI_EMBED_PROVIDER=voyage|openai|bedrock β€” without a provider, BM25 covers everything
Limits100MB / 5,000 documents per base; get paged at ~12K chars
Knowledge vs --rag: the existing /context attach --rag K is vector-only (requires an embedding provider) and only pushes per turn. knowledge mode works with no key at all, gives the model the corpus index and adds the pull side (@knowledge) β€” for documentation corpora, prefer --mode knowledge.

Next steps

Persistent Contexts

RAG + HyDE

Skill Authoring

Bootstrap & Memory