Knowledge Base (keyless RAG)

A 6MB documentation corpus is roughly 1.5M tokens if attached whole, which overflows any model's context window on the first turn. The knowledge mode of `/context` solves this with the same pull-first pattern as persistent memory: the conversation receives only an index card (what the base covers), and content is retrieved on demand, automatically each turn and, in the agent, iteratively via the `@knowledge` tool.

```bash
# 1. Flatten the docs (e.g. the @docs-flatten plugin) → JSONL
@docs-flatten --repo https://github.com/org/docs.git --format jsonl --output docs.jsonl

# 2. Index as a knowledge base (native JSONL; directories work too)
/context create my-docs docs.jsonl --mode knowledge

# 3. Attach: a fixed ~900 tokens/turn, at 6MB or 60MB
/context attach my-docs
```

| | Traditional attach (`full`) | `--mode knowledge` |
| --- | --- | --- |
| Prompt cost | whole corpus (~1.5M tokens for 6MB) | index card (~900 tokens, fixed) |
| Content | pushed all at once | pulled by relevance, every turn |
| API key | n/a | none (pure-Go BM25; embeddings are an optional boost) |
| Knowledge | truncated/overflowing | intact, searchable and citable by `source` |

How it works

Native JSONL ingestion (docs-flatten)

Each JSONL line becomes a virtual document that preserves `source`, `title` and provenance (`repoUrl`/`commit`) instead of arriving as one opaque text blob. Malformed lines are counted and skipped, never fatal. Plain directories can also become knowledge bases (normal scanner, up to 100MB). `@docs-flatten` accepts three sources for the same JSONL: `root=<dir>` (local folder), `repo=<git-url>` (shallow clone) and `url=<site>` (a bounded same-host crawl for docs that exist only as an HTML site, with no Markdown repo).
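As a rough illustration of the "counted and skipped, never fatal" behavior, here is a minimal Python sketch of a tolerant JSONL loader. It is not the tool's actual implementation: the `VirtualDoc` type, the `load_jsonl` function and the `text` field name for the document body are assumptions made for the example; only `source`, `title`, `repoUrl` and `commit` come from the description above.

```python
import json
from dataclasses import dataclass


@dataclass
class VirtualDoc:
    source: str
    title: str
    repo_url: str
    commit: str
    text: str


def load_jsonl(lines):
    """Parse JSONL lines into virtual documents, counting malformed lines."""
    docs, skipped = [], 0
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # blank lines are neither documents nor errors
        try:
            rec = json.loads(raw)
            doc = VirtualDoc(
                source=rec["source"],
                title=rec.get("title", ""),
                repo_url=rec.get("repoUrl", ""),
                commit=rec.get("commit", ""),
                text=rec["text"],  # hypothetical field name for the body
            )
        except (json.JSONDecodeError, KeyError, TypeError):
            skipped += 1  # counted and skipped, never fatal
            continue
        docs.append(doc)
    return docs, skipped


sample = [
    '{"source": "guide/install.md", "title": "Install", "text": "Run the installer."}',
    "{not valid json",
    '{"title": "record without source or text"}',
]
docs, skipped = load_jsonl(sample)
print(len(docs), skipped)  # 1 2
```

Keeping each record as its own document is what lets retrieval cite a passage by `source` later, rather than pointing at an offset inside one large blob.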

Code and infrastructure (`kind=code`)

Beyond documentation, `@docs-flatten` ingests source-code, Terraform and GitOps (Kubernetes/Argo) repositories into the same JSONL schema, so knowledge mode changes nothing downstream. The `kind` parameter controls what is ingested and how it is sliced:

| `kind` | What it ingests |
| --- | --- |
| `docs` (default) | Markdown/MDX only. Legacy behavior: existing docs workflows never start absorbing code by accident. |
| `code` | Also source code, Terraform, Kubernetes/Argo YAML and shell, sliced by structure (functions, resources, manifests) with symbol/resource titles. The “point it at an app/infra/GitOps repo” mode. |
| `auto` | Per file: Markdown stays Markdown, code/config is sliced by structure, and other text is split into size windows. |

The slicing is language-agnostic. It does not rely on a per-language keyword list, so it does not break when you switch stacks:

| Files | Chunk unit | Title |
| --- | --- | --- |
| `.go .java .kt .ts .rs .cs .cpp .swift .scala .php` … | top-level declaration (brace balance) | symbol (`HandleCheckout`, `OrderService`) |
| `.tf .tfvars .hcl` | Terraform block | `type.name` (`aws_eks_cluster.main`) |
| `.yaml .yml` | document (`---`) | `Kind/name` (`Rollout/checkout-api`) |
| `.py .rb` | top-level `def`/`class` (indentation) | symbol |
| `.sh .bash` | function / top-level block | function name |
| any other text | size window | file path |

The title is best-effort metadata: if the heuristic does not recognize the language, it falls back to the cleaned signature line. Content is always indexed and searchable, so a missed title never costs recall. Noise is skipped by default (`vendor/`, `node_modules/`, `.terraform/`, lockfiles, minified assets, binaries), and files above 1 MiB are ignored.
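To make the brace-balance idea concrete, the sketch below splits brace-delimited source into top-level chunks and derives a fallback title from the signature line. It is a simplified illustration, not the plugin's code: `split_top_level` and `fallback_title` are hypothetical names, and braces inside strings or comments are deliberately not handled.

```python
def split_top_level(source: str) -> list[str]:
    """Split source into chunks that end when brace depth returns to zero."""
    chunks, current, depth = [], [], 0
    for line in source.splitlines():
        if not current and not line.strip():
            continue  # drop blank lines between chunks
        current.append(line)
        opened, closed = line.count("{"), line.count("}")
        depth += opened - closed
        # A chunk closes on the line that brings the depth back to zero.
        if depth == 0 and (opened or closed):
            chunks.append("\n".join(current))
            current = []
    if current:
        chunks.append("\n".join(current))  # trailing text without braces
    return chunks


def fallback_title(chunk: str) -> str:
    """Use the cleaned signature line (the first line opening a brace)."""
    for line in chunk.splitlines():
        if "{" in line:
            return " ".join(line.split("{", 1)[0].split())
    for line in chunk.splitlines():
        if line.strip():
            return line.strip()
    return ""


go_src = """package shop

// HandleCheckout charges the cart.
func HandleCheckout(c Cart) error {
    if c.Empty() {
        return ErrEmpty
    }
    return charge(c)
}

type OrderService struct {
    repo Repo
}
"""
for chunk in split_top_level(go_src):
    print(fallback_title(chunk))
```

Because the splitter only counts braces, the same function works on Go, Java, TypeScript or Rust input without a keyword table. A real slicer would also need to ignore braces in string literals and comments.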

```bash
# One base per layer: app, infra and gitops
@docs-flatten root=./app   kind=code format=jsonl output=app.jsonl
@docs-flatten root=./infra kind=code format=jsonl output=infra.jsonl   # .tf blocks
@docs-flatten root=./argo  kind=code format=jsonl output=argo.jsonl    # .yaml manifests

/context create app   app.jsonl   --mode knowledge
/context create infra infra.jsonl --mode knowledge
/context create argo  argo.jsonl  --mode knowledge
/context attach app && /context attach infra && /context attach argo
```

With the three bases attached, `@knowledge search` fans out across all of them (each hit is tagged with its source base), so the model can connect the layers for a request such as: “the checkout-api Rollout won’t go ready; connect the Argo manifest, the Terraform node group and the service health check in code”.

You do not need to classify the repo manually. The default is `docs` for safety, but the agent picks `kind=code` on its own, guided by three signals: intent (the tool schema documents the use), the autonomous pipeline guidance (below), and a self-correcting hint. Running the default `docs` on a repo with no Markdown returns “looks like a code repo, re-run with kind=code”, and the agent recovers in the same turn.
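The self-correcting hint can be pictured as a small check over the file extensions found in the repo. The sketch below is an assumption about how such a check might look, not the plugin's logic: `kind_hint` and the two extension sets are hypothetical, and only the hint text is taken from the description above.

```python
from pathlib import PurePosixPath

MARKDOWN_EXTS = {".md", ".mdx"}
# Illustrative subset of the code/config extensions listed earlier.
CODE_EXTS = {".go", ".java", ".kt", ".ts", ".rs", ".py", ".rb",
             ".tf", ".hcl", ".yaml", ".yml", ".sh"}


def kind_hint(paths, kind="docs"):
    """Return a re-run hint when kind=docs finds code but no Markdown."""
    if kind != "docs":
        return None  # the hint only applies to the default mode
    exts = [PurePosixPath(p).suffix.lower() for p in paths]
    has_markdown = any(e in MARKDOWN_EXTS for e in exts)
    has_code = any(e in CODE_EXTS for e in exts)
    if has_code and not has_markdown:
        return "looks like a code repo, re-run with kind=code"
    return None


print(kind_hint(["cmd/main.go", "infra/eks.tf"]))
print(kind_hint(["README.md", "cmd/main.go"]))
```

Returning the hint as tool output, rather than failing, is what lets the agent read it and retry with `kind=code` in the same turn.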

The index card (what enters the prompt)

Attaching injects only a deterministic, budget-bounded table of contents (name, scale, origin and the document list) that lives in the cached prompt prefix and is byte-stable across turns. The model knows what exists without paying for the content:

```text
📚 KNOWLEDGE BASE: my-docs
Origin: https://github.com/org/docs.git @ abc123def456
Scale: 87 document(s), 412 passage(s), ~1.5M tokens of source material (NOT in context)
Table of contents:
- guide/install.md (4 passages) — Install
- guide/deploy.md (12 passages) — Deploying to production
…
```
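A deterministic, budget-bounded card like the one above could be rendered along these lines. This is a sketch under stated assumptions: `render_index_card`, the character budget (about 900 tokens at roughly 4 characters per token) and the overflow line are hypothetical, and the entry format is simplified relative to the real card.

```python
def render_index_card(name, origin, docs, source_tokens, budget_chars=3600):
    """Render a TOC card; docs is a list of (path, passage_count, title)."""
    entries = sorted(docs)  # stable order keeps the card byte-identical
    passages = sum(count for _, count, _ in entries)
    scale = (f"Scale: {len(entries)} document(s), {passages} passage(s), "
             f"~{source_tokens} tokens of source material (NOT in context)")
    lines = [f"KNOWLEDGE BASE: {name}", f"Origin: {origin}", scale,
             "Table of contents:"]
    used = sum(len(line) + 1 for line in lines)  # header is always emitted
    for i, (path, count, title) in enumerate(entries):
        entry = f"- {path} ({count} passages): {title}"
        if used + len(entry) + 1 > budget_chars:
            # Say how much was left out instead of cutting silently.
            lines.append(f"(+{len(entries) - i} more document(s) not listed)")
            break
        lines.append(entry)
        used += len(entry) + 1
    return "\n".join(lines)


docs = [
    ("guide/install.md", 4, "Install"),
    ("guide/deploy.md", 12, "Deploying to production"),
]
print(render_index_card(
    "my-docs", "https://github.com/org/docs.git @ abc123def456", docs, "1.5M"))
```

Sorting the entries and avoiding timestamps or counters is what makes the output identical from turn to turn, which is the property a cached prompt prefix depends on.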

Hybrid retrieval (keyless-first)

Each turn, the passages relevant to the question are injected into a volatile block (outside the cached prefix):

- Pure-Go BM25: always available, no API key, Portuguese/English neutral. This is the floor. The tokenizer splits snake_case, kebab-case and camelCase/PascalCase into sub-words (keeping the whole token), so an identifier like `getUserName` or `aws_eks_cluster` is found by `user`, `eks`, etc., which gives code recall without losing exact match.
- Embeddings (Voyage/OpenAI/Bedrock, when configured): a semantic boost, fused by normalized ranking (0.55/0.45). An embedding failure degrades to lexical retrieval with a warning; it never breaks the turn.
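The two ideas above, sub-word tokenization and rank fusion, can be sketched in a few lines of Python. This is an illustration rather than the Go implementation: `tokenize` and `fuse` are hypothetical names, the normalization formula is one plausible reading of "normalized ranking", and assigning 0.55 to the lexical list and 0.45 to the semantic list is an assumption, since the text does not say which weight belongs to which side.

```python
import re

_WORD = re.compile(r"[A-Za-z0-9_\-]+")
# Splits camelCase/PascalCase, keeping acronym runs such as "HTTP" together.
_CAMEL = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def tokenize(text):
    """Emit each whole token (lowercased) followed by its sub-words."""
    tokens = []
    for word in _WORD.findall(text):
        word = word.strip("_-")
        if not word:
            continue
        whole = word.lower()
        tokens.append(whole)  # keep the whole token for exact match
        parts = [p.lower()
                 for piece in re.split(r"[_\-]+", word)
                 for p in _CAMEL.findall(piece)]
        tokens.extend(p for p in parts if p != whole)
    return tokens


def fuse(lexical, semantic, w_lexical=0.55, w_semantic=0.45):
    """Fuse two ranked lists of ids by weighted, normalized rank."""
    def normalized(ranking):
        n = len(ranking)
        return {doc: (n - i) / n for i, doc in enumerate(ranking)}

    lex, sem = normalized(lexical), normalized(semantic)
    scores = {d: w_lexical * lex.get(d, 0.0) + w_semantic * sem.get(d, 0.0)
              for d in set(lex) | set(sem)}
    return sorted(scores, key=lambda d: (-scores[d], d))


print(tokenize("getUserName"))      # ['getusername', 'get', 'user', 'name']
print(tokenize("aws_eks_cluster"))  # ['aws_eks_cluster', 'aws', 'eks', 'cluster']
print(fuse(["a", "b"], ["b", "c"])) # ['b', 'a', 'c']
print(fuse(["a", "b", "c"], []))    # ['a', 'b', 'c']
```

The last call shows the degradation path: with an empty semantic list (for example after an embedding failure), the fused order is simply the lexical order.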

The `@knowledge` tool — the agent investigates the base

In agent and coder modes, the index cards enter the system prompt and the `@knowledge` tool enables iterative investigation (search, read whole documents in pages, walk the structure):

| Subcommand | What it does |
| --- | --- |
| `search {query, top_k?, kb?}` | Ranked passages (hybrid) with `source` citations |
| `get {source, offset?, kb?}` | Reads a whole document, in ~3K-token pages with a continuation offset |
| `toc {prefix?, kb?}` | Lists the base’s documents (path-prefix filter) |
| `list` | Knowledge bases attached to the session and their scale |
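Paged reading with a continuation offset, as `get` does, can be sketched as follows. The function name `get_page`, the preference for cutting at a line boundary and the use of `None` to signal the last page are assumptions for illustration; the ~12K-character page size matches the figure given in the quick reference.

```python
PAGE_CHARS = 12_000  # about 3K tokens at roughly 4 characters per token


def get_page(text, offset=0, page_chars=PAGE_CHARS):
    """Return (page, next_offset); next_offset is None on the last page."""
    if offset < 0 or offset > len(text):
        raise ValueError("offset out of range")
    end = min(offset + page_chars, len(text))
    # Prefer to cut at a line boundary so a page does not end mid-line.
    if end < len(text):
        newline = text.rfind("\n", offset, end)
        if newline > offset:
            end = newline + 1
    next_offset = end if end < len(text) else None
    return text[offset:end], next_offset


# Read a whole document by following the continuation offset.
document = "line\n" * 10
offset, pages = 0, []
while offset is not None:
    page, offset = get_page(document, offset, page_chars=12)
    pages.append(page)
print(len(pages), "".join(pages) == document)  # 5 True
```

Returning the next offset with each page lets the caller stop as soon as it has what it needs, instead of always paying for the whole document.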

The use case that closes the loop is authoring skills from the docs with the `@skill` tool:

```text
/agent create a deploy skill based on the production section of the docs
  → @knowledge search "deploy production"
  → @knowledge get "guide/deploy.md"
  → @skill create deploy-prod …
```

Autonomous pipeline — the agent builds the base itself (`@context`)

The agent can do the steps above (flatten → create → attach) for you. When it hits a knowledge gap (a library, framework or API it does not know), instead of guessing or stopping to ask, it builds the base itself:

1. Discover the source: use `@websearch` to find the official documentation (preferably the project’s Markdown repo), or use a repo/URL/path you pointed it at.
2. Flatten: run `@docs-flatten` with `root=<dir>`, `repo=<git>` or `url=<site>` to produce the JSONL corpus. For a code/infra repo, it adds `kind=code` (one base per layer: app, infra, gitops).
3. Create and attach: `@context create … --mode knowledge` → `@context attach …`.
4. Query: `@knowledge search`/`get` to ground the answer in the retrieved passages.

The @context tool gives the agent the same self-service power it already has for skills, but for knowledge:

| Subcommand | What it does |
| --- | --- |
| `create {name, paths[], mode?}` | Build a base from a corpus.jsonl, directory or files (mode `knowledge` by default) |
| `update {name, paths?, mode?, …}` | Re-ingest/modify an existing base |
| `attach {name, rag?, priority?}` | Attach to the session; `rag` turns on semantic top-K retrieval |
| `detach {name}` | Remove the attachment from the session (the base stays on disk) |
| `list` / `status` | List all bases / show what is attached to this session |
| `show {name}` / `inspect {name, chunk?}` | A base’s metadata / a deeper view (files, chunks) |
| `merge {name, sources[]}` | Combine bases into a new one (deduplicated) |
| `export {name, path}` / `import {path}` | Save/load a base to a portable file |
| `metrics` | Store summary: total bases, attached, size, by mode |
| `delete {name}` | Remove the base from disk |

The tool mirrors the full `/context` surface, so the agent handles contexts end to end. The inspecting subcommands (`list`, `status`, `show`, `inspect`, `metrics`) are read-only. You stay in control: everything the agent attaches shows up in `/context attached` and `@context status`, and you can remove it with `/context detach` or just ask (“detach the react docs”). `attach` auto-detects embeddings: knowledge mode uses keyless BM25, plus vectors when configured, and reports which mode is active. In `/agent` the agent does all of this on its own; in `/coder`, state-changing operations go through the policy confirmation.

The `@docs-flatten` `url` mode closes the loop for docs that exist only as an HTML site (no Markdown repo): a bounded same-host crawl that reuses the `@webfetch` fetch engine and emits the same JSONL. The crawl is bounded by `maxPages`/`maxDepth`, with no silent truncation.
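A bounded same-host crawl can be sketched as a breadth-first traversal with two limits. This is an illustration under assumptions, not the plugin's crawler: `crawl` and its `fetch_links` callback are hypothetical, `max_pages`/`max_depth` mirror the `maxPages`/`maxDepth` bounds named above, and returning a `truncated` flag is one way to read "no silent truncation". The example runs against an in-memory site so it needs no network access.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def crawl(start_url, fetch_links, max_pages=50, max_depth=3):
    """Breadth-first crawl restricted to the start URL's host.

    fetch_links(url) returns the hrefs found on that page.
    Returns (visited, truncated); truncated is True when a bound cut the crawl.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited, truncated = [], False
    while queue:
        url, depth = queue.popleft()
        if len(visited) >= max_pages:
            truncated = True  # page bound reached with work still queued
            break
        visited.append(url)
        for href in fetch_links(url):
            link = urljoin(url, href).split("#", 1)[0]  # drop fragments
            if urlparse(link).netloc != host or link in seen:
                continue  # other hosts and repeats are skipped
            if depth + 1 > max_depth:
                truncated = True  # depth bound hides this link
                continue
            seen.add(link)
            queue.append((link, depth + 1))
    return visited, truncated


site = {
    "https://docs.example.com/": ["/install", "/deploy", "https://other.example.org/x"],
    "https://docs.example.com/install": ["/deploy", "/install#top"],
    "https://docs.example.com/deploy": ["/deploy/k8s"],
    "https://docs.example.com/deploy/k8s": [],
}
pages, cut = crawl("https://docs.example.com/", lambda u: site.get(u, []), max_depth=1)
print(len(pages), cut)  # 3 True
```

Reporting `truncated` alongside the result means a caller can tell a complete crawl from one that stopped at a limit and decide whether to raise the bounds.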

In chat too (read-only exception)

Chat stays tool-less by design, but querying the knowledge base is the second sanctioned exception (next to `ask_user`), for the same reason: it executes nothing and only reads what you attached. Attach the base and talk normally; when the auto-retrieved passages are not enough, the model pulls more on its own (up to 4 pulls per turn: search → get → next page) before answering.

```bash
/config chat knowledge off      # disable the exception (CHATCLI_CHAT_KNOWLEDGE=false)
/config chat knowledge on       # re-enable (default: on)
```

This works on both the native tool path (API key) and the XML transport (OAuth providers); like everything else, it is agnostic across the 14 providers.
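The per-turn pull budget in chat can be pictured as a bounded loop around read-only requests. The sketch below is only a model of that control flow: `answer_turn`, the `decide` callback (standing in for the model choosing its next read) and the `run_tool` callback are hypothetical, and only the limit of 4 pulls comes from the text.

```python
MAX_PULLS = 4  # per-turn limit described above


def answer_turn(question, auto_passages, decide, run_tool):
    """Gather evidence for one chat turn with at most MAX_PULLS extra reads.

    decide(question, evidence) returns None when there is enough evidence,
    or a read-only request such as ("search", "deploy production").
    run_tool(kind, argument) performs the read and returns its text.
    """
    evidence = list(auto_passages)  # start from the auto-retrieved passages
    pulls = 0
    while pulls < MAX_PULLS:
        request = decide(question, evidence)
        if request is None:
            break  # the model is ready to answer
        evidence.append(run_tool(*request))
        pulls += 1
    return evidence, pulls


# A model stub that never stops asking is still capped at four reads.
evidence, pulls = answer_turn(
    "how do I deploy?",
    ["auto passage"],
    decide=lambda q, ev: ("search", "deploy production"),
    run_tool=lambda kind, arg: f"{kind}:{arg}",
)
print(pulls, len(evidence))  # 4 5
```

Because the loop only ever calls read operations and has a hard cap, the exception stays cheap and side-effect free even when the model keeps asking for more.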

Quick reference

| Surface | Value |
| --- | --- |
| Create | `/context create <name> <corpus.jsonl\|dir> --mode knowledge` |
| Attach / detach | `/context attach <name>` / `/context detach <name>` |
| Agent does it itself | the `@context` tool (`create`/`attach`/`detach`/`list`/`status`/`delete`) |
| `@docs-flatten` sources | `root=<dir>` · `repo=<git>` · `url=<site>` (crawl) |
| `@docs-flatten` kind | `kind=docs` (default, Markdown only) · `kind=code` (code/Terraform/YAML/shell, structure-aware) · `kind=auto` |
| Cost per turn | index card (~900 tokens, cached) + volatile top-K |
| Chat toggle | `/config chat knowledge on\|off\|toggle` (`CHATCLI_CHAT_KNOWLEDGE`, default `on`) |
| Embeddings (optional) | `CHATCLI_EMBED_PROVIDER=voyage\|openai\|bedrock`; without a provider, BM25 covers everything |
| Limits | 100MB / 5,000 documents per base; `get` paged at ~12K chars |

Knowledge vs `--rag`: the existing `/context attach --rag K` is vector-only (requires an embedding provider) and only pushes per turn. Knowledge mode works with no key at all, gives the model the corpus index and adds the pull side (`@knowledge`). For documentation or code/infra corpora, prefer `--mode knowledge`.

Next steps

- Persistent Contexts
- RAG + HyDE
- Skill Authoring
- Bootstrap & Memory