The layer is reversible by design: lossy reduction happens only after the original has been written to a local store (CCR) and a marker has been embedded in the prompt. The model recovers the original verbatim with @recall. Below a size threshold, output passes through byte-identical.
Why it matters
A single @search over a large codebase, a verbose go test ./... run, or a JSON API response can inject tens of thousands of mostly redundant tokens into the context. Without compression, that volume:
- Saturates the context window and forces aggressive compaction (which loses information).
- Multiplies cost on every ReAct turn.
- Busts the provider's prefix cache when volatile content enters raw.
How it works
A ContentRouter detects the content type (or trusts a hint from the originating tool) and routes to the right compressor. Each compressor is deterministic and type-specific:

| Content type | Strategy | Typical reduction | Auto-fires? |
|---|---|---|---|
| Search results (grep/ripgrep) | search: group by file, keep first/last + error lines, caps | ~78–92% | Yes |
| Build/test/runtime logs | log: keep errors, stack traces, deduped warnings, summary; drop noise | ~92–99% | Yes |
| Unified diffs (git) | diff: keep +/-, trim context, hunk/file caps | ~87% | Yes |
| Large JSON arrays | json-crush: lossless whitespace compaction + item sample with sentinel | ~96% | Yes |
| Source code | code-ast: skeleton (signatures, no bodies) via go/ast + heuristic | ~41% | No (explicit only) |
| Prose / Markdown (web/docs) | prose: dedup boilerplate, collapse blank runs, trim sections | ~44% | Web/explicit only |
CCR: Contextual Compression Retrieval
When a compressor drops part of a payload, the full original is written to a local store and a <<ccr:HASH>> marker is embedded in the output. The store is:
- Content-addressed (key = SHA-256 hash of the content), which gives natural dedup: identical content is stored once.
- Bounded (size cap via LRU + TTL) and crash-safe (no corruptible index: the directory is the index).
- Boundary-validated: keys are validated as fixed-width hex before becoming a filesystem path (no path traversal).
When it needs the dropped content, the model calls @recall and gets the byte-identical original.
@compress and @recall tools
@compress compresses a payload on demand. It accepts {"content":"...","hint":"auto|log|search|diff|json|code|prose"} and returns the reduced form with the original preserved in CCR. The {"cmd":"stats"} subcommand reports session savings.
@recall recovers the full original from a <<ccr:KEY>> marker. It accepts the bare key or the full marker. Use it when the compressed view omitted something you need.
Automatic compression across all modes
| Mode | Where compression engages |
|---|---|
| agent / coder | Each tool's output is compressed at the loop chokepoint, before it enters history (covers both the structured-native and legacy paths). Errors go verbatim so the model can debug them. |
| chat | During history compaction: bulky tool feedback and attached context are reduced reversibly (CCR) instead of truncated. This is the main chat win, since chat is tool-less. |
| sub-agents / workers | Delegated agents compress their output through the same shared store. Because CCR is content-addressed, identical content read by sibling agents is stored once (cross-agent dedup) and recoverable from any of them. |
Output-token reduction
Complementary to input compression, ChatCLI reduces the tokens the model generates:
- Verbosity steering: a static (cache-friendly) directive injected into the cached system-prompt prefix tells the model to drop preamble, restatement and ceremony and lead with the answer/action. Levels: full (off), concise (default), minimal.
- Effort routing (opt-in): a keyless complexity classifier lowers reasoning effort on trivial prompts. It only lowers effort for clearly trivial prompts and only when no effort was already chosen. It never overrides a skill/user choice and never raises effort, so it cannot degrade a hard task.
Both are exposed through /config output (see below).
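
The effort-routing rule (only lower, never override, never raise) can be expressed as a small pure function. This is a sketch under assumptions: the `Effort` type, its level names, and the `trivial` flag standing in for the classifier's verdict are hypothetical and do not come from ChatCLI.

```go
package main

import "fmt"

// Effort is a hypothetical ordered effort level; Unset means nobody chose one.
type Effort int

const (
	Unset Effort = iota
	Low
	Medium
	High
)

// routeEffort applies the opt-in rule: an explicit choice always wins, and
// the router may only move the default down to Low for a trivial prompt.
func routeEffort(chosen, def Effort, trivial bool) Effort {
	if chosen != Unset {
		return chosen // never override a skill or user choice
	}
	if trivial && def > Low {
		return Low // lower only; a default already at Low stays put
	}
	return def
}

func main() {
	fmt.Println(routeEffort(Unset, Medium, true) == Low)
	fmt.Println(routeEffort(High, Medium, true) == High)
}
```

Because the function has no branch that returns a level above `def` or `chosen`, it cannot raise effort, which is the property that keeps a hard task from being degraded.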
Image compression (vision)
Images are shrunk before reaching vision-capable models; see the Vision Input page. In short: the longest edge is downscaled to 1568px (what providers already do server-side, so it is token-equivalent) and photos are re-encoded as JPEG, while transparency is preserved (PNG) and the payload is never inflated. Keyless, pure-Go.
Configuration
/config compression
/config output
Environment variables
| Variable | Description | Default |
|---|---|---|
| CHATCLI_COMPRESSION | Mode: off, lossless, lossy-with-ccr | lossy-with-ccr |
| CHATCLI_COMPRESSION_THRESHOLD | Bytes below which output is byte-identical (passthrough) | 4000 |
| CHATCLI_COMPRESSION_CCR_DIR | CCR store directory | ~/.chatcli/ccr |
| CHATCLI_COMPRESSION_CCR_MAX_MB | Store size cap in MiB (0 = no cap) | 256 |
| CHATCLI_COMPRESSION_CCR_TTL | CCR entry TTL (Go duration; 0 = no TTL) | 168h |
| CHATCLI_OUTPUT_VERBOSITY | Verbosity steering: full, concise, minimal | concise |
| CHATCLI_OUTPUT_EFFORT_ROUTING | Complexity-based effort routing (opt-in) | off |
Guarantees (never degrade)
- A result that is irreversible or does not shrink → verbatim passthrough (the router rejects it).
- No CCR available (lossless mode / no store) → lossy compressors drop nothing.
- Idempotent: already-compressed content (carrying a marker) is not re-compressed.
- Tool errors go verbatim so the model can debug them.