ChatCLI offers two complementary systems for customizing and contextualizing the agent: bootstrap files to define personality and rules, and persistent memory to maintain context across sessions.
The bootstrap and memory system is fully connected to the system prompt flow. Files are automatically loaded and injected into all interactions — in chat mode as well as in /agent and /coder modes.

Bootstrap Files

Bootstrap files are Markdown documents automatically loaded into the agent’s system prompt. They define who the assistant is, how it behaves, and which rules it should follow.

Supported Files

ChatCLI loads exactly 5 bootstrap files, in this order. All are optional — if they don’t exist, they are simply skipped:
| File | Purpose | When to use |
| --- | --- | --- |
| AGENTS.md | Sub-agent definitions and their roles | When you want to instruct the orchestrator about available agents and how to use them |
| SOUL.md | Assistant personality, tone and style | To define “who” the assistant is: how it speaks, thinks and behaves |
| USER.md | User preferences and project context | To inform the stack, conventions, preferred tools and project context |
| IDENTITY.md | Agent identity and capabilities | To define “what” the assistant is: name, capabilities, limitations |
| RULES.md | Explicit rules and restrictions | For strict guardrails: what it MUST and MUST NOT do |
File names are exact and case-sensitive. ChatCLI only looks for AGENTS.md, SOUL.md, USER.md, IDENTITY.md, and RULES.md. Other names (like CLAUDE.md, README.md, etc.) are not loaded by the bootstrap system.

Loading Priority

Files are searched at two levels, with the workspace taking priority:
  1. Workspace (project root): project-specific configurations. Takes priority over global. The project root is detected automatically (see below).
  2. Global (~/.chatcli/): user default configurations. Serves as a fallback when the file does not exist in the workspace.
If the same file exists at both levels, the workspace version prevails. Global files serve as fallback.
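
The two-level lookup can be illustrated with a minimal Python sketch. This is not ChatCLI's actual code: the function name `resolve_bootstrap` and the constant `BOOTSTRAP_FILES` are hypothetical, and the sketch only assumes the behavior described above (workspace first, global as fallback, missing files skipped).

```python
from pathlib import Path

# The five file names and their load order, as listed in "Supported Files".
BOOTSTRAP_FILES = ["AGENTS.md", "SOUL.md", "USER.md", "IDENTITY.md", "RULES.md"]


def resolve_bootstrap(workspace: Path, global_dir: Path) -> dict:
    """Map each bootstrap file name to the path that wins; absent files are skipped."""
    resolved = {}
    for name in BOOTSTRAP_FILES:
        # The workspace is checked first, so it shadows the global copy.
        for base in (workspace, global_dir):
            candidate = base / name
            if candidate.is_file():
                resolved[name] = candidate
                break
    return resolved
```

With a SOUL.md in both directories and a RULES.md only in the global one, the result would point at the workspace SOUL.md and the global RULES.md.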

Automatic Workspace Detection

ChatCLI uses detectProjectDir() to find the real project root. Instead of simply using the current directory (CWD), it walks up the directory tree looking for project markers:
  1. Checks if the current directory contains .git/ or .agent/
  2. If not found, moves to the parent directory and repeats
  3. Continues until a marker is found or it reaches the filesystem root
  4. If no marker is found, falls back to the CWD
This means you can run ChatCLI from any subdirectory of your project and bootstrap files at the root will be found normally.
Recognized markers: .git (Git repository) and .agent (explicit ChatCLI marker). Only one needs to exist to define the workspace root.
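
The walk-up described above can be sketched in a few lines of Python. This is an illustration of the algorithm only, not the real detectProjectDir() implementation; the Python name `detect_project_dir` and the use of `exists()` (which accepts a marker that is either a file or a directory) are assumptions.

```python
from pathlib import Path

# Markers named in the documentation; one is enough to define the root.
MARKERS = (".git", ".agent")


def detect_project_dir(cwd: Path) -> Path:
    """Return the nearest ancestor of cwd (including cwd) that holds a marker."""
    start = cwd.resolve()
    # start.parents yields each parent in turn, ending at the filesystem root.
    for directory in (start, *start.parents):
        if any((directory / marker).exists() for marker in MARKERS):
            return directory
    # No marker anywhere up the tree: fall back to the starting directory.
    return start
```

Called from ~/project/src/pkg/ with a ~/project/.git present, the sketch would return ~/project/, matching the second row of the scenario table.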

Example Scenarios

| CWD at startup | Marker found | Detected workspace | Files loaded |
| --- | --- | --- | --- |
| ~/project/ | ~/project/.git | ~/project/ | ~/project/SOUL.md, etc. |
| ~/project/src/pkg/ | ~/project/.git | ~/project/ | ~/project/SOUL.md (walks up 2 levels) |
| ~/project/src/pkg/ | none | ~/project/src/pkg/ | Global only (~/.chatcli/) |
| ~/monorepo/services/api/ | ~/monorepo/.git | ~/monorepo/ | ~/monorepo/SOUL.md, etc. |
| ~/tmp/ | none | ~/tmp/ | Global only (~/.chatcli/) |

Detailed Examples

SOUL.md defines personality and tone. Place it at ~/.chatcli/SOUL.md (global) or ./SOUL.md (project):
# Personality

You are a technical assistant specialized in software engineering.
Be concise and direct. Prefer practical examples over theoretical explanations.
When suggesting code, use best practices and tests.

# Tone

- Professional but approachable
- Prefer short and objective responses
- Use bullet points for lists
- Default language: English

Where to Place Files

“Project root” means the directory containing .git/ or .agent/ — automatically detected by detectProjectDir(), not necessarily the CWD.
# GLOBAL configuration (applies to all projects)
~/.chatcli/SOUL.md
~/.chatcli/IDENTITY.md
~/.chatcli/RULES.md
~/.chatcli/USER.md
~/.chatcli/AGENTS.md

# Per-PROJECT configuration (overrides global)
# "Project root" = directory with .git/ or .agent/
<project-root>/SOUL.md          # In project root
<project-root>/USER.md          # In project root
<project-root>/RULES.md         # In project root (project rules)
<project-root>/IDENTITY.md      # In project root (rare)
<project-root>/AGENTS.md        # In project root (project agents)
CHATCLI_BOOTSTRAP_DIR only overrides the global directory (~/.chatcli/), not the workspace detection. Project-level files (detected via .git or .agent) still take priority over global ones, regardless of CHATCLI_BOOTSTRAP_DIR.
Recommended strategy: SOUL.md and IDENTITY.md global (they’re about the assistant), USER.md and RULES.md per-project (they’re about the work context).

Smart Cache

Bootstrap files use mtime-based (modification time) caching:
  • On the first read, the content is cached in memory
  • Subsequent reads check if the mtime has changed
  • If the file was modified, the cache is automatically invalidated
  • IsStale() checks if any file has changed since the last load
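
A minimal sketch of mtime-based caching follows, in Python for illustration. The class `MtimeCache` and its method names are hypothetical; only the behavior (cache on first read, revalidate by modification time, report staleness) comes from the description above. Treating a deleted file as stale is an assumption.

```python
import os


class MtimeCache:
    """Caches file contents and invalidates an entry when its mtime changes."""

    def __init__(self):
        self._entries = {}  # path -> (mtime in nanoseconds, content)

    def read(self, path):
        mtime = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached and cached[0] == mtime:
            return cached[1]  # unchanged since the last read
        with open(path, encoding="utf-8") as f:
            content = f.read()
        self._entries[path] = (mtime, content)
        return content

    def is_stale(self):
        """True if any cached file changed (or vanished) since it was loaded."""
        for path, (mtime, _) in self._entries.items():
            try:
                if os.stat(path).st_mtime_ns != mtime:
                    return True
            except FileNotFoundError:
                return True  # assumption: a removed file counts as stale
        return False
```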

Persistent Memory

The memory system maintains context across ChatCLI sessions using structured storage with multiple components that learn about you and your work over time.

System Architecture

Conversation -> memoryWorker (3min) -> LLM extraction -> ProcessExtraction()
                                                              |
                    +───────────+───────────+─────────────────+────────────+
                    v           v           v                 v            v
              FactIndex    Profile    TopicTracker    ProjectTracker  DailyNote
              (scored)     (JSON)     (JSON)          (JSON)          (.md)
                    |
                    v
              Compactor (6h check, 24h cycle)
              |-- Level 1: Score-based pruning + archive
              +-- Level 2: LLM consolidation
                    |
                    v
              MEMORY.md (regenerated, never source of truth)

Resilient extraction — nothing is lost silently

Extraction depends on a background LLM call — and a provider outage must not cost the conversation its memory. Three layered defenses:
  1. Fallback chain: extraction tries the session’s active client and, on failure, walks CHATCLI_MEMORY_FALLBACK_PROVIDERS (or CHATCLI_FALLBACK_PROVIDERS), with a per-attempt timeout.
  2. Durable on-disk queue: a segment that fails on every provider is written to ~/.chatcli/memory/pending/ (atomic writes) and retried on later runs, oldest first — it survives restarts. The queue is capped (100 segments) and corrupt files are dropped without wedging the rest.
  3. A visible notice: two consecutive failures print a one-liner in the terminal (memory: extraction failing…) — days of silent fact loss can no longer happen.
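
The first two defenses can be sketched as follows, in Python for illustration. All names here (`write_atomic`, `load_pending`, `extract_with_fallback`) are hypothetical, providers are modeled as plain callables, and the per-attempt timeout and the 100-segment cap are left out for brevity. The write-to-temp-then-rename pattern is one common way to get atomic writes; the source does not say which technique ChatCLI uses.

```python
import json
import os
import tempfile
from pathlib import Path


def write_atomic(path: Path, payload: dict) -> None:
    """Write JSON to a temp file in the same directory, then rename over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(payload, f)
    os.replace(tmp, path)  # readers never observe a half-written file


def load_pending(pending_dir: Path) -> list:
    """Load queued segments oldest first; corrupt files are dropped, not fatal."""
    segments = []
    # Assumption: file names sort chronologically (e.g. a timestamp prefix).
    for file in sorted(pending_dir.glob("*.json")):
        try:
            segments.append(json.loads(file.read_text(encoding="utf-8")))
        except json.JSONDecodeError:
            file.unlink()
    return segments


def extract_with_fallback(segment, providers, pending_dir, name):
    """Try each provider in order; queue the segment if every one fails."""
    for call in providers:
        try:
            return call(segment)
        except Exception:
            continue  # fall through to the next provider in the chain
    write_atomic(pending_dir / f"{name}.json", segment)
    return None
```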
The gateway consults this memory too: the daemon’s persona calls @memory recall before answering “I don’t know” to personal questions. See Chat Gateway.

Storage Structure

All memory lives in ~/.chatcli/memory/:
~/.chatcli/memory/
|-- MEMORY.md              # Human-readable summary (regenerated from FactIndex)
|-- memory_index.json      # Facts with relevance scores
|-- user_profile.json      # User profile (name, role, expertise)
|-- topics.json            # Recurring topics with frequency
|-- projects.json          # Active projects with context
|-- usage_stats.json       # Usage patterns and statistics
|-- memory_archive.json    # Archived facts (low score)
|-- 202603/                # Daily notes for March 2026
|   |-- 20260301.md
|   +-- 20260306.md
+-- 202602/
    +-- 20260228.md
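
The daily-note layout (a YYYYMM directory holding YYYYMMDD.md files) can be computed as in this small Python sketch. The helper name `daily_note_path` is hypothetical; the layout itself is taken from the tree above.

```python
from datetime import date
from pathlib import Path


def daily_note_path(memory_dir: Path, day: date) -> Path:
    """Return <memory_dir>/<YYYYMM>/<YYYYMMDD>.md for the given day."""
    return memory_dir / day.strftime("%Y%m") / (day.strftime("%Y%m%d") + ".md")
```

For 6 March 2026 and a memory directory of ~/.chatcli/memory, this yields ~/.chatcli/memory/202603/20260306.md.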

Components

FactIndex replaces the old append-only MEMORY.md. Each fact has:
  • Unique ID via SHA-256 content hash (automatic deduplication)
  • Category: architecture, pattern, preference, gotcha, project, personal
  • Temporal score: (1 + log(1 + accessCount)) * exp(-daysSinceAccess * ln(2) / halfLifeDays)
  • Tags for keyword search
Frequently accessed and recent facts get higher scores. Old, never-accessed facts naturally decay.
UserProfile is automatically detected by the AI during extraction:
  • Name, role, expertise level
  • Preferred language and communication style
  • Most used commands (top 10)
  • General preferences
View your profile with /memory profile.
TopicTracker tracks technical topics discussed:
  • Mention frequency
  • Recency (recent topics weigh more)
  • Links to related facts
View with /memory topics.
ProjectTracker tracks projects you work on:
  • Name, path, description
  • Technologies used
  • Status (active, paused, completed)
  • Last activity
View with /memory projects.
PatternDetector analyzes how you use ChatCLI:
  • Total sessions and average duration
  • Peak activity hours
  • Preferred features (chat, agent, coder)
  • Common errors and resolutions
View with /memory stats.

Smart Retrieval

Instead of dumping all memory into the system prompt, ChatCLI uses intelligent retrieval:
  1. Extracts keywords from the last few conversation messages
  2. Searches relevant facts in FactIndex by keyword match + temporal score
  3. Respects a configurable budget (default: 4000 characters)
  4. Prioritizes: Profile > Projects > Topics > Relevant facts > Recent notes
Facts accessed by the retriever automatically get their scores bumped, creating a virtuous cycle: the more useful a fact is, the more it appears.
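
The fact-selection part of this flow can be sketched in Python as below. It is an illustration under stated assumptions, not ChatCLI's retriever: the function name `retrieve`, the dict fields (`content`, `tags`, `score`, `access_count`), and matching keywords against tags only are all hypothetical choices. The priority ordering across Profile, Projects, Topics, and notes is not modeled.

```python
def retrieve(facts, keywords, budget=4000):
    """Pick the highest-scored facts whose tags match, within a character budget."""
    wanted = {k.lower() for k in keywords}
    matches = [f for f in facts if wanted & {t.lower() for t in f["tags"]}]
    matches.sort(key=lambda f: f["score"], reverse=True)

    selected, used = [], 0
    for fact in matches:
        cost = len(fact["content"])
        if used + cost > budget:
            continue  # skip facts that would overflow the budget
        selected.append(fact)
        used += cost
        # Retrieval counts as an access, which raises the fact's future score.
        fact["access_count"] = fact.get("access_count", 0) + 1
    return selected
```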

Injection mode: push vs pull

How memory reaches the model in /agent and /coder is controlled by CHATCLI_MEMORY_MODE:
| Mode | Behavior |
| --- | --- |
| index (default) | Injects only a compact, stable index (profile summary + top topic/project names + fact tally by category) and lets the agent pull detail on demand via @memory recall. Bounded, cacheable per-turn cost. |
| full | Injects the full Smart Retrieval (above) every turn: the classic “push” behavior. |
| off | Injects no memory; bootstrap files still apply. |
The pull mode (index) shrinks the per-turn memory block by ~88% on a 500-fact store without losing access to detail — see Token Efficiency › Pull-first memory for the full measurement. Chat is tool-less and cannot pull on demand: there index degrades to full. The @memory recall tool uses HyDE + vector search, so pulled detail matches push quality. Check the active mode with /config memory.
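
The mode resolution described here can be sketched in Python. The function name `effective_memory_mode` is hypothetical, and two behaviors are assumptions not stated in the source: that unrecognized values fall back to index, and that off applies in chat mode as well.

```python
import os

VALID_MODES = {"index", "full", "off"}


def effective_memory_mode(interaction, env=os.environ):
    """Resolve the injection mode for 'chat', 'agent', or 'coder'."""
    mode = env.get("CHATCLI_MEMORY_MODE", "index").strip().lower()
    if mode not in VALID_MODES:
        mode = "index"  # assumption: unknown values use the default
    # Chat has no tools, so it cannot pull detail on demand.
    if interaction == "chat" and mode == "index":
        return "full"
    return mode
```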

Memory Configuration

The memory system has tunable parameters via environment variables:
| Variable | Default | Description |
| --- | --- | --- |
| CHATCLI_MEMORY_MODE | index | Injection mode in agent/coder: index (pull), full (push) or off |
| CHATCLI_MEMORY_MAX_SIZE | 32768 (32 KB) | Max size of rendered MEMORY.md |
| CHATCLI_MEMORY_RETENTION_DAYS | 30 | Daily note retention before cleanup |
| CHATCLI_MEMORY_MAX_FACTS | 500 | Max facts in the FactIndex |
| CHATCLI_MEMORY_RETRIEVAL_BUDGET | 4000 | Max chars of memory injected in system prompt (full mode) |
In addition to environment variables, the internal Config struct defines additional defaults:
| Parameter | Default Value | Description |
| --- | --- | --- |
| CompactionInterval | 24 hours | Minimum interval between full compactions |
| DecayHalfLifeDays | 30.0 | Temporal decay half-life for fact scores |
| Check interval | 6 hours | How often the system checks if compaction is needed |

How Memories Are Created

The background worker now extracts 5 types of information (previously only 2):
  1. DAILY — What was done (files, commands, errors, tasks)
  2. LONGTERM — New facts to remember permanently
  3. PROFILE_UPDATE — Information about the user (name, role, expertise)
  4. TOPICS — Technical topics discussed
  5. PROJECTS — Projects worked on
The worker fires after 4+ new messages with a 2-minute cooldown, and also every 3 minutes during long sessions.

Extraction Process

The Memory Worker follows this internal flow:
  1. EnhancedExtractionPrompt: Sends the recent conversation history to the LLM with a structured prompt requesting information extraction
  2. Expected output: The LLM returns text with well-defined section headers:
    • ## DAILY — Summary of what was done in the session
    • ## LONGTERM — New facts for long-term memory
    • ## PROFILE_UPDATE — User profile updates
    • ## TOPICS — Technical topics identified
    • ## PROJECTS — Projects mentioned or worked on
  3. ParseEnhancedResponse(): Parses the response and extracts each section individually
  4. Deduplication: Each fact receives a unique ID via SHA-256 content hash. Facts with a hash matching an existing one are automatically discarded
  5. Profile merging: PROFILE_UPDATE changes are merged with the existing profile, never fully replaced
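
Steps 2 through 4 can be sketched in Python as follows. This is not ParseEnhancedResponse() itself: the names `parse_sections`, `fact_id`, and `add_new_facts` are hypothetical, and hashing the whitespace-trimmed fact text is an assumption about how content is normalized before hashing.

```python
import hashlib
import re

SECTIONS = ("DAILY", "LONGTERM", "PROFILE_UPDATE", "TOPICS", "PROJECTS")


def parse_sections(response):
    """Split an LLM response into its '## NAME' sections."""
    parts, current = {}, None
    for line in response.splitlines():
        header = re.match(r"^##\s+([A-Z_]+)\s*$", line)
        if header and header.group(1) in SECTIONS:
            current = header.group(1)
            parts[current] = []
        elif current is not None:
            parts[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in parts.items()}


def fact_id(content):
    """SHA-256 of the trimmed fact text, used as a deduplication key."""
    return hashlib.sha256(content.strip().encode("utf-8")).hexdigest()


def add_new_facts(index, longterm):
    """Add each bullet as a fact unless its hash already exists; return the count added."""
    added = 0
    for line in longterm.splitlines():
        content = line.lstrip("-* ").strip()
        if not content:
            continue
        key = fact_id(content)
        if key not in index:
            index[key] = content
            added += 1
    return added
```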

Automatic Compaction

The system runs periodic compaction to prevent uncontrolled growth:
  1. Check (every 6 hours): Checks whether the fact count exceeds 80% of the limit or 24 hours have passed since the last compaction.
  2. LLM Compaction (preferred): Sends all facts to the AI with instructions to merge duplicates, remove obsolete facts, and consolidate related ones. Preserves original metadata.
  3. Score-based Fallback: If the LLM call fails, archives facts with scores below 0.1 to memory_archive.json.
  4. Daily Note Cleanup: Removes notes older than the retention period (default: 30 days). Empty directories are cleaned up.
  5. MEMORY.md Regeneration: Rewrites MEMORY.md from the FactIndex, so it is always up to date but never the source of truth.
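
The compaction trigger and the score-based fallback can be sketched in Python. The function names are hypothetical; the 80% threshold, the 24-hour interval, and the 0.1 archive cutoff are the values stated in this section.

```python
def needs_compaction(fact_count, max_facts, hours_since_last, interval_hours=24.0):
    """True when the index is over 80% full or the compaction interval has elapsed."""
    return fact_count > 0.8 * max_facts or hours_since_last >= interval_hours


def archive_low_scores(facts, threshold=0.1):
    """Split facts into (kept, archived) by score; used when LLM compaction fails."""
    kept = [f for f in facts if f["score"] >= threshold]
    archived = [f for f in facts if f["score"] < threshold]
    return kept, archived
```

With the default limit of 500 facts, the size condition first triggers at 401 facts.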

Automatic Migration

When starting for the first time with the new system, ChatCLI detects if a legacy MEMORY.md exists (without memory_index.json) and migrates automatically:
  1. Each line/bullet is converted to an individual fact
  2. Categories are detected from markdown headers
  3. Tags are extracted by technical keywords
  4. The original file is saved as MEMORY.md.bak
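
Steps 1 and 2 of the migration can be sketched in Python. The function name `migrate_legacy` is hypothetical, and so are two details: using the lowercased header text directly as the category, and a "general" category for lines that appear before any header. Tag extraction and the MEMORY.md.bak backup are omitted.

```python
def migrate_legacy(markdown):
    """Turn each line or bullet of a legacy MEMORY.md into a fact dict."""
    facts = []
    category = "general"  # assumption: default for lines before the first header
    for raw in markdown.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # A Markdown header sets the category for the facts that follow.
            category = line.lstrip("#").strip().lower()
            continue
        content = line.lstrip("-* ").strip()
        if content:
            facts.append({"content": content, "category": category})
    return facts
```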

/memory Command

| Subcommand | Description |
| --- | --- |
| /memory or /memory today | Show today’s notes |
| /memory yesterday | Show yesterday’s notes |
| /memory 2026-03-04 | Show notes from a specific date |
| /memory week | Show notes from the last 7 days |
| /memory longterm | Show MEMORY.md content |
| /memory list | List all memory files (includes structured JSONs) |
| /memory load <date> | Load a day’s notes into conversation context |
| /memory profile | Show detected user profile |
| /memory profile set <field>=<value> | Set/update a profile field manually |
| /memory remember <fact> | Explicitly add a long-term fact (accepts a [category] prefix) |
| /memory forget <substring> | Remove long-term facts containing the substring |
| /memory topics | Show tracked topics with frequency |
| /memory projects | Show tracked projects with status |
| /memory stats | Full statistics (sessions, peak hours, errors, features) |
| /memory facts [category] | List facts with scores (filter by category) |
| /memory compact | Force immediate compaction (LLM + note cleanup) |

Manual editing and extended profile

Automatic detection doesn’t always catch everything, so you can edit memory explicitly. The user profile now covers, beyond name/role/expertise: company, location, skills, certifications and goals. List fields (skills, certifications, goals) accumulate and dedupe.
> /memory profile set company=ACME Corp
> /memory profile set location=São Paulo, BR
> /memory profile set certifications=CKA, AWS SAA   # becomes a deduplicated list
> /memory remember [preference] Prefers Go over Python for CLIs
> /memory forget Python                              # removes facts containing "Python"
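
How list fields accumulate and dedupe can be illustrated with a short Python sketch. The function name `profile_set` is hypothetical, and splitting list values on commas is an assumption inferred from the certifications example above; scalar fields such as location keep their commas.

```python
LIST_FIELDS = {"skills", "certifications", "goals"}


def profile_set(profile, assignment):
    """Apply a 'field=value' assignment; list fields append without duplicates."""
    field, _, value = assignment.partition("=")
    field, value = field.strip(), value.strip()
    if field in LIST_FIELDS:
        current = list(profile.get(field, []))
        for item in (part.strip() for part in value.split(",")):
            if item and item not in current:
                current.append(item)
        profile[field] = current
    else:
        profile[field] = value  # scalar fields are simply overwritten
    return profile
```

Setting certifications=CKA, AWS SAA and later certifications=CKA, CKAD would leave the list as CKA, AWS SAA, CKAD.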

@memory tool (in agent mode)

Inside /agent and /coder, the model can persist memory on its own via the @memory tool (commands: remember, profile, forget, recall):
<tool_call name="@memory" args='{"cmd":"remember","args":{"content":"User earned the AWS Solutions Architect certification","category":"personal"}}' />
So when you tell the agent something new (e.g. a fresh certification), it records it into your profile/long-term facts without you running /memory manually.
Each component stores a different kind of memory:
  • FactIndex: Stable, long-lasting facts such as decisions, patterns, gotchas, preferences
  • UserProfile: Who you are (name, role, expertise, language)
  • TopicTracker: What you talk about (Go, Docker, K8s, etc.)
  • ProjectTracker: What you work on (chatcli, my-app, etc.)
  • PatternDetector: How you work (schedules, features, common errors)
  • Daily notes: What happened today; temporal and specific
You can inspect and edit the memory files directly; all are plain JSON or Markdown:
# View profile
jq . ~/.chatcli/memory/user_profile.json

# View facts with scores
jq '.[0:5]' ~/.chatcli/memory/memory_index.json

# Edit today's note
vim ~/.chatcli/memory/$(date +%Y%m)/$(date +%Y%m%d).md
JSON changes are loaded on next startup. MEMORY.md is regenerated and should not be edited directly.
Each fact has a score calculated by:
score = (1 + log(1 + accessCount)) * exp(-daysSinceAccess * ln(2) / halfLifeDays)
  • accessCount: How many times the fact was used by the retriever
  • daysSinceAccess: Days since last access
  • halfLifeDays: Decay half-life (default: 30 days)
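Frequently and recently accessed facts get high scores. A never-accessed fact starts at 1.0 and halves every half-life, so it drops below the 0.1 archive threshold after about 3.3 half-lives (roughly 100 days at the default).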
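
The formula translates directly into Python. The function name `fact_score` is hypothetical, and the sketch assumes log denotes the natural logarithm, which the source does not state.

```python
import math


def fact_score(access_count, days_since_access, half_life_days=30.0):
    """Temporal score: a frequency boost multiplied by exponential decay."""
    frequency = 1 + math.log(1 + access_count)  # assumption: natural log
    decay = math.exp(-days_since_access * math.log(2) / half_life_days)
    return frequency * decay
```

A fact that was never accessed scores 1.0 on day 0 and 0.5 after one half-life (30 days by default); each access raises the frequency term logarithmically.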

What Gets Injected into the Prompt

The ContextBuilder assembles the following block and injects it as a system prompt prefix:
## AGENTS.md

[content of AGENTS.md]

---

## SOUL.md

[content of SOUL.md]

---

## USER.md

[content of USER.md]

---

## IDENTITY.md

[content of IDENTITY.md]

---

## RULES.md

[content of RULES.md]

---

# Memory

## Long-term Memory

[content of MEMORY.md]

## Recent Daily Notes

### 2026-03-04

[content of March 4th note]

### 2026-03-05

[content of March 5th note]

### 2026-03-06

[content of today's note]
Empty sections (missing files) are automatically omitted — only what exists is injected.
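
The assembly rule (fixed order, separators between blocks, empty sections omitted) can be sketched in Python. This is not the real ContextBuilder: the function name `build_prefix` and its parameters are hypothetical, and sorting daily notes by date string is an assumption based on the order shown above.

```python
BOOTSTRAP_ORDER = ["AGENTS.md", "SOUL.md", "USER.md", "IDENTITY.md", "RULES.md"]


def build_prefix(bootstrap, long_term="", daily_notes=None):
    """Join the non-empty bootstrap and memory sections into one prompt prefix."""
    blocks = [
        f"## {name}\n\n{bootstrap[name].strip()}"
        for name in BOOTSTRAP_ORDER
        if bootstrap.get(name, "").strip()
    ]

    memory = []
    if long_term.strip():
        memory.append(f"## Long-term Memory\n\n{long_term.strip()}")
    notes = {day: text for day, text in (daily_notes or {}).items() if text.strip()}
    if notes:
        body = "\n\n".join(f"### {day}\n\n{notes[day].strip()}" for day in sorted(notes))
        memory.append(f"## Recent Daily Notes\n\n{body}")
    if memory:
        blocks.append("# Memory\n\n" + "\n\n".join(memory))

    return "\n\n---\n\n".join(blocks)
```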

Configuration

CHATCLI_BOOTSTRAP_ENABLED=true
CHATCLI_BOOTSTRAP_DIR=/path/to/bootstrap/files
CHATCLI_MEMORY_ENABLED=true
| Variable | Default | Description |
| --- | --- | --- |
| CHATCLI_BOOTSTRAP_ENABLED | true | Enable/disable bootstrap file loading |
| CHATCLI_BOOTSTRAP_DIR | ~/.chatcli/ | Alternative directory for global bootstrap files. Use this when you want to keep your files (SOUL.md, RULES.md, etc.) in another location, such as a versioned repository or a directory shared across machines |
| CHATCLI_MEMORY_ENABLED | true | Enable/disable the persistent memory system |
CHATCLI_BOOTSTRAP_DIR only overrides the global directory (~/.chatcli/). Project-level files (detected via .git or .agent markers) still take priority over global ones.

Context Injection Optimization (Prompt Caching)

ChatCLI optimizes token costs when contexts are attached using three complementary strategies:

Unified System Prompt with Cache Hints

Contexts attached via /context attach are injected into the system prompt rather than sent as user messages. This enables provider-level prompt caching:
| Provider | Mechanism | Discount |
| --- | --- | --- |
| Anthropic | cache_control: ephemeral | ~90% |
| OpenAI | Automatic prompt caching | ~50% |
| Google | Context caching API | Variable |
The system prompt block contains:
  1. Bootstrap (SOUL.md, USER.md, etc.)
  2. Memory (MEMORY.md + daily notes)
  3. Attached Contexts (new — previously injected as user messages)
  4. K8s Watcher (if active)
Since the system prompt is identical across turns, the provider caches it and charges tokens at a discount.

Smart Compaction

Injected context messages (/memory load, summarized contexts) are automatically truncated during compaction (Level 1 — trimming). This prevents old reference context from consuming valuable token budget.

Token Visibility

The /context attached command now shows:
  • Estimated tokens per context
  • Total tokens per turn
  • Cache hints per provider
  • Warnings for oversized contexts
When running /context attach, the feedback includes the estimated cost per turn.

Best Practices

Global SOUL.md, per-project USER.md

Keep your preferred personality globally and technical context per project.

Keep MEMORY.md concise

Keep only stable and confirmed facts — not session-specific ones.

Daily notes for journaling

Use them to record decisions, solved problems, and temporal context.

Don't duplicate CLAUDE.md

If you already use CLAUDE.md or project instructions, avoid duplicating them in bootstrap.
Periodically review your memories and remove outdated ones to keep the context relevant.

Next Steps

Conversation Control

Use /compact and /rewind to manage conversation size and state.

Sessions

Save and reuse conversations across projects.