The bootstrap and memory system is fully connected to the system prompt flow. Files are automatically loaded and injected into all interactions, in chat mode as well as in /agent and /coder modes.
Bootstrap Files
Bootstrap files are Markdown documents automatically loaded into the agent’s system prompt. They define who the assistant is, how it behaves, and which rules it should follow.
Supported Files
ChatCLI loads exactly 5 bootstrap files, in this order. All are optional: if they don’t exist, they are simply skipped.
| File | Purpose | When to use |
|---|---|---|
| AGENTS.md | Sub-agent definitions and their roles | When you want to instruct the orchestrator about available agents and how to use them |
| SOUL.md | Assistant personality, tone and style | To define “who” the assistant is: how it speaks, thinks and behaves |
| USER.md | User preferences and project context | To inform the stack, conventions, preferred tools and project context |
| IDENTITY.md | Agent identity and capabilities | To define “what” the assistant is: name, capabilities, limitations |
| RULES.md | Explicit rules and restrictions | For strict guardrails: what it MUST and MUST NOT do |
Loading Priority
Files are searched at two levels, with the workspace taking priority:
Workspace (project root)
Project-specific configurations. Takes priority over global. The project root is detected automatically (see below).
Automatic Workspace Detection
ChatCLI uses detectProjectDir() to find the real project root. Instead of simply using the current directory (CWD), it walks up the directory tree looking for project markers:
- Checks if the current directory contains .git/ or .agent/
- If not found, moves to the parent directory and repeats
- Continues until a marker is found or it reaches the filesystem root
- If no marker is found, falls back to the CWD
Recognized markers: .git (Git repository) and .agent (explicit ChatCLI marker). Only one needs to exist to define the workspace root.
Example Scenarios
| CWD at startup | Marker found | Detected workspace | Files loaded |
|---|---|---|---|
| ~/project/ | ~/project/.git | ~/project/ | ~/project/SOUL.md, etc. |
| ~/project/src/pkg/ | ~/project/.git | ~/project/ | ~/project/SOUL.md (walks up 2 levels) |
| ~/project/src/pkg/ | none | ~/project/src/pkg/ | Global only (~/.chatcli/) |
| ~/monorepo/services/api/ | ~/monorepo/.git | ~/monorepo/ | ~/monorepo/SOUL.md, etc. |
| ~/tmp/ | none | ~/tmp/ | Global only (~/.chatcli/) |
Detailed Examples
SOUL.md
Defines personality and tone. Place at ~/.chatcli/SOUL.md (global) or ./SOUL.md (project).
Where to Place Files
“Project root” means the directory containing .git/ or .agent/, automatically detected by detectProjectDir(), not necessarily the CWD.
Smart Cache
Bootstrap files use mtime-based (modification time) caching:
- On the first read, the content is cached in memory
- Subsequent reads check if the mtime has changed
- If the file was modified, the cache is automatically invalidated
- IsStale() checks if any file has changed since the last load
Persistent Memory
The memory system maintains context across ChatCLI sessions using structured storage with multiple components that learn about you and your work over time.
System Architecture
Resilient extraction — nothing is lost silently
Extraction depends on a background LLM call, and a provider outage must not cost the conversation its memory. Three layered defenses:
- Fallback chain: extraction tries the session’s active client and, on failure, walks CHATCLI_MEMORY_FALLBACK_PROVIDERS (or CHATCLI_FALLBACK_PROVIDERS), with a per-attempt timeout.
- Durable on-disk queue: a segment that fails on every provider is written to ~/.chatcli/memory/pending/ (atomic writes) and retried on later runs, oldest first, so it survives restarts. The queue is capped (100 segments) and corrupt files are dropped without wedging the rest.
- A visible notice: two consecutive failures print a one-liner in the terminal (memory: extraction failing…), so days of silent fact loss can no longer happen.
The gateway consults this memory too: the daemon’s persona calls @memory recall before answering “I don’t know” to personal questions. See Chat Gateway.
Storage Structure
All memory lives in ~/.chatcli/memory/.
Components
FactIndex -- Long-Term Memory
Replaces the old append-only MEMORY.md. Each fact has:
- Unique ID via SHA-256 content hash (automatic deduplication)
- Category: architecture, pattern, preference, gotcha, project, personal
- Temporal score: (1 + log(accessCount)) * exp(-days * ln2 / halfLife)
- Tags for keyword search
UserProfile -- User Profile
Automatically detected by the AI during extraction:
- Name, role, expertise level
- Preferred language and communication style
- Most used commands (top 10)
- General preferences
View with /memory profile.
TopicTracker -- Recurring Topics
Tracks technical topics discussed:
- Mention frequency
- Recency (recent topics weigh more)
- Links to related facts
View with /memory topics.
ProjectTracker -- Projects
Tracks projects you work on:
- Name, path, description
- Technologies used
- Status (active, paused, completed)
- Last activity
View with /memory projects.
PatternDetector -- Usage Patterns
Analyzes how you use ChatCLI:
- Total sessions and average duration
- Peak activity hours
- Preferred features (chat, agent, coder)
- Common errors and resolutions
View with /memory stats.
Smart Retrieval
Instead of dumping all memory into the system prompt, ChatCLI uses intelligent retrieval:
- Extracts keywords from the last few conversation messages
- Searches relevant facts in FactIndex by keyword match + temporal score
- Respects a configurable budget (default: 4000 characters)
- Prioritizes: Profile > Projects > Topics > Relevant facts > Recent notes
Facts accessed by the retriever automatically get their scores bumped, creating a virtuous cycle: the more useful a fact is, the more it appears.
Injection mode: push vs pull
How memory reaches the model in /agent and /coder is controlled by CHATCLI_MEMORY_MODE:
| Mode | Behavior |
|---|---|
| index (default) | Injects only a compact, stable index (profile summary + top topic/project names + fact tally by category) and lets the agent pull detail on demand via @memory recall. Bounded, cacheable per-turn cost. |
| full | Injects the full Smart Retrieval (above) every turn: the classic “push” behavior. |
| off | Injects no memory; bootstrap files still apply. |
The default (index) shrinks the per-turn memory block by ~88% on a 500-fact store without losing access to detail; see Token Efficiency › Pull-first memory for the full measurement. Chat is tool-less and cannot pull on demand: there index degrades to full. The @memory recall tool uses HyDE + vector search, so pulled detail matches push quality. Check the active mode with /config memory.
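The mode rules, including the chat degradation, reduce to a small decision function. This sketch assumes that an empty or unrecognized value behaves like the default; `effectiveMemoryMode` is a hypothetical name.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// effectiveMemoryMode maps the configured mode to what actually gets injected.
// toolsAvailable is false in plain chat, where @memory recall cannot be called.
func effectiveMemoryMode(configured string, toolsAvailable bool) string {
	mode := strings.ToLower(strings.TrimSpace(configured))
	switch mode {
	case "full", "off":
		return mode
	}
	// "index", empty, or (by assumption) any unrecognized value: use the default.
	if !toolsAvailable {
		return "full" // chat is tool-less, so index degrades to full
	}
	return "index"
}

func main() {
	configured := os.Getenv("CHATCLI_MEMORY_MODE")
	fmt.Println("agent/coder:", effectiveMemoryMode(configured, true))
	fmt.Println("chat:       ", effectiveMemoryMode(configured, false))
}
```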
Memory Configuration
The memory system has tunable parameters via environment variables:
| Variable | Default | Description |
|---|---|---|
| CHATCLI_MEMORY_MODE | index | Injection mode in agent/coder: index (pull), full (push) or off |
| CHATCLI_MEMORY_MAX_SIZE | 32768 (32KB) | Max size of rendered MEMORY.md |
| CHATCLI_MEMORY_RETENTION_DAYS | 30 | Daily note retention before cleanup |
| CHATCLI_MEMORY_MAX_FACTS | 500 | Max facts in the FactIndex |
| CHATCLI_MEMORY_RETRIEVAL_BUDGET | 4000 | Max chars of memory injected in system prompt (full mode) |
The Config struct defines additional defaults:
| Parameter | Default Value | Description |
|---|---|---|
| CompactionInterval | 24 hours | Minimum interval between full compactions |
| DecayHalfLifeDays | 30.0 | Temporal decay half-life for fact scores |
| Check interval | 6 hours | How often the system checks if compaction is needed |
How Memories Are Created
The background worker now extracts 5 types of information (previously only 2):
- DAILY: What was done (files, commands, errors, tasks)
- LONGTERM: New facts to remember permanently
- PROFILE_UPDATE: Information about the user (name, role, expertise)
- TOPICS: Technical topics discussed
- PROJECTS: Projects worked on
Extraction Process
The Memory Worker follows this internal flow:
- EnhancedExtractionPrompt: Sends the recent conversation history to the LLM with a structured prompt requesting information extraction
- Expected output: The LLM returns text with well-defined section headers:
  - ## DAILY: Summary of what was done in the session
  - ## LONGTERM: New facts for long-term memory
  - ## PROFILE_UPDATE: User profile updates
  - ## TOPICS: Technical topics identified
  - ## PROJECTS: Projects mentioned or worked on
- ParseEnhancedResponse(): Parses the response and extracts each section individually
- Deduplication: Each fact receives a unique ID via SHA-256 content hash. Facts with a hash matching an existing one are automatically discarded
- Profile merging: PROFILE_UPDATE changes are merged with the existing profile, never fully replaced
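The parsing and deduplication steps can be sketched as below. This is not ParseEnhancedResponse() itself: `parseSections`, `factID` and `addNew` are hypothetical helpers, and hashing the whitespace-trimmed text is an assumption about how the content hash is normalized.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// parseSections splits the LLM output on "## NAME" headers into name -> lines.
func parseSections(resp string) map[string][]string {
	sections := make(map[string][]string)
	current := ""
	for _, raw := range strings.Split(resp, "\n") {
		line := strings.TrimSpace(raw)
		if name, ok := strings.CutPrefix(line, "## "); ok {
			current = strings.ToUpper(strings.TrimSpace(name))
			continue
		}
		if current == "" || line == "" {
			continue // ignore text before the first header and blank lines
		}
		sections[current] = append(sections[current], strings.TrimPrefix(line, "- "))
	}
	return sections
}

// factID derives a stable ID from the fact text (assumed normalization: trim).
func factID(text string) string {
	sum := sha256.Sum256([]byte(strings.TrimSpace(text)))
	return hex.EncodeToString(sum[:])
}

// addNew stores facts whose hash is not yet in the index; returns how many were added.
func addNew(index map[string]string, facts []string) int {
	added := 0
	for _, f := range facts {
		id := factID(f)
		if _, exists := index[id]; exists {
			continue // duplicate content: discard
		}
		index[id] = f
		added++
	}
	return added
}

func main() {
	resp := "## DAILY\n- fixed CI pipeline\n## LONGTERM\n- project uses Go modules\n- prefers table tests\n"
	sections := parseSections(resp)
	index := make(map[string]string)
	fmt.Println("added:", addNew(index, sections["LONGTERM"]))
	fmt.Println("added again:", addNew(index, sections["LONGTERM"]))
}
```

Because the ID depends only on content, running extraction twice over the same conversation adds nothing the second time.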
Automatic Compaction
The system runs periodic compaction to prevent uncontrolled growth:
Check (every 6 hours)
Checks if the fact count exceeds 80% of the limit or if 24h have passed since the last compaction.
LLM Compaction (preferred)
Sends all facts to the AI with instructions to: merge duplicates, remove obsolete, consolidate related. Preserves original metadata.
Score-based Fallback
If the LLM call fails, archives facts with scores below 0.1 to memory_archive.json.
Daily Note Cleanup
Removes notes older than the retention period (default: 30 days). Empty directories are cleaned up.
Automatic Migration
When starting for the first time with the new system, ChatCLI detects if a legacy MEMORY.md exists (without memory_index.json) and migrates automatically:
- Each line/bullet is converted to an individual fact
- Categories are detected from markdown headers
- Tags are extracted by technical keywords
- The original file is saved as MEMORY.md.bak
/memory Command
| Subcommand | Description |
|---|---|
| /memory or /memory today | Show today’s notes |
| /memory yesterday | Show yesterday’s notes |
| /memory 2026-03-04 | Show notes from a specific date |
| /memory week | Show notes from the last 7 days |
| /memory longterm | Show MEMORY.md content |
| /memory list | List all memory files (includes structured JSONs) |
| /memory load <date> | Load a day’s notes into conversation context |
| /memory profile | Show detected user profile |
| /memory profile set <field>=<value> | Set/update a profile field manually |
| /memory remember <fact> | Explicitly add a long-term fact (accepts a [category] prefix) |
| /memory forget <substring> | Remove long-term facts containing the substring |
| /memory topics | Show tracked topics with frequency |
| /memory projects | Show tracked projects with status |
| /memory stats | Full statistics (sessions, peak hours, errors, features) |
| /memory facts [category] | List facts with scores (filter by category) |
| /memory compact | Force immediate compaction (LLM + note cleanup) |
Manual editing and extended profile
Automatic detection doesn’t always catch everything, so you can edit memory explicitly. The user profile now covers, beyond name/role/expertise: company, location, skills, certifications and goals. List fields (skills, certifications, goals) accumulate and dedupe.
@memory tool (in agent mode)
Inside /agent and /coder, the model can persist memory on its own via the @memory tool (commands: remember, profile, forget, recall).
You can still manage memory with /memory manually.
What goes in each component?
- FactIndex: Stable, long-lasting facts — decisions, patterns, gotchas, preferences
- UserProfile: Who you are — name, role, expertise, language
- TopicTracker: What you talk about — Go, Docker, K8s, etc.
- ProjectTracker: What you work on — chatcli, my-app, etc.
- PatternDetector: How you work — schedules, features, common errors
- Daily notes: What happened today — temporal and specific
Can I edit memories manually?
Yes! All files are plain JSON or Markdown. JSON changes are loaded on next startup. MEMORY.md is regenerated and should not be edited directly.
How does fact scoring work?
Each fact has a score calculated as (1 + log(accessCount)) * exp(-daysSinceAccess * ln2 / halfLifeDays), where:
- accessCount: How many times the fact was used by the retriever
- daysSinceAccess: Days since last access
- halfLifeDays: Decay half-life (default: 30 days)
What Gets Injected into the Prompt
The ContextBuilder assembles a context block and injects it as a system prompt prefix.
Configuration
| Variable | Default | Description |
|---|---|---|
| CHATCLI_BOOTSTRAP_ENABLED | true | Enable/disable bootstrap file loading |
| CHATCLI_BOOTSTRAP_DIR | ~/.chatcli/ | Alternative directory for global bootstrap files. Use this when you want to keep your files (SOUL.md, RULES.md, etc.) in another location, such as a versioned repository or a directory shared across machines |
| CHATCLI_MEMORY_ENABLED | true | Enable/disable the persistent memory system |
CHATCLI_BOOTSTRAP_DIR only overrides the global directory (~/.chatcli/). Project-level files (detected via .git or .agent markers) still take priority over global ones.
Context Injection Optimization (Prompt Caching)
ChatCLI optimizes token costs when contexts are attached using three complementary strategies:
Unified System Prompt with Cache Hints
Contexts attached via /context attach are injected as system prompt, not as user messages. This enables provider-level prompt caching:
| Provider | Mechanism | Discount |
|---|---|---|
| Anthropic | cache_control: ephemeral | ~90% |
| OpenAI | Automatic prompt caching | ~50% |
| Context caching API | Variable |
- Bootstrap (SOUL.md, USER.md, etc.)
- Memory (MEMORY.md + daily notes)
- Attached Contexts (new — previously injected as user messages)
- K8s Watcher (if active)
Smart Compaction
Injected context messages (/memory load, summarized contexts) are automatically truncated during compaction (Level 1 — trimming). This prevents old reference context from consuming valuable token budget.
Token Visibility
The /context attached command now shows:
- Estimated tokens per context
- Total tokens per turn
- Cache hints per provider
- Warnings for oversized contexts
When you run /context attach, the feedback includes the estimated cost per turn.
Best Practices
Global SOUL.md, per-project USER.md
Keep your preferred personality globally and technical context per project.
Keep MEMORY.md concise
Keep only stable and confirmed facts — not session-specific ones.
Daily notes for journaling
Use them to record decisions, solved problems, and temporal context.
Don't duplicate CLAUDE.md
If you already use CLAUDE.md or project instructions, avoid duplicating them in bootstrap.
Next Steps
Conversation Control
Use /compact and /rewind to manage conversation size and state.
Sessions
Save and reuse conversations across projects.