Bootstrap and Persistent Memory

ChatCLI offers two complementary systems for customizing and contextualizing the agent: bootstrap files to define personality and rules, and persistent memory to maintain context across sessions.

The bootstrap and memory system is fully connected to the system prompt flow. Files are automatically loaded and injected into all interactions — in chat mode as well as in /agent and /coder modes.

Bootstrap Files

Bootstrap files are Markdown documents automatically loaded into the agent’s system prompt. They define who the assistant is, how it behaves, and which rules it should follow.

Supported Files

ChatCLI loads exactly 5 bootstrap files, in this order. All are optional — if they don’t exist, they are simply skipped:

File	Purpose	When to use
`AGENTS.md`	Sub-agent definitions and their roles	When you want to instruct the orchestrator about available agents and how to use them
`SOUL.md`	Assistant personality, tone and style	To define “who” the assistant is — how it speaks, thinks and behaves
`USER.md`	User preferences and project context	To inform the stack, conventions, preferred tools and project context
`IDENTITY.md`	Agent identity and capabilities	To define “what” the assistant is — name, capabilities, limitations
`RULES.md`	Explicit rules and restrictions	For strict guardrails — what it MUST and MUST NOT do

File names are exact and case-sensitive. ChatCLI only looks for AGENTS.md, SOUL.md, USER.md, IDENTITY.md, and RULES.md. Other names (like CLAUDE.md, README.md, etc.) are not loaded by the bootstrap system.

Loading Priority

Files are searched at two levels, with the workspace taking priority:

Workspace (project root)

Project-specific configurations. Takes priority over global. The project root is detected automatically (see below).

Global (~/.chatcli/)

User default configurations. Serves as a fallback when the file does not exist in the workspace.

If the same file exists at both levels, the workspace version prevails. Global files serve as fallback.

Automatic Workspace Detection

ChatCLI uses detectProjectDir() to find the real project root. Instead of simply using the current directory (CWD), it walks up the directory tree looking for project markers:

Checks if the current directory contains .git/ or .agent/
If not found, moves to the parent directory and repeats
Continues until a marker is found or it reaches the filesystem root
If no marker is found, falls back to the CWD

This means you can run ChatCLI from any subdirectory of your project and bootstrap files at the root will be found normally.

Recognized markers: .git (Git repository) and .agent (explicit ChatCLI marker). Only one needs to exist to define the workspace root.

Example Scenarios

CWD at startup	Marker found	Detected workspace	Files loaded
`~/project/`	`~/project/.git`	`~/project/`	`~/project/SOUL.md`, etc.
`~/project/src/pkg/`	`~/project/.git`	`~/project/`	`~/project/SOUL.md` (walks up 2 levels)
`~/project/src/pkg/`	none	`~/project/src/pkg/`	Global only (`~/.chatcli/`)
`~/monorepo/services/api/`	`~/monorepo/.git`	`~/monorepo/`	`~/monorepo/SOUL.md`, etc.
`~/tmp/`	none	`~/tmp/`	Global only (`~/.chatcli/`)

Detailed Examples


SOUL.md defines personality and tone. Place it at ~/.chatcli/SOUL.md (global) or <project-root>/SOUL.md (project):

```markdown
# Personality

You are a technical assistant specialized in software engineering.
Be concise and direct. Prefer practical examples over theoretical explanations.
When suggesting code, use best practices and tests.

# Tone

- Professional but approachable
- Prefer short and objective responses
- Use bullet points for lists
- Default language: English
```

USER.md defines user and project context. It works best as a per-project file at <project-root>/USER.md:

```markdown
# Project Context

- Stack: Go 1.25, gRPC, Kubernetes
- Database: PostgreSQL 16
- CI/CD: GitHub Actions
- Style: conventional commits, trunk-based development

# Preferences

- Always use tables for comparisons
- Prefer simple solutions without over-engineering
- Tests with idiomatic Go table-driven tests
```

IDENTITY.md defines what the assistant is. It is usually global, at ~/.chatcli/IDENTITY.md:

```markdown
# Identity

You are ChatCLI, an intelligent terminal assistant.

## Capabilities

- Code reading and editing via @coder plugin
- Shell command execution with user approval
- Log analysis and error diagnosis
- Git operations (status, diff, log, commit)

## Limitations

- You do NOT have internet access
- You CANNOT install packages without approval
- Your patches may fail if context has changed
```

RULES.md defines strict rules and guardrails. It can be global or per-project:

```markdown
# Mandatory Rules

1. NEVER execute `rm -rf` without explicit confirmation
2. NEVER commit directly to the main branch
3. ALWAYS run tests after modifying code
4. ALWAYS use conventional commits (feat:, fix:, chore:)

# Security Restrictions

- Do not expose secrets, tokens or API keys in logs
- Do not modify files outside the project directory
- Do not execute commands with sudo
```

AGENTS.md defines sub-agents and their roles for the orchestrator:

```markdown
# Custom Agents

## @devops
Infrastructure specialist for Docker, Kubernetes and CI/CD.
Use for deployment, monitoring and pipeline configuration tasks.

## @dba
PostgreSQL database specialist.
Use for queries, optimization, migrations and performance analysis.

## @security
Security auditor focused on OWASP Top 10.
Use for code review with focus on vulnerabilities.
```

Where to Place Files

“Project root” means the directory containing .git/ or .agent/ — automatically detected by detectProjectDir(), not necessarily the CWD.

```text
# GLOBAL configuration (applies to all projects)
~/.chatcli/SOUL.md
~/.chatcli/IDENTITY.md
~/.chatcli/RULES.md
~/.chatcli/USER.md
~/.chatcli/AGENTS.md

# Per-PROJECT configuration (overrides global)
# "Project root" = directory with .git/ or .agent/
<project-root>/SOUL.md          # In project root
<project-root>/USER.md          # In project root
<project-root>/RULES.md         # In project root (project rules)
<project-root>/IDENTITY.md      # In project root (rare)
<project-root>/AGENTS.md        # In project root (project agents)
```

CHATCLI_BOOTSTRAP_DIR only overrides the global directory (~/.chatcli/), not the workspace detection. Project-level files (detected via .git or .agent) still take priority over global ones, regardless of CHATCLI_BOOTSTRAP_DIR.

Recommended strategy: SOUL.md and IDENTITY.md global (they’re about the assistant), USER.md and RULES.md per-project (they’re about the work context).

Smart Cache

Bootstrap files use mtime-based (modification time) caching:

On the first read, the content is cached in memory
Subsequent reads check if the mtime has changed
If the file was modified, the cache is automatically invalidated
IsStale() checks if any file has changed since the last load

Persistent Memory

The memory system maintains context across ChatCLI sessions using structured storage with multiple components that learn about you and your work over time.

System Architecture

```text
Conversation -> memoryWorker (3min) -> LLM extraction -> ProcessExtraction()
                                                              |
                    +───────────+───────────+─────────────────+────────────+
                    v           v           v                 v            v
              FactIndex    Profile    TopicTracker    ProjectTracker  DailyNote
              (scored)     (JSON)     (JSON)          (JSON)          (.md)
                    |
                    v
              Compactor (6h check, 24h cycle)
              |-- Level 1: Score-based pruning + archive
              +-- Level 2: LLM consolidation
                    |
                    v
              MEMORY.md (regenerated, never source of truth)
```

Resilient extraction — nothing is lost silently

Extraction depends on a background LLM call — and a provider outage must not cost the conversation its memory. Three layered defenses:

Fallback chain: extraction tries the session’s active client and, on failure, walks CHATCLI_MEMORY_FALLBACK_PROVIDERS (or CHATCLI_FALLBACK_PROVIDERS), with a per-attempt timeout.
Durable on-disk queue: a segment that fails on every provider is written to ~/.chatcli/memory/pending/ (atomic writes) and retried on later runs, oldest first — it survives restarts. The queue is capped (100 segments) and corrupt files are dropped without wedging the rest.
A visible notice: two consecutive failures print a one-liner in the terminal (memory: extraction failing…) — days of silent fact loss can no longer happen.

The gateway consults this memory too: the daemon’s persona calls @memory recall before answering “I don’t know” to personal questions. See Chat Gateway.

Storage Structure

All memory lives in ~/.chatcli/memory/:

```text
~/.chatcli/memory/
|-- MEMORY.md              # Human-readable summary (regenerated from FactIndex)
|-- memory_index.json      # Facts with relevance scores
|-- user_profile.json      # User profile (name, role, expertise)
|-- topics.json            # Recurring topics with frequency
|-- projects.json          # Active projects with context
|-- usage_stats.json       # Usage patterns and statistics
|-- memory_archive.json    # Archived facts (low score)
|-- 202603/                # Daily notes for March 2026
|   |-- 20260301.md
|   +-- 20260306.md
+-- 202602/
    +-- 20260228.md
```

Components

FactIndex -- Long-Term Memory

Replaces the old append-only MEMORY.md. Each fact has:

Unique ID via SHA-256 content hash (automatic deduplication)
Category: architecture, pattern, preference, gotcha, project, personal
Temporal score: (1 + log(1 + accessCount)) * exp(-daysSinceAccess * ln2 / halfLifeDays)
Tags for keyword search

Frequently accessed and recent facts get higher scores. Old, never-accessed facts naturally decay.
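Content-hash IDs turn deduplication into a simple map lookup. In this sketch the helper names (`factID`, `addFact`) are hypothetical, and the content is hashed exactly as given.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// factID derives a content-addressed ID. Identical content always yields the
// same ID, which is what makes deduplication automatic.
func factID(content string) string {
	sum := sha256.Sum256([]byte(content))
	return hex.EncodeToString(sum[:])
}

// addFact stores content under its ID unless that ID already exists.
// It reports whether the fact was new. Hypothetical helper.
func addFact(index map[string]string, content string) bool {
	id := factID(content)
	if _, exists := index[id]; exists {
		return false
	}
	index[id] = content
	return true
}
```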

UserProfile -- User Profile

Automatically detected by the AI during extraction:

Name, role, expertise level
Preferred language and communication style
Most used commands (top 10)
General preferences

View your profile with /memory profile.

TopicTracker -- Recurring Topics

Tracks technical topics discussed:

Mention frequency
Recency (recent topics weigh more)
Links to related facts

View with /memory topics.

ProjectTracker -- Projects

Tracks projects you work on:

Name, path, description
Technologies used
Status (active, paused, completed)
Last activity

View with /memory projects.

PatternDetector -- Usage Patterns

Analyzes how you use ChatCLI:

Total sessions and average duration
Peak activity hours
Preferred features (chat, agent, coder)
Common errors and resolutions

View with /memory stats.

Smart Retrieval

Instead of dumping all memory into the system prompt, ChatCLI uses intelligent retrieval:

Extracts keywords from the last few conversation messages
Searches relevant facts in FactIndex by keyword match + temporal score
Respects a configurable budget (default: 4000 characters)
Prioritizes: Profile > Projects > Topics > Relevant facts > Recent notes

Facts accessed by the retriever automatically get their scores bumped, creating a virtuous cycle: the more useful a fact is, the more it appears.

Injection mode: push vs pull

How memory reaches the model in /agent and /coder is controlled by CHATCLI_MEMORY_MODE:

Mode	Behavior
`index` (default)	Injects only a compact, stable index (profile summary + top topic/project names + fact tally by category) and lets the agent pull detail on demand via `@memory recall`. Bounded, cacheable per-turn cost.
`full`	Injects the full Smart Retrieval (above) every turn — the classic “push” behavior.
`off`	Injects no memory; bootstrap files still apply.

The pull mode (index) shrinks the per-turn memory block by ~88% on a 500-fact store without losing access to detail; see Token Efficiency › Pull-first memory for the full measurement. Chat mode has no tools and cannot pull on demand, so in chat index falls back to full. The @memory recall tool uses HyDE + vector search, so pulled detail matches push quality. Check the active mode with /config memory.

Memory Configuration

The memory system has tunable parameters via environment variables:

Variable	Default	Description
`CHATCLI_MEMORY_MODE`	`index`	Injection mode in agent/coder: `index` (pull), `full` (push) or `off`
`CHATCLI_MEMORY_MAX_SIZE`	`32768` (32KB)	Max size of rendered MEMORY.md
`CHATCLI_MEMORY_RETENTION_DAYS`	`30`	Daily note retention before cleanup
`CHATCLI_MEMORY_MAX_FACTS`	`500`	Max facts in the FactIndex
`CHATCLI_MEMORY_RETRIEVAL_BUDGET`	`4000`	Max chars of memory injected in system prompt (`full` mode)

In addition to environment variables, the internal Config struct defines additional defaults:

Parameter	Default Value	Description
`CompactionInterval`	24 hours	Minimum interval between full compactions
`DecayHalfLifeDays`	30.0	Temporal decay half-life for fact scores
Check interval	6 hours	How often the system checks if compaction is needed

How Memories Are Created

The background worker now extracts 5 types of information (previously only 2):

DAILY — What was done (files, commands, errors, tasks)
LONGTERM — New facts to remember permanently
PROFILE_UPDATE — Information about the user (name, role, expertise)
TOPICS — Technical topics discussed
PROJECTS — Projects worked on

The worker fires after 4+ new messages with a 2-minute cooldown, and also every 3 minutes during long sessions.

Extraction Process

The Memory Worker follows this internal flow:

EnhancedExtractionPrompt: Sends the recent conversation history to the LLM with a structured prompt requesting information extraction
Expected output: The LLM returns text with well-defined section headers:
- ## DAILY — Summary of what was done in the session
- ## LONGTERM — New facts for long-term memory
- ## PROFILE_UPDATE — User profile updates
- ## TOPICS — Technical topics identified
- ## PROJECTS — Projects mentioned or worked on
ParseEnhancedResponse(): Parses the response and extracts each section individually
Deduplication: Each fact receives a unique ID via SHA-256 content hash. Facts with a hash matching an existing one are automatically discarded
Profile merging: PROFILE_UPDATE changes are merged with the existing profile, never fully replaced
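A sketch of the section-splitting step, written from the header list above. `parseSections` is a hypothetical name, not the signature of ParseEnhancedResponse().

```go
package main

import (
	"bufio"
	"strings"
)

// knownSections are the headers the extraction prompt asks the LLM to emit.
var knownSections = map[string]bool{
	"DAILY": true, "LONGTERM": true, "PROFILE_UPDATE": true, "TOPICS": true, "PROJECTS": true,
}

// parseSections splits a response into its "## NAME" sections. Text before
// the first known header, and sections with unknown names, are ignored.
func parseSections(resp string) map[string]string {
	out := map[string]string{}
	current := ""
	var body []string
	flush := func() {
		if current != "" {
			out[current] = strings.TrimSpace(strings.Join(body, "\n"))
		}
		body = nil
	}
	sc := bufio.NewScanner(strings.NewReader(resp))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "## ") {
			flush()
			name := strings.TrimSpace(strings.TrimPrefix(line, "## "))
			current = ""
			if knownSections[name] {
				current = name
			}
			continue
		}
		if current != "" {
			body = append(body, line)
		}
	}
	flush()
	return out
}
```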

Automatic Compaction

The system runs periodic compaction to prevent uncontrolled growth:

Check (every 6 hours)

Checks if the fact count exceeds 80% of the limit or if 24h have passed since the last compaction.

LLM Compaction (preferred)

Sends all facts to the AI with instructions to: merge duplicates, remove obsolete, consolidate related. Preserves original metadata.

Score-based Fallback

If the LLM call fails, archives facts with scores below 0.1 to memory_archive.json.

Daily Note Cleanup

Removes notes older than the retention period (default: 30 days). Empty directories are cleaned up.

MEMORY.md Regeneration

Rewrites MEMORY.md from the FactIndex — always up-to-date, never the source of truth.

Automatic Migration

When starting for the first time with the new system, ChatCLI detects if a legacy MEMORY.md exists (without memory_index.json) and migrates automatically:

Each line/bullet is converted to an individual fact
Categories are detected from markdown headers
Tags are extracted by technical keywords
The original file is saved as MEMORY.md.bak
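A simplified sketch of the bullet-to-fact conversion. `migrateLegacy` is hypothetical, as is the default category `general` for lines before any header. Tag extraction and the .bak copy are left out.

```go
package main

import "strings"

// LegacyFact is one fact recovered from an old MEMORY.md.
type LegacyFact struct {
	Category string
	Content  string
}

// migrateLegacy turns each bullet or line into a fact. The nearest preceding
// markdown header, lower-cased, becomes its category.
func migrateLegacy(md string) []LegacyFact {
	var facts []LegacyFact
	category := "general" // assumed default before any header
	for _, raw := range strings.Split(md, "\n") {
		line := strings.TrimSpace(raw)
		switch {
		case line == "":
			continue
		case strings.HasPrefix(line, "#"):
			// "## Gotcha" -> "gotcha"
			category = strings.ToLower(strings.TrimSpace(strings.TrimLeft(line, "#")))
		default:
			content := strings.TrimPrefix(strings.TrimPrefix(line, "- "), "* ")
			if content = strings.TrimSpace(content); content != "" {
				facts = append(facts, LegacyFact{Category: category, Content: content})
			}
		}
	}
	return facts
}
```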

`/memory` Command

Subcommand	Description
`/memory` or `/memory today`	Show today’s notes
`/memory yesterday`	Show yesterday’s notes
`/memory 2026-03-04`	Show notes from a specific date
`/memory week`	Show notes from the last 7 days
`/memory longterm`	Show MEMORY.md content
`/memory list`	List all memory files (includes structured JSONs)
`/memory load <date>`	Load a day’s notes into conversation context
`/memory profile`	Show detected user profile
`/memory profile set <field>=<value>`	Set/update a profile field manually
`/memory remember <fact>`	Explicitly add a long-term fact (accepts a `[category]` prefix)
`/memory forget <substring>`	Remove long-term facts containing the substring
`/memory topics`	Show tracked topics with frequency
`/memory projects`	Show tracked projects with status
`/memory stats`	Full statistics (sessions, peak hours, errors, features)
`/memory facts [category]`	List facts with scores (filter by category)
`/memory compact`	Force immediate compaction (LLM + note cleanup)

Manual editing and extended profile

Automatic detection doesn’t always catch everything, so you can edit memory explicitly. The user profile now covers, beyond name/role/expertise: company, location, skills, certifications and goals. List fields (skills, certifications, goals) accumulate and dedupe.

```text
> /memory profile set company=ACME Corp
> /memory profile set location=São Paulo, BR
> /memory profile set certifications=CKA, AWS SAA   # becomes a deduplicated list
> /memory remember [preference] Prefers Go over Python for CLIs
> /memory forget Python                              # removes facts containing "Python"
```
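List fields accumulate and dedupe. One way to implement that merge is sketched below. `mergeList` is hypothetical, and case-insensitive matching is an assumption.

```go
package main

import "strings"

// mergeList appends comma-separated values to an existing list field. It skips
// blanks and case-insensitive duplicates and keeps first-seen order.
func mergeList(existing []string, input string) []string {
	seen := map[string]bool{}
	out := make([]string, 0, len(existing))
	add := func(v string) {
		v = strings.TrimSpace(v)
		key := strings.ToLower(v)
		if v == "" || seen[key] {
			return
		}
		seen[key] = true
		out = append(out, v)
	}
	for _, v := range existing {
		add(v)
	}
	for _, v := range strings.Split(input, ",") {
		add(v)
	}
	return out
}
```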

`@memory` tool (in agent mode)

Inside /agent and /coder, the model can persist memory on its own via the @memory tool (commands: remember, profile, forget, recall):

```xml
<tool_call name="@memory" args='{"cmd":"remember","args":{"content":"User earned the AWS Solutions Architect certification","category":"personal"}}' />
```

So when you tell the agent something new (e.g. a fresh certification), it records it into your profile/long-term facts without you running /memory manually.
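The `args` attribute is plain JSON. This sketch produces exactly the payload from the example. The Go type names are hypothetical; the JSON keys come from the example.

```go
package main

import "encoding/json"

// memoryCall mirrors the JSON carried in the tool call's args attribute.
type memoryCall struct {
	Cmd  string     `json:"cmd"`
	Args memoryArgs `json:"args"`
}

type memoryArgs struct {
	Content  string `json:"content"`
	Category string `json:"category"`
}

// rememberArgs builds the args payload for a "remember" call.
func rememberArgs(content, category string) (string, error) {
	b, err := json.Marshal(memoryCall{Cmd: "remember", Args: memoryArgs{Content: content, Category: category}})
	return string(b), err
}
```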

What goes in each component?

FactIndex: Stable, long-lasting facts — decisions, patterns, gotchas, preferences
UserProfile: Who you are — name, role, expertise, language
TopicTracker: What you talk about — Go, Docker, K8s, etc.
ProjectTracker: What you work on — chatcli, my-app, etc.
PatternDetector: How you work — schedules, features, common errors
Daily notes: What happened today — temporal and specific

Can I edit memories manually?

Yes! All files are plain JSON or Markdown:

```bash
# View profile
jq . ~/.chatcli/memory/user_profile.json

# View facts with scores
jq '.[0:5]' ~/.chatcli/memory/memory_index.json

# Edit today's note
vim ~/.chatcli/memory/$(date +%Y%m)/$(date +%Y%m%d).md
```

JSON changes are loaded on next startup. MEMORY.md is regenerated and should not be edited directly.

How does fact scoring work?

Each fact has a score calculated by:

```text
score = (1 + log(1 + accessCount)) * exp(-daysSinceAccess * ln(2) / halfLifeDays)
```

accessCount: How many times the fact was used by the retriever
daysSinceAccess: Days since last access
halfLifeDays: Decay half-life (default: 30 days)

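Frequently and recently accessed facts get high scores. A never-accessed fact starts at 1 and halves every half-life. After 3 half-lives it is at 0.125, and after about 3.3 half-lives (roughly 100 days with the default 30) it drops below the 0.1 threshold used by score-based compaction.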

What Gets Injected into the Prompt

The ContextBuilder assembles the following block and injects it as a system prompt prefix:

```markdown
## AGENTS.md

[content of AGENTS.md]

---

## SOUL.md

[content of SOUL.md]

---

## USER.md

[content of USER.md]

---

## IDENTITY.md

[content of IDENTITY.md]

---

## RULES.md

[content of RULES.md]

---

# Memory

## Long-term Memory

[content of MEMORY.md]

## Recent Daily Notes

### 2026-03-04

[content of March 4th note]

### 2026-03-05

[content of March 5th note]

### 2026-03-06

[content of today's note]
```

Empty sections (missing files) are automatically omitted — only what exists is injected.
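The assembly rule for the bootstrap part can be sketched as: `## NAME` headers, `---` separators, and empty files omitted. `Section` and `buildPrefix` are hypothetical names, and the memory block is left out for brevity.

```go
package main

import "strings"

// Section is one named block of the system-prompt prefix.
type Section struct {
	Name    string
	Content string
}

// buildPrefix joins non-empty sections as "## NAME" blocks separated by "---".
func buildPrefix(sections []Section) string {
	var parts []string
	for _, s := range sections {
		body := strings.TrimSpace(s.Content)
		if body == "" {
			continue // missing or empty file: nothing injected
		}
		parts = append(parts, "## "+s.Name+"\n\n"+body)
	}
	return strings.Join(parts, "\n\n---\n\n")
}
```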

Configuration

Environment Variables

```bash
CHATCLI_BOOTSTRAP_ENABLED=true
CHATCLI_BOOTSTRAP_DIR=/path/to/bootstrap/files
CHATCLI_MEMORY_ENABLED=true
```

Variable	Default	Description
`CHATCLI_BOOTSTRAP_ENABLED`	`true`	Enable/disable bootstrap file loading
`CHATCLI_BOOTSTRAP_DIR`	`~/.chatcli/`	Alternative directory for global bootstrap files. Use this when you want to keep your files (SOUL.md, RULES.md, etc.) in another location, such as a versioned repository or a directory shared across machines
`CHATCLI_MEMORY_ENABLED`	`true`	Enable/disable the persistent memory system

CHATCLI_BOOTSTRAP_DIR only overrides the global directory (~/.chatcli/). Project-level files (detected via .git or .agent markers) still take priority over global ones.

Via Helm Chart

```yaml
# values.yaml
bootstrap:
  enabled: true
  definitions:
    SOUL.md: |
      You are a DevOps assistant...
    USER.md: |
      The user prefers Go...

memory:
  enabled: true
  # Uses the persistence PVC by default
```

The Helm chart creates ConfigMaps for bootstrap files and mounts them at /home/chatcli/.chatcli/bootstrap/. Memory uses the sessions PVC for persistence.

Context Injection Optimization (Prompt Caching)

ChatCLI optimizes token costs when contexts are attached using three complementary strategies:

Unified System Prompt with Cache Hints

Contexts attached via /context attach are injected as system prompt, not as user messages. This enables provider-level prompt caching:

Provider	Mechanism	Discount
Anthropic	`cache_control: ephemeral`	~90%
OpenAI	Automatic prompt caching	~50%
Google	Context caching API	Variable

The system prompt block contains:

Bootstrap (SOUL.md, USER.md, etc.)
Memory (MEMORY.md + daily notes)
Attached Contexts (new — previously injected as user messages)
K8s Watcher (if active)

Since the system prompt is identical across turns, the provider caches it and charges tokens at a discount.

Smart Compaction

Injected context messages (/memory load, summarized contexts) are automatically truncated during compaction (Level 1 — trimming). This prevents old reference context from consuming valuable token budget.
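A sketch of trimming only injected context while leaving the conversation intact. The `Injected` flag, the character cutoff and the `[truncated]` marker are assumptions.

```go
package main

// Message is a simplified chat message. Injected marks context pulled in via
// /memory load or a summarized context (hypothetical field).
type Message struct {
	Role     string
	Content  string
	Injected bool
}

// trimInjected truncates injected messages longer than maxChars runes and
// leaves ordinary messages untouched. It returns a new slice.
func trimInjected(msgs []Message, maxChars int) []Message {
	out := make([]Message, len(msgs))
	for i, m := range msgs {
		if r := []rune(m.Content); m.Injected && len(r) > maxChars {
			m.Content = string(r[:maxChars]) + " [truncated]"
		}
		out[i] = m
	}
	return out
}
```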

Token Visibility

The /context attached command now shows:

Estimated tokens per context
Total tokens per turn
Cache hints per provider
Warnings for oversized contexts

When running /context attach, the feedback includes the estimated cost per turn.

Best Practices

Global SOUL.md, per-project USER.md

Keep your preferred personality globally and technical context per project.

Keep MEMORY.md concise

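Keep only stable, confirmed facts, not session-specific ones. MEMORY.md is regenerated from the FactIndex, so prune with /memory forget or /memory compact rather than editing the file.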

Daily notes for journaling

Use them to record decisions, solved problems, and temporal context.

Don't duplicate CLAUDE.md

If you already use CLAUDE.md or project instructions, avoid duplicating them in bootstrap.

Periodically review your memories and remove outdated ones to keep the context relevant.


Next Steps

Conversation Control

Sessions
