ChatCLI is a modular Go project organized into packages with well-defined responsibilities. During the Phase 3 monolith decomposition, all major files were split into focused, single-responsibility modules in line with SOLID principles. This page documents the internal architecture for contributors and advanced users.

Package Overview

chatcli/
|-- main.go                    # Entrypoint
|-- cli/                       # User interface and modes
|   |-- cli.go                 # Main ChatCLI struct + Start() (~923 lines)
|   |-- cli_helpers.go         # Helper methods extracted from cli.go
|   |-- cli_history.go         # History management methods
|   |-- cli_llm.go             # processLLMRequest + spinner
|   |-- cli_mode.go            # Mode switching logic
|   |-- cli_output.go          # Output formatting methods
|   |-- cli_prompt.go          # Prompt building methods
|   |-- cli_session.go         # Session management methods
|   |-- cli_tools.go           # Tool registration and handling
|   |-- agent_mode.go          # Agent mode core (~1498 lines)
|   |-- agent_mode_helpers.go  # Agent mode helper methods
|   |-- agent_mode_tools.go    # Agent tool handling
|   |-- agent_mode_ui.go       # Agent UI rendering
|   |-- agent_tool_sanitizer.go # Argument sanitization pipeline
|   |-- command_handler.go     # Slash command routing (~185 lines)
|   |-- command_handler_session.go  # Session commands
|   |-- command_handler_context.go  # Context commands
|   |-- command_handler_plugins.go  # Plugin commands
|   |-- command_handler_agent.go    # Agent commands
|   |-- context_handler.go     # Context management (~385 lines)
|   |-- context_handler_attach.go   # Context attach logic
|   |-- context_handler_create.go   # Context creation logic
|   |-- context_handler_resolve.go  # Context resolution
|   |-- rewind.go              # Conversation checkpoints and restore
|   |-- compact_command.go     # Guided history compaction
|   |-- history_compactor.go   # 3-level compaction pipeline
|   |-- history_trimmer.go     # Near-lossless trimming (level 1)
|   |-- memory_worker.go       # Background memory worker
|   |-- agent/                 # Multi-agent system
|   |   |-- workers/           # 12 specialist agents
|   |   |-- toolcall_parser.go # Stateful tool call parser
|   |   |-- ui_renderer.go     # UI rendering
|   |   +-- dispatcher.go      # Async dispatcher
|   |-- bus/                   # Internal message bus
|   |-- workspace/             # Bootstrap files + memory + context builder
|   |   |-- bootstrap.go       # BootstrapLoader with mtime cache
|   |   |-- memory.go          # MemoryStore facade (delegates to memory.Manager)
|   |   |-- context_builder.go # Combines bootstrap + memory into prompt (smart retrieval)
|   |   +-- memory/            # Structured memory system
|   |       |-- store.go       # Manager (central orchestrator)
|   |       |-- facts.go       # FactIndex (scored facts with decay)
|   |       |-- profile.go     # UserProfile (automatic profiling)
|   |       |-- topics.go      # TopicTracker (recurring topics)
|   |       |-- projects.go    # ProjectTracker (active projects)
|   |       |-- patterns.go    # PatternDetector (usage patterns)
|   |       |-- retriever.go   # Smart keyword-based retrieval
|   |       |-- compactor.go   # LLM + score-based compaction
|   |       +-- migration.go   # Legacy MEMORY.md migration
|   |-- skills/                # Skills system
|   |-- mcp/                   # MCP manager
|   +-- ctxmgr/                # Context manager package
|-- config/                    # Configuration and migration
|-- i18n/                      # Internationalization (892 translation keys)
|-- llm/                       # LLM communication
|   |-- registry/              # Provider auto-registration
|   |-- fallback/              # Fallback chain
|   |-- client/                # LLMClient + ToolAwareClient interface
|   |-- openai/                # OpenAI provider
|   |-- claudeai/              # Anthropic provider
|   |-- googleai/              # Google provider
|   |-- xai/                   # xAI provider
|   |-- zai/                   # ZAI (Zhipu AI) provider
|   |-- minimax/               # MiniMax provider
|   |-- moonshot/              # Moonshot (Kimi) provider
|   |-- ollama/                # Ollama provider
|   +-- copilot/               # GitHub Copilot provider
|-- models/                    # Structs: ToolDefinition, ToolCall, LLMResponse
|-- server/                    # gRPC server
|   |-- handler.go             # Main handler (~420 lines)
|   |-- handler_session.go     # Session RPCs
|   |-- handler_resources.go   # Resource discovery RPCs
|   +-- handler_aiops.go       # AIOps RPCs
|-- client/remote/             # gRPC client
|-- k8s/                       # Kubernetes Watcher
|-- operator/                  # K8s Operator (AIOps)
|   |-- api/v1alpha1/          # 17 CRD types
|   +-- controllers/           # Reconcilers and engines
|-- proto/                     # Protobuf definitions
|-- utils/                     # Utility functions
+-- version/                   # Version information

Main Struct: ChatCLI

The ChatCLI struct in cli/cli.go is the heart of the system. Its most important fields are organized by responsibility:

LLM and Provider

| Field | Type | Description |
|---|---|---|
| Client | client.LLMClient | Active LLM provider client |
| manager | manager.LLMManager | Provider manager (creates clients, lists available) |
| Provider | string | Active provider name (e.g., "ANTHROPIC", "OPENAI") |
| Model | string | Active model (e.g., "claude-sonnet-4-20250514") |

History and Compaction

| Field | Type | Description |
|---|---|---|
| history | []models.Message | Unified conversation history (shared across all modes) |
| historyCompactor | *HistoryCompactor | 3-level compaction pipeline |
| historyManager | *HistoryManager | Command history persistence to disk |

Execution Control

| Field | Type | Description |
|---|---|---|
| isExecuting | atomic.Bool | Atomic flag: true while waiting for LLM response |
| operationCancel | context.CancelFunc | Cancels the in-flight LLM operation (Ctrl+C) |
| processingDone | chan struct{} | Signals processing completion to the main loop |
| interactionState | InteractionState | Interaction state: Normal, SwitchingProvider, Processing, AgentMode |
| executionProfile | ExecutionProfile | Execution profile: Normal, Agent, Coder |

Subsystems

| Field | Type | Description |
|---|---|---|
| agentMode | *AgentMode | Agent mode instance (ReAct loop) |
| pluginManager | *plugins.Manager | Plugin manager (built-in + external) |
| commandHandler | *CommandHandler | Slash command router |
| contextHandler | *ContextHandler | Context manager (/context) |
| personaHandler | *PersonaHandler | Persona/agent manager |
| skillHandler | *SkillHandler | Skill registry and execution |
| contextBuilder | *workspace.ContextBuilder | Combines bootstrap + memory into system prompt |
| memoryStore | *workspace.MemoryStore | Facade for the structured memory system |
| memWorker | *memoryWorker | Background memory extraction worker |

Auxiliary State

| Field | Type | Description |
|---|---|---|
| checkpoints | []conversationCheckpoint | History restore points (max 20) |
| messageQueue | []string | FIFO queue of messages typed during processing (type-ahead) |
| lastEscTime | time.Time | Esc+Esc detection for rewind menu |
| sessionManager | *SessionManager | Save/load sessions to disk |

Startup Flow

The diagram below shows the complete boot sequence, from main.go to the interactive loop:
main.go
  |
  +-- Check subcommands (server, connect, watch)
  +-- cli.PreprocessArgs() + cli.Parse() -> opts
  +-- i18n.Init()
  +-- godotenv.Load(.env)
  +-- utils.InitializeLogger() -> zap.Logger
  +-- config.InitGlobal() + config.Global.Load()
  |
  +-- manager.NewLLMManager(logger)
  |     +-- Providers auto-register via init() in registry
  |     +-- Manager iterates providers, checks env vars, creates clients
  |
  +-- cli.NewChatCLI(manager, logger)
  |     +-- plugins.NewManager() -> registers built-in (CoderPlugin)
  |     +-- configureProviderAndModel() (config.Global + env)
  |     +-- manager.GetClient(provider, model) -> LLMClient
  |     +-- NewSessionManager()
  |     +-- NewContextHandler()
  |     +-- detectProjectDir(): walks up looking for .agent or .git
  |     +-- workspace.NewBootstrapLoader(workspaceDir, globalDir)
  |     +-- workspace.NewMemoryStore(memDir)
  |     +-- workspace.NewContextBuilder(bootstrap, memory)
  |     +-- memoryWorker.start() (background goroutine)
  |     +-- NewPersonaHandler() + SetProjectDir()
  |     +-- NewSkillHandler()
  |     +-- NewCommandHandler(cli)
  |     +-- NewAgentMode(cli, logger)
  |     +-- historyManager.LoadHistory() (command history)
  |
  +-- chatCLI.ApplyOverrides(provider, model) (CLI flags)
  +-- HandleOneShotOrFatal(ctx, opts) -> returns if -p flag
  +-- handleGracefulShutdown(cancel, logger)
  +-- chatCLI.Start(ctx)
        +-- PrintWelcomeScreen()
        +-- Loop: prompt.New() with panic/recover for mode switching

detectProjectDir()

Walks up from the current working directory to the filesystem root looking for project markers:
  1. .agent/ (explicit ChatCLI marker) — takes priority
  2. .git/ (common convention)
Returns the project root path, or "" if no marker is found.
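
The marker search described above can be sketched as follows. This is a minimal illustration, not the code in cli/: the names findProjectRoot, walkUp, and dirExists are hypothetical, and it assumes that "takes priority" means a .agent directory anywhere on the path wins over a closer .git directory. The directory check is passed in as a function so the walk can be exercised without touching the real filesystem.

```go
package main

import (
	"os"
	"path/filepath"
)

// dirExists reports whether path is an existing directory.
func dirExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && info.IsDir()
}

// walkUp returns the nearest ancestor of start (including start itself)
// that contains marker, or "" if the filesystem root is reached first.
func walkUp(start, marker string, isDir func(string) bool) string {
	dir := filepath.Clean(start)
	for {
		if isDir(filepath.Join(dir, marker)) {
			return dir
		}
		parent := filepath.Dir(dir)
		if parent == dir { // filepath.Dir of the root is the root itself
			return ""
		}
		dir = parent
	}
}

// findProjectRoot tries ".agent" along the whole path first, then ".git".
// Assumption: .agent wins even when a .git directory sits closer to start.
func findProjectRoot(start string, isDir func(string) bool) string {
	for _, marker := range []string{".agent", ".git"} {
		if root := walkUp(start, marker, isDir); root != "" {
			return root
		}
	}
	return ""
}
```

A caller would typically pass the working directory and the real check, for example findProjectRoot(cwd, dirExists).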

Message Flow: Chat Mode

In normal interactive mode, each user message follows this pipeline:
User types text
       |
       v
  executor(in string)
       |
       +-- Paste detection (BracketedPasteParser)
       +-- Append to commandHistory
       +-- Check / commands (routing via CommandHandler)
       +-- Check /run, /agent, /coder -> panic(agentModeRequest)
       |
       v
  processLLMRequest(in)
       |
       +-- Suppress animation (go-prompt manages spinner in prefix)
       +-- Start spinner goroutine (250ms tick + SIGWINCH)
       +-- saveCheckpoint() (deep copy of history)
       +-- Append user message to history
       +-- BuildPromptMessages() (system prompt + attached contexts)
       +-- Client.SendPrompt(ctx, prompt, history, maxTokens)
       |
       v
  LLM Response
       |
       +-- Append response to history
       +-- Render via Glamour (markdown)
       +-- memWorker.nudge() (trigger memory extraction)
       +-- Drain messageQueue (type-ahead)

Type-ahead (messageQueue)

Messages typed while the LLM is processing are stored in a FIFO queue (messageQueue). After each response, the system drains the queue and processes each message sequentially, eliminating the need to re-type. Coder mode has its own queue with a real-time visual indicator (▼ N msg queue) when messages are pending.
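
A minimal sketch of such a queue, assuming a mutex-guarded slice; the type typeAheadQueue and its methods are hypothetical names, not the actual messageQueue code. The drain loop pops one message at a time, so messages pushed while a handler is running are still processed in arrival order.

```go
package main

import "sync"

// typeAheadQueue is a mutex-guarded FIFO of pending user messages.
type typeAheadQueue struct {
	mu    sync.Mutex
	items []string
}

// push appends a message typed while the LLM is still busy.
func (q *typeAheadQueue) push(msg string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.items = append(q.items, msg)
}

// pop removes and returns the oldest message; ok is false when empty.
func (q *typeAheadQueue) pop() (msg string, ok bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.items) == 0 {
		return "", false
	}
	msg, q.items = q.items[0], q.items[1:]
	return msg, true
}

// drain handles queued messages in arrival order until the queue is empty
// and returns how many were processed. The lock is not held while handle
// runs, so handle may push further messages.
func (q *typeAheadQueue) drain(handle func(string)) int {
	n := 0
	for {
		msg, ok := q.pop()
		if !ok {
			return n
		}
		handle(msg)
		n++
	}
}
```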

Input guard (anti-typeahead in security prompts)

Unlike the useful type-ahead described above, input queued before a security prompt appears mid-turn must not be consumed as a y/n answer. Three layers discard it:
  • TTY flush via TCIFLUSH (Linux), TIOCFLUSH (BSD/Darwin), or FlushConsoleInputBuffer (Windows).
  • Drain channel — empties the non-blocking reader buffer.
  • 250ms debounce — discards input arriving in the first 250ms after the box renders.
Additionally, at the start of every agent turn, stty sane runs on the controlling /dev/tty to recover from a prior go-prompt teardown that may have left the terminal in raw mode.
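
The two platform-independent layers (channel drain and debounce) can be sketched as below. The function names are hypothetical, the reader buffer is modeled as a plain byte channel, and the platform-specific TTY flush is deliberately left out.

```go
package main

import "time"

// debounceWindow mirrors the 250ms window described above.
const debounceWindow = 250 * time.Millisecond

// drainPending discards everything currently buffered in ch without
// blocking and returns the number of discarded bytes.
func drainPending(ch <-chan byte) int {
	n := 0
	for {
		select {
		case _, ok := <-ch:
			if !ok { // channel closed: nothing more to discard
				return n
			}
			n++
		default: // buffer empty
			return n
		}
	}
}

// acceptInput reports whether a keystroke that arrived `elapsed` after the
// security box was rendered should be treated as a real answer.
func acceptInput(elapsed time.Duration) bool {
	return elapsed >= debounceWindow
}
```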

Message Flow: Agent/Coder Mode

Entering agent mode uses a panic/recover mechanism to exit the go-prompt loop:
User types "/agent QUERY" or "/coder QUERY" or "/run QUERY"
       |
       v
  executor() -> panic(agentModeRequest)
       |
       v
  Start() recover -> detects agentModeRequest
       +-- restoreTerminal() (in coder mode)
       |
       v
  agentMode.Run(ctx, query, additionalContext, systemPromptOverride)
       |
       v
  === SETUP ===
  1. saveCheckpoint()
  2. System prompt composition:
     |-- If active persona: Persona + FormatInstructions (Coder or Agent)
     |-- If none: Default prompt (CoderSystemPrompt or Agent default)
  3. Prepend workspace context (SOUL.md, USER.md, IDENTITY.md, RULES.md, MEMORY.md)
     |-- Smart retrieval: extracts keywords from last 3 messages
  4. Append tool context (description of available plugins)
  5. Append orchestrator prompt (if multi-agent enabled)
       |
       v
  === ReAct LOOP ===
  For each turn (up to maxTurns):
       |
       +-- Build turn history with anchor reminder
       +-- Check token budget -> compact if > 60%
       +-- Client.SendPrompt(ctx, systemInstruction, turnHistory, 0)
       |
       v
  Parse response (reasoning, explanation, tool_calls, agent_calls, commands)
       |
       +-- Priority 0: agent_calls -> dispatch to parallel workers
       +-- Priority 1: tool_calls -> sanitize args -> security check -> execute plugin -> accumulate output
       +-- Priority 2: command blocks (legacy format)
       +-- Priority 3: final response (no actions = loop ends)
       |
       v
  Inject batch results into history -> next turn
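
The priority order in the loop above can be expressed as a small decision function. The sketch below is illustrative only: parsedResponse, action, nextAction, and runLoop are hypothetical names, and the actual execution of agents, tools, and commands is replaced by a comment.

```go
package main

// parsedResponse is a simplified stand-in for one parsed LLM turn.
type parsedResponse struct {
	AgentCalls []string
	ToolCalls  []string
	Commands   []string
	Text       string
}

type action int

const (
	dispatchAgents action = iota // priority 0
	executeTools                 // priority 1
	runCommands                  // priority 2
	finish                       // priority 3: no actions, loop ends
)

// nextAction picks the highest-priority action present in the response.
func nextAction(r parsedResponse) action {
	switch {
	case len(r.AgentCalls) > 0:
		return dispatchAgents
	case len(r.ToolCalls) > 0:
		return executeTools
	case len(r.Commands) > 0:
		return runCommands
	default:
		return finish
	}
}

// runLoop calls step once per turn and stops at the first response with no
// actions, or after maxTurns. It returns the turns used and the final text.
func runLoop(maxTurns int, step func(turn int) parsedResponse) (turns int, final string) {
	for turn := 1; turn <= maxTurns; turn++ {
		r := step(turn)
		if nextAction(r) == finish {
			return turn, r.Text
		}
		// A real loop would execute the action here and inject the
		// results into the history before the next turn.
	}
	return maxTurns, ""
}
```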

Anchor Reminder

At each turn, the agent injects a short reminder into the history to keep the LLM focused on the original task. This prevents drift in long conversations.

Tool Call Parsing

The parser in cli/agent/toolcall_parser.go uses a stateful scanner (not regex) for robustness:

Supported Formats

<!-- XML self-closing -->
<tool_call name="@coder" args="--file main.go --content '...'" />

<!-- XML paired -->
<tool_call name="@coder" args="..."></tool_call>

<!-- JSON fallback -->
{"tool_call":"@coder","args":{"file":"main.go"}}

Scanner Algorithm

  1. Search for <tool_call case-insensitively
  2. Verify the next char is whitespace or > (not part of another tag like <tool_caller>)
  3. scanTagEnd(): advance forward respecting quotes (single and double)
    • Inside quotes, > is treated as literal text
    • Supports HTML entities (&gt;, &quot;, etc.)
  4. Extract name and args attributes regardless of order
  5. If self-closing fails, try paired tags with </tool_call>
  6. Fallback to JSON: try parseJSONToolCalls() in parallel

Why Not Regex?

Tool arguments frequently contain >, ", and nested JSON. Regex cannot distinguish > inside a quoted attribute from > that closes the tag. The stateful scanner solves this by tracking quote state.
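
To make the quote-tracking idea concrete, here is a minimal sketch of a tag-end scanner. It reuses the name scanTagEnd from the algorithm above, but the signature and body are assumptions for illustration; entity handling and attribute extraction are omitted.

```go
package main

// scanTagEnd returns the index of the '>' that closes the tag beginning at
// s[start], or -1 if the tag is never closed. A '>' inside a single- or
// double-quoted attribute value is treated as literal text.
func scanTagEnd(s string, start int) int {
	var quote byte // 0 when outside quotes, otherwise the opening quote char
	for i := start; i < len(s); i++ {
		c := s[i]
		switch {
		case quote != 0:
			if c == quote {
				quote = 0 // matching quote closes the attribute value
			}
		case c == '"' || c == '\'':
			quote = c
		case c == '>':
			return i
		}
	}
	return -1
}
```

For an input such as `<tool_call name="@coder" args="--content 'a > b'" />`, the scanner skips the `>` inside the args value and stops at the final character of the tag.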

Argument Sanitization Pipeline

After parsing, each tool call goes through a 7-step pipeline in agent_tool_sanitizer.go:
Raw args from LLM
       |
  1. HTML unescape (&quot; -> ", &#10; -> \n, &gt; -> >)
  2. Line normalization (CRLF -> LF)
  3. Line continuation handling (backslash + newline -> space)
  3b. JSON-ish string unescape (if detected)
  4. Bogus backslash+space removal (outside quotes)
  4b. Fix unbalanced quotes with trailing backslash
  5. Remove trailing dangling backslashes
  6. Normalize multiple spaces (preserve inside quotes)
  7. Coder-specific semantic fixes (if applicable)
       |
       v
  Format detection:
  +-- If valid JSON -> buildArgvFromJSONMap()
  +-- If CLI-style -> splitToolArgsMultiline()
       |
       v
  Clean args ready for execution
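
A few of the steps above (1, 2, 3, and 5, plus format detection) can be sketched with the standard library. The function names sanitizeArgs and detectFormat are hypothetical, the quote-aware steps are intentionally omitted, and the JSON check assumes that "valid JSON" means a JSON object.

```go
package main

import (
	"encoding/json"
	"html"
	"strings"
)

// sanitizeArgs applies a simplified subset of the pipeline.
func sanitizeArgs(raw string) string {
	s := html.UnescapeString(raw)           // step 1: &quot; -> ", &#10; -> \n
	s = strings.ReplaceAll(s, "\r\n", "\n") // step 2: CRLF -> LF
	s = strings.ReplaceAll(s, "\\\n", " ")  // step 3: backslash + newline -> space
	s = strings.TrimRight(s, "\\")          // step 5: drop trailing backslashes
	return s
}

// detectFormat reports "json" for a valid JSON object and "cli" otherwise.
func detectFormat(args string) string {
	trimmed := strings.TrimSpace(args)
	if strings.HasPrefix(trimmed, "{") && json.Valid([]byte(trimmed)) {
		return "json"
	}
	return "cli"
}
```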

History Compaction (3 Levels)

The HistoryCompactor in cli/history_compactor.go manages history size through a progressive pipeline:
History exceeds token budget
       |
       v
  LEVEL 1: Trimming (near-lossless)
  |-- Remove injected context messages (>3000 chars)
  |-- Remove <reasoning> blocks from assistant
  |-- Compact verbose XML in tool outputs
  |-- Deduplicate identical consecutive messages
  |-- Truncate overly long tool outputs
  |
  +-- If within budget -> return
       |
       v
  LEVEL 2: Structured summarization
  |-- Split history into: [system | middle (summarize) | recent (keep)]
  |-- Preserve the 8 most recent messages verbatim
  |-- Send middle block to LLM with fact-extraction prompt
  |-- Result: single "summary" message replacing the block
  |-- Independent timeout (10 min) to avoid blocking the flow
  |
  +-- If within budget -> return
       |
       v
  LEVEL 3: Emergency truncation
  |-- Last resort: drops middle messages without summarization
  |-- Keeps system messages + recent messages
  |-- Logs warning for visibility

Agent Mode Trigger

In agent mode, compaction is checked at each turn of the ReAct loop. The trigger fires when token usage exceeds 60% of the model’s budget.
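
The progressive structure (apply a level, re-check the budget, escalate only if needed) can be sketched as below. Everything here is a simplified assumption: the message type, the four-characters-per-token estimate, and the function names are hypothetical, and only an emergency-truncation level is implemented.

```go
package main

type message struct {
	Role    string
	Content string
}

// estimateTokens uses a rough 4-characters-per-token heuristic (assumption).
func estimateTokens(h []message) int {
	total := 0
	for _, m := range h {
		total += len(m.Content) / 4
	}
	return total
}

// shouldCompact mirrors the agent-mode trigger: usage above 60% of budget.
func shouldCompact(used, budget int) bool {
	return used*100 > budget*60
}

// level is one compaction stage; it returns a (hopefully smaller) history.
type level func([]message) []message

// compact applies levels in order, stopping as soon as the history fits.
// It returns the resulting history and how many levels were applied.
func compact(h []message, budget int, levels ...level) ([]message, int) {
	for i, lv := range levels {
		if estimateTokens(h) <= budget {
			return h, i
		}
		h = lv(h)
	}
	return h, len(levels)
}

// emergencyTruncate keeps system messages plus the keepRecent newest ones.
func emergencyTruncate(keepRecent int) level {
	return func(h []message) []message {
		cut := len(h) - keepRecent
		var out []message
		for i, m := range h {
			if m.Role == "system" || i >= cut {
				out = append(out, m)
			}
		}
		return out
	}
}
```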

Checkpoint/Rewind System

rewind.go implements a history snapshot system:

Structure

type conversationCheckpoint struct {
    Timestamp time.Time
    Label     string           // auto-generated summary of last user message
    History   []models.Message // deep copy of history at checkpoint time
    MsgCount  int              // number of messages at checkpoint
}

Behavior

  • When saved: Before each LLM call (saveCheckpoint())
  • Limit: Maximum of 20 checkpoints (FIFO — oldest are discarded)
  • Deep copy: Each checkpoint contains a complete, independent copy of the history
  • Trigger: Press Esc+Esc (two Esc presses within 500ms) to open the rewind menu
  • Restore: The user selects a checkpoint, and the history is replaced with the saved copy
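
The save and restore behavior can be sketched as follows. The message, checkpoint, and session types are simplified stand-ins for models.Message, conversationCheckpoint, and ChatCLI, and the restore method is a hypothetical name. A plain copy is a deep copy here only because the stand-in message type holds no pointers or slices.

```go
package main

import "time"

type message struct {
	Role    string
	Content string
}

type checkpoint struct {
	Timestamp time.Time
	Label     string
	History   []message // independent copy of the history
	MsgCount  int
}

const maxCheckpoints = 20

type session struct {
	history     []message
	checkpoints []checkpoint
}

// saveCheckpoint snapshots the current history and drops the oldest
// checkpoints once the limit is exceeded (FIFO).
func (s *session) saveCheckpoint(label string) {
	snap := make([]message, len(s.history))
	copy(snap, s.history)
	s.checkpoints = append(s.checkpoints, checkpoint{
		Timestamp: time.Now(),
		Label:     label,
		History:   snap,
		MsgCount:  len(snap),
	})
	if extra := len(s.checkpoints) - maxCheckpoints; extra > 0 {
		s.checkpoints = s.checkpoints[extra:]
	}
}

// restore replaces the history with a copy of checkpoint i.
func (s *session) restore(i int) bool {
	if i < 0 || i >= len(s.checkpoints) {
		return false
	}
	s.history = append([]message(nil), s.checkpoints[i].History...)
	return true
}
```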

Provider Registry

Auto-registration Pattern

Each LLM provider registers itself with the llm/registry package from an init() function in its own register.go:
llm/openai/register.go      -> init() { registry.Register(ProviderInfo{...}) }
llm/claudeai/register.go    -> init() { registry.Register(ProviderInfo{...}) }
llm/googleai/register.go    -> init() { registry.Register(ProviderInfo{...}) }
llm/xai/register.go         -> init() { registry.Register(ProviderInfo{...}) }
llm/zai/register.go         -> init() { registry.Register(ProviderInfo{...}) }
llm/minimax/register.go     -> init() { registry.Register(ProviderInfo{...}) }
llm/ollama/register.go      -> init() { registry.Register(ProviderInfo{...}) }
llm/copilot/register.go     -> init() { registry.Register(ProviderInfo{...}) }

ProviderInfo

type ProviderInfo struct {
    Name         string           // unique key (e.g., "OPENAI")
    DisplayName  string           // display name (e.g., "OpenAI")
    Factory      ProviderFactory  // function that creates LLMClient from ProviderConfig
    EnvKeys      []string         // environment variables for API key discovery
    RequiresAuth bool             // whether OAuth authentication is needed
}

Discovery Flow

manager.NewLLMManager(logger)
       |
       +-- registry.List() -> names of all registered providers
       +-- For each provider:
       |     +-- Check EnvKeys (os.Getenv)
       |     +-- If key found or RequiresAuth: mark as available
       +-- GetClient(provider, model):
             +-- registry.Get(provider) -> ProviderInfo
             +-- info.Factory(ProviderConfig{...}) -> LLMClient
This pattern eliminates switch/case blocks. To add a new provider, just create a package with register.go and implement the LLMClient interface.
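
A compact sketch of the pattern, collapsed into one file for readability. The ProviderInfo fields follow the struct shown above; everything else is an assumption made for illustration: the one-method LLMClient interface, the ProviderConfig fields, the "ECHO" provider, and the helper names newClient and available.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// LLMClient is a simplified, hypothetical version of the real interface.
type LLMClient interface {
	SendPrompt(prompt string) (string, error)
}

type ProviderConfig struct{ APIKey, Model string }

type ProviderFactory func(ProviderConfig) (LLMClient, error)

type ProviderInfo struct {
	Name         string
	DisplayName  string
	Factory      ProviderFactory
	EnvKeys      []string
	RequiresAuth bool
}

var (
	mu        sync.RWMutex
	providers = map[string]ProviderInfo{}
)

// Register is called from each provider's init().
func Register(info ProviderInfo) {
	mu.Lock()
	defer mu.Unlock()
	providers[info.Name] = info
}

func Get(name string) (ProviderInfo, bool) {
	mu.RLock()
	defer mu.RUnlock()
	info, ok := providers[name]
	return info, ok
}

// List returns the registered provider names in sorted order.
func List() []string {
	mu.RLock()
	defer mu.RUnlock()
	names := make([]string, 0, len(providers))
	for n := range providers {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// available mirrors the discovery check: an env key is set or OAuth is used.
func available(info ProviderInfo, getenv func(string) string) bool {
	for _, k := range info.EnvKeys {
		if getenv(k) != "" {
			return true
		}
	}
	return info.RequiresAuth
}

// newClient looks up the provider and calls its factory; no switch needed.
func newClient(name string, cfg ProviderConfig) (LLMClient, error) {
	info, ok := Get(name)
	if !ok {
		return nil, fmt.Errorf("unknown provider %q", name)
	}
	return info.Factory(cfg)
}

// echoClient is a made-up provider used only to demonstrate registration.
type echoClient struct{ model string }

func (c echoClient) SendPrompt(p string) (string, error) { return c.model + ": " + p, nil }

func init() {
	Register(ProviderInfo{
		Name:        "ECHO",
		DisplayName: "Echo",
		EnvKeys:     []string{"ECHO_API_KEY"},
		Factory: func(cfg ProviderConfig) (LLMClient, error) {
			return echoClient{model: cfg.Model}, nil
		},
	})
}
```

In a multi-package layout, the package that builds the manager must still import each provider package (often with a blank import) so that its init() actually runs.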

Memory Worker (Background Process)

The memoryWorker in cli/memory_worker.go extracts memories from the conversation in the background:

Parameters

| Constant | Value | Description |
|---|---|---|
| memoryMinNewMessages | 4 | Minimum new messages to trigger |
| memoryCooldown | 2 min | Minimum time between extractions |
| memoryExtractTimeout | 60s | Timeout for the LLM extraction call |
| compactionCheckInterval | 6h | How often to check for compaction |
| dailyCleanupInterval | 24h | How often to clean up daily notes |

Triggers

  1. nudge(): Called after each LLM response. If there are >= 4 new messages and cooldown has expired, runs extraction
  2. 3-minute ticker: The background loop checks periodically (for long sessions where the user types infrequently)
  3. Compaction ticker (6h): Consolidates old facts with low scores
  4. Cleanup ticker (24h): Removes expired daily notes

Extraction Pipeline

nudge() or ticker
       |
       v
  maybeExtract()
       +-- Check: >= 4 new messages && cooldown expired
       +-- Send recent conversation to LLM with extraction prompt
       +-- LLM returns structured JSON with categories:
       |     - DAILY: daily notes (ephemeral)
       |     - LONGTERM: permanent facts
       |     - PROFILE_UPDATE: user profile updates
       |     - TOPICS: recurring topics
       |     - PROJECTS: active projects
       |
       +-- Results stored in structured JSON files
       +-- PatternDetector records usage patterns
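
The gating check at the top of maybeExtract() and a non-blocking nudge() can be sketched as below. The constants reuse the names and values from the parameters table; the extractionGate type and its methods are hypothetical, and the LLM call itself is not shown.

```go
package main

import "time"

const (
	memoryMinNewMessages = 4
	memoryCooldown       = 2 * time.Minute
)

// extractionGate tracks when extraction last ran and how many messages
// had been seen at that point.
type extractionGate struct {
	lastRun   time.Time
	lastCount int
	nudgeCh   chan struct{}
}

func newExtractionGate() *extractionGate {
	// Buffer of 1: repeated nudges coalesce into a single pending signal.
	return &extractionGate{nudgeCh: make(chan struct{}, 1)}
}

// nudge signals the background loop without ever blocking the caller.
func (g *extractionGate) nudge() {
	select {
	case g.nudgeCh <- struct{}{}:
	default: // a nudge is already pending
	}
}

// shouldExtract requires enough new messages and an expired cooldown.
func (g *extractionGate) shouldExtract(now time.Time, totalMessages int) bool {
	if totalMessages-g.lastCount < memoryMinNewMessages {
		return false
	}
	if !g.lastRun.IsZero() && now.Sub(g.lastRun) < memoryCooldown {
		return false
	}
	return true
}

// markRun records a completed extraction.
func (g *extractionGate) markRun(now time.Time, totalMessages int) {
	g.lastRun = now
	g.lastCount = totalMessages
}
```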

Main Components

CLI and Modes

The ChatCLI struct in cli/cli.go is the central point (~923 lines after decomposition). The Start() method initiates interactive mode using Bubble Tea (Charmbracelet). Helper methods, history management, mode switching, output formatting, prompt building, session management, and tool handling have been extracted into dedicated files (cli_helpers.go, cli_history.go, cli_mode.go, cli_output.go, cli_prompt.go, cli_session.go, cli_tools.go).
| Mode | File | Description |
|---|---|---|
| Interactive | cli/cli.go | Interactive prompt with auto-completion |
| Agent | cli/agent_mode.go | Task planning and execution |
| Coder | cli/cli.go + cli/agent_mode.go | Engineering loop with tool calls |
| One-shot | cli/cli.go (flag -p) | Single execution without TUI |
Mode switching uses a panic/recover mechanism to exit the go-prompt loop. ChatCLI uses a unified history (cli.history) shared across all modes. When switching modes, the full context is preserved. The /compact and /rewind commands operate directly on this single history.
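
The panic/recover hand-off can be illustrated with a small wrapper. This is a sketch under assumptions: agentModeRequest is modeled as a struct with a single Query field, and runPromptOnce is a hypothetical helper standing in for the recover logic inside Start(). Unrelated panics are re-raised so that real bugs are not swallowed.

```go
package main

// agentModeRequest is the value the executor panics with to leave the
// prompt loop (fields are assumed for this sketch).
type agentModeRequest struct {
	Query string
}

// runPromptOnce runs the blocking prompt loop. If the loop exits through
// panic(agentModeRequest{...}), the request is returned; a normal exit
// returns nil.
func runPromptOnce(loop func()) (req *agentModeRequest) {
	defer func() {
		r := recover()
		if r == nil {
			return
		}
		if a, ok := r.(agentModeRequest); ok {
			req = &a // mode switch requested: hand it to the caller
			return
		}
		panic(r) // not ours: propagate
	}()
	loop()
	return nil
}
```

The caller can then loop: run the prompt, and whenever a request comes back, run agent mode with it before starting the prompt again.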

Message Bus

The cli/bus package implements a typed message bus with:
  • Pub/sub with channel and type filters
  • Request-reply with correlation IDs
  • Atomic throughput metrics
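
A minimal sketch of the pub/sub and metrics parts, assuming string message types and buffered subscriber channels. The Bus, Message, and method names are hypothetical and do not reflect the cli/bus API; the CorrelationID field is carried but the request-reply flow itself is not implemented here.

```go
package main

import (
	"sync"
	"sync/atomic"
)

type Message struct {
	Type          string
	CorrelationID string // would pair a reply with its request
	Payload       string
}

type Bus struct {
	mu        sync.RWMutex
	subs      map[string][]chan Message
	published atomic.Int64 // throughput counter, updated without locks
}

func NewBus() *Bus {
	return &Bus{subs: make(map[string][]chan Message)}
}

// Subscribe returns a channel that receives only messages of msgType.
func (b *Bus) Subscribe(msgType string, buffer int) <-chan Message {
	ch := make(chan Message, buffer)
	b.mu.Lock()
	b.subs[msgType] = append(b.subs[msgType], ch)
	b.mu.Unlock()
	return ch
}

// Publish delivers m to matching subscribers without blocking and returns
// how many received it; a subscriber with a full buffer misses the message.
func (b *Bus) Publish(m Message) int {
	b.published.Add(1)
	b.mu.RLock()
	defer b.mu.RUnlock()
	delivered := 0
	for _, ch := range b.subs[m.Type] {
		select {
		case ch <- m:
			delivered++
		default:
		}
	}
	return delivered
}

// Published reports the total number of Publish calls.
func (b *Bus) Published() int64 { return b.published.Load() }
```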

Multi-Agent

The orchestration system in cli/agent/workers manages 12 specialist agents running in parallel goroutines with a configurable semaphore. Each worker has:
  • Isolated mini ReAct loop (observe -> reason -> act)
  • Own skills (accelerator and descriptive scripts)
  • File locks (mutex per-filepath)
  • Configurable timeout and max turns
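
Two of these mechanisms, the semaphore that bounds parallelism and the per-filepath mutex, can be sketched as follows. The names runWorkers, fileLocks, and forPath are hypothetical, and the worker's ReAct loop is reduced to an opaque func().

```go
package main

import "sync"

// fileLocks hands out one mutex per file path.
type fileLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newFileLocks() *fileLocks {
	return &fileLocks{locks: make(map[string]*sync.Mutex)}
}

// forPath returns the mutex for path, creating it on first use.
func (f *fileLocks) forPath(path string) *sync.Mutex {
	f.mu.Lock()
	defer f.mu.Unlock()
	l, ok := f.locks[path]
	if !ok {
		l = &sync.Mutex{}
		f.locks[path] = l
	}
	return l
}

// runWorkers runs every task in its own goroutine, with at most
// maxParallel running at once, and waits for all of them to finish.
func runWorkers(tasks []func(), maxParallel int) {
	if maxParallel < 1 {
		maxParallel = 1
	}
	sem := make(chan struct{}, maxParallel) // counting semaphore
	var wg sync.WaitGroup
	for _, task := range tasks {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxParallel tasks are running
		go func(t func()) {
			defer wg.Done()
			defer func() { <-sem }()
			t()
		}(task)
	}
	wg.Wait()
}
```

A worker that edits a file would wrap the write in forPath(path).Lock() and Unlock(), so two agents never modify the same file concurrently while unrelated files proceed in parallel.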

Technologies

| Category | Library |
|---|---|
| TUI | Bubble Tea (Charmbracelet) |
| Markdown | Glamour (Charmbracelet) |
| Colors | Lipgloss (Charmbracelet) |
| Logger | Zap (Uber) |
| Log rotation | Lumberjack |
| Env files | Godotenv |
| i18n | golang.org/x/text |
| gRPC | google.golang.org/grpc |
| Kubernetes | k8s.io/client-go |
| Operator | controller-runtime (Kubebuilder) |
| Protobuf | google.golang.org/protobuf |

Design Patterns

  • Auto-registration via init() — Providers register themselves automatically
  • Interface-driven polymorphism via LLMClient and ToolAwareClient
  • Fallback chain — Intelligent error classification + exponential cooldown
  • Stateful parser — XML parsing with escaped attributes (more robust than regex)
  • embed.FS — i18n files embedded in the binary (always uses /, never filepath.Join)
  • Panic/recover — Mode switching in go-prompt without restarting the process
  • Unified history — A single message array shared across chat, agent, and coder modes
  • Checkpoint/rewind — Deep copy of history before each LLM call, with selective restore
  • 3-level compaction — Near-lossless trimming, structured summarization, emergency truncation
  • Type-ahead queue — Messages typed during processing are queued and drained automatically
  • Workspace context injection — Bootstrap + memory automatically injected into every system prompt
  • Unified system prompt with caching — Attached contexts (/context attach) flow through the system prompt via SystemParts with CacheControl hints, enabling provider-level prompt caching (Anthropic cache_control: ephemeral, OpenAI automatic caching, Google context caching)
  • Background memory extraction — Worker goroutine extracts and consolidates memories from the conversation without blocking the main flow