ChatCLI’s Cost Tracking monitors token consumption and estimates costs in real time during your sessions, so you can see how much each conversation costs and make informed decisions about model usage and history compaction.
The /cost Command
The /cost command displays a comprehensive summary of token consumption and estimated costs for the current session:
Session Cost Summary
====================

Tokens Used:
  Input:   45.2K tokens
  Output:  12.8K tokens
  Cache:   38.1K tokens (cached)
  Total:   58.0K tokens

Estimated Cost:
  Input:   $0.1356
  Output:  $0.1920
  Cache:  -$0.0914 (savings)
  ─────────────────────
  Total:   $0.2362

Model:    claude-sonnet-4-6 (Anthropic)
Requests: 14
Duration: 23m 15s
Costs are estimates based on published provider prices. Actual cost may vary depending on discounts, usage tiers, and prompt caching applied by the provider.
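As a rough sketch of the arithmetic behind this summary (a hypothetical helper, not ChatCLI's internal code; prices are USD per 1M tokens):

```python
def estimate_cost(input_tokens, output_tokens, cached_tokens,
                  input_price, output_price, cache_read_price):
    """Estimate session cost in USD. Prices are per 1M tokens.

    Cached tokens are billed at the cache-read price instead of the
    full input price, so the cache line appears as a saving.
    """
    per_m = 1_000_000
    input_cost = input_tokens * input_price / per_m
    output_cost = output_tokens * output_price / per_m
    cache_savings = cached_tokens * (input_price - cache_read_price) / per_m
    total = input_cost + output_cost - cache_savings
    return input_cost, output_cost, cache_savings, total

# With the session above: 45.2K input and 12.8K output at
# claude-sonnet-4-6 prices reproduce the $0.1356 and $0.1920 lines.
inp, out, sav, tot = estimate_cost(45_200, 12_800, 38_100, 3.00, 15.00, 0.30)
```

The exact cache line depends on how the provider applies caching, so it may differ slightly from this simple price-gap formula.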
Token Tracking by Mode
ChatCLI tracks tokens in all three interaction modes: chat, agent, and coder.
In chat mode, tracking counts:
System prompt tokens (bootstrap + memory + contexts)
Tokens for each user message
Tokens for each assistant response
Cache savings (when applicable)
[chat] claude-sonnet-4-6> /cost
Chat mode: 12 messages, 23.4K tokens, ~$0.08
In agent mode, tracking additionally includes:
Tokens for each tool call (request + response)
ReAct loop tokens (reasoning + action + observation)
Dispatched worker tokens (multi-agent)
Web tool tokens (webfetch/websearch)
[agent] claude-sonnet-4-6> /cost
Agent mode: 8 turns, 45.2K tokens, ~$0.24
Tool calls: 23 (read: 12, exec: 5, write: 4, search: 2)
In coder mode, tracking is similar to agent mode, with additional details:
Tokens per edited file
Tokens for applied patches
Tokens for test execution
[coder] claude-sonnet-4-6> /cost
Coder mode: 5 turns, 67.8K tokens, ~$0.35
Files modified: 8
Tests executed: 3
Pricing Tables
ChatCLI ships with built-in prices for the most common models, which it uses to calculate estimates:
Anthropic
Model               Input (per 1M tokens)   Output (per 1M tokens)   Cache Read
claude-sonnet-4-6   $3.00                   $15.00                   $0.30
claude-opus-4       $15.00                  $75.00                   $1.50
claude-haiku-3.5    $0.80                   $4.00                    $0.08
OpenAI
Model         Input (per 1M tokens)   Output (per 1M tokens)   Cache Read
gpt-4o        $2.50                   $10.00                   $1.25
gpt-4o-mini   $0.15                   $0.60                    $0.075
o3-mini       $1.10                   $4.40                    $0.55
Google
Model              Input (per 1M tokens)   Output (per 1M tokens)
gemini-2.0-flash   $0.10                   $0.40
gemini-2.5-pro     $1.25                   $10.00
ZAI (Zhipu AI)
Model           Input (per 1M tokens)   Output (per 1M tokens)
glm-5           $1.00                   $4.00
glm-4.5         $1.00                   $4.00
glm-4.5-flash   $1.00                   $4.00
MiniMax
Model          Input (per 1M tokens)   Output (per 1M tokens)
MiniMax-M2.7   $0.30                   $1.20
MiniMax-M2.5   $0.30                   $1.20
Prices are updated periodically in ChatCLI releases. For unlisted models or custom providers (via an OpenAI-compatible endpoint), the cost is shown as “N/A”.
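A minimal sketch of what such a built-in pricing table could look like (the names and structure are assumptions, with a few rows copied from the tables above):

```python
# Hypothetical pricing table, USD per 1M tokens. cache_read is None
# where the provider table above does not list a cache-read price.
PRICING = {
    "claude-sonnet-4-6": {"input": 3.00, "output": 15.00, "cache_read": 0.30},
    "gpt-4o-mini":       {"input": 0.15, "output": 0.60,  "cache_read": 0.075},
    "gemini-2.0-flash":  {"input": 0.10, "output": 0.40,  "cache_read": None},
    "glm-5":             {"input": 1.00, "output": 4.00,  "cache_read": None},
}

def lookup_price(model):
    """Return the price entry for a model, or None for unlisted models
    (the UI would then display the cost as "N/A")."""
    return PRICING.get(model)
```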
Visual Display
ChatCLI uses visual indicators for easy tracking:
Tokens are displayed with K/M suffixes for readability:
Value       Display
1,234       1.2K
45,678      45.7K
1,234,567   1.2M
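A formatter along these lines (a hypothetical helper, not ChatCLI's actual code) would produce the display values above:

```python
def format_tokens(n):
    """Format a token count with K/M suffixes, one decimal place."""
    if n >= 1_000_000:
        return f"{n / 1_000_000:.1f}M"
    if n >= 1_000:
        return f"{n / 1_000:.1f}K"
    return str(n)
```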
Context Progress Bar
When context approaches the model limit, a progress bar indicates utilization:
Context: ████████░░░░░░░░ 52% (104K / 200K tokens)
When context usage exceeds 80%, ChatCLI automatically suggests running /compact to free up space and reduce costs.
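A sketch of how such a bar can be rendered (a hypothetical helper; the 16-character width is an assumption):

```python
def context_bar(used, limit, width=16):
    """Render a text progress bar for context utilization."""
    pct = used / limit
    filled = round(pct * width)
    bar = "█" * filled + "░" * (width - filled)
    return f"Context: {bar} {pct:.0%} ({used // 1000}K / {limit // 1000}K tokens)"
```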
Cache Savings
ChatCLI optimizes costs using prompt caching when the provider supports it:
Cache Savings:
System prompt cached: 38.1K tokens
Savings this session: $0.09 (38% reduction)
Cache hit rate: 92%
Cache savings are calculated based on the difference between the normal input price and the cache read price. See Bootstrap and Memory for details on context optimization.
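The savings line follows from the gap between those two prices; a hypothetical sketch of the calculation (not ChatCLI's actual code; prices per 1M tokens):

```python
def cache_summary(cached_tokens, total_input_tokens,
                  input_price, cache_read_price):
    """Estimate cache savings and hit rate.

    Savings: cached tokens billed at the cache-read price instead of
    the full input price. Hit rate: share of input tokens served from
    cache.
    """
    savings = cached_tokens * (input_price - cache_read_price) / 1_000_000
    hit_rate = cached_tokens / total_input_tokens
    return savings, hit_rate
```

The figure ChatCLI reports may differ from this simple formula when the provider applies cache-write surcharges or tiered pricing, as noted above.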
Next Steps
Conversation Control: use /compact to reduce tokens and costs.
One-Shot Mode: monitor costs in automated pipelines.