ChatCLI’s cost tracking monitors token consumption and estimates costs in real time during your sessions, so you can see how much each conversation costs and make informed decisions about model usage and history compaction.

The /cost Command

The /cost command displays a comprehensive summary of token consumption and estimated costs for the current session:
/cost
Session Cost Summary
====================

Tokens Used:
  Input:    45.2K tokens
  Output:   12.8K tokens
  Cache:    38.1K tokens (cached)
  Total:    58.0K tokens

Estimated Cost:
  Input:    $0.1356
  Output:   $0.1920
  Cache:    -$0.0914 (savings)
  ─────────────────────
  Total:    $0.2362

Model: claude-sonnet-4-6 (Anthropic)
Requests: 14
Duration: 23m 15s
Costs are estimates based on published provider prices. Actual costs may vary depending on discounts, usage tiers, and any prompt caching applied by the provider.
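The arithmetic behind the Estimated Cost section is simple: token counts multiplied by the per-1M-token prices from the pricing tables below, minus cache savings. A minimal Python sketch of the uncached portion (the function and constant names are illustrative, not ChatCLI's internal API):

```python
# Estimate session cost from token counts and per-1M-token prices.
# Prices match the claude-sonnet-4-6 row in the pricing tables below.
PRICE_INPUT = 3.00    # USD per 1M input tokens
PRICE_OUTPUT = 15.00  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for uncached traffic."""
    return (input_tokens * PRICE_INPUT + output_tokens * PRICE_OUTPUT) / 1_000_000

# Values from the summary above: 45.2K input, 12.8K output.
cost = estimate_cost(45_200, 12_800)
print(f"${cost:.4f}")  # $0.3276 before the cache savings shown above
```

Subtracting the $0.0914 cache savings from this $0.3276 gives the $0.2362 total in the summary.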

Token Tracking by Mode

ChatCLI tracks tokens across all interaction modes.
In chat mode, tracking counts:
  • System prompt tokens (bootstrap + memory + contexts)
  • Tokens for each user message
  • Tokens for each assistant response
  • Cache savings (when applicable)
[chat] claude-sonnet-4-6> /cost
Chat mode: 12 messages, 23.4K tokens, ~$0.08
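A per-mode ledger like the one behind that readout can be sketched as follows (a hypothetical structure; ChatCLI's internal accounting may differ):

```python
from collections import defaultdict

# Hypothetical per-mode ledger; ChatCLI's internal accounting may differ.
usage = defaultdict(lambda: {"messages": 0, "tokens": 0})

def record(mode: str, tokens: int) -> None:
    """Accumulate one message's token count under its interaction mode."""
    usage[mode]["messages"] += 1
    usage[mode]["tokens"] += tokens

record("chat", 1_800)  # a user message
record("chat", 2_400)  # the assistant's response
print(usage["chat"])   # {'messages': 2, 'tokens': 4200}
```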

Pricing Tables

ChatCLI ships with prices for the most common models, which it uses to calculate estimates:

Anthropic

Model               Input (per 1M tokens)   Output (per 1M tokens)   Cache Read
claude-sonnet-4-6   $3.00                   $15.00                   $0.30
claude-opus-4       $15.00                  $75.00                   $1.50
claude-haiku-3.5    $0.80                   $4.00                    $0.08

OpenAI

Model         Input (per 1M tokens)   Output (per 1M tokens)   Cache Read
gpt-4o        $2.50                   $10.00                   $1.25
gpt-4o-mini   $0.15                   $0.60                    $0.075
o3-mini       $1.10                   $4.40                    $0.55

Google

Model              Input (per 1M tokens)   Output (per 1M tokens)
gemini-2.0-flash   $0.10                   $0.40
gemini-2.5-pro     $1.25                   $10.00

ZAI (Zhipu AI)

Model           Input (per 1M tokens)   Output (per 1M tokens)
glm-5           $1.00                   $4.00
glm-4.5         $1.00                   $4.00
glm-4.5-flash   $1.00                   $4.00

MiniMax

Model          Input (per 1M tokens)   Output (per 1M tokens)
MiniMax-M2.7   $0.30                   $1.20
MiniMax-M2.5   $0.30                   $1.20
Prices are updated periodically in ChatCLI releases. For unlisted models or custom providers (via OpenAI-compatible endpoints), the cost is shown as “N/A”.
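The N/A fallback can be pictured as a simple dictionary lookup over the tables above (an illustrative sketch, not ChatCLI's internal code):

```python
# Simplified pricing table (USD per 1M tokens), mirroring the tables above.
PRICING = {
    "claude-sonnet-4-6": {"input": 3.00, "output": 15.00, "cache_read": 0.30},
    "gpt-4o":            {"input": 2.50, "output": 10.00, "cache_read": 1.25},
    "gemini-2.0-flash":  {"input": 0.10, "output": 0.40},
}

def price_for(model: str, kind: str) -> str:
    """Return a display price, or "N/A" for unlisted models or categories."""
    price = PRICING.get(model, {}).get(kind)
    return f"${price:.2f}" if price is not None else "N/A"

print(price_for("gpt-4o", "input"))           # $2.50
print(price_for("my-custom-model", "input"))  # N/A
```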

Visual Display

ChatCLI uses visual indicators for easy tracking:

Token Format

Tokens are displayed with K/M suffixes for readability:
Value       Display
1,234       1.2K
45,678      45.7K
1,234,567   1.2M
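A formatting helper matching that table might look like this (an illustrative sketch, not ChatCLI's internal function):

```python
def format_tokens(n: int) -> str:
    """Render a token count with K/M suffixes, one decimal place."""
    if n >= 1_000_000:
        return f"{n / 1_000_000:.1f}M"
    if n >= 1_000:
        return f"{n / 1_000:.1f}K"
    return str(n)

print(format_tokens(1_234))      # 1.2K
print(format_tokens(45_678))     # 45.7K
print(format_tokens(1_234_567))  # 1.2M
```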

Context Progress Bar

When context approaches the model limit, a progress bar indicates utilization:
Context: ████████░░░░░░░░ 52% (104K / 200K tokens)
When context usage exceeds 80%, ChatCLI automatically suggests running /compact to free up space and reduce costs.
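The bar itself is a straightforward proportional fill. A sketch that reproduces the example line above (the function name is hypothetical):

```python
def context_bar(used: int, limit: int, width: int = 16) -> str:
    """Render a context-utilization bar in the style ChatCLI prints."""
    pct = used / limit
    filled = round(pct * width)
    bar = "█" * filled + "░" * (width - filled)
    return f"Context: {bar} {pct:.0%} ({used // 1000}K / {limit // 1000}K tokens)"

print(context_bar(104_000, 200_000))
# Context: ████████░░░░░░░░ 52% (104K / 200K tokens)
```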

Cache Savings

ChatCLI optimizes costs using prompt caching when the provider supports it:
Cache Savings:
  System prompt cached: 38.1K tokens
  Savings this session: $0.09 (38% reduction)
  Cache hit rate: 92%
Cache savings are calculated based on the difference between the normal input price and the cache read price. See Bootstrap and Memory for details on context optimization.
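That calculation can be sketched as follows, using the claude-sonnet-4-6 prices from the table above (the in-app figure may differ slightly depending on the provider's actual cache pricing and rounding):

```python
# Savings = cached tokens × (normal input price − cache read price).
INPUT_PRICE = 3.00       # USD per 1M tokens (claude-sonnet-4-6)
CACHE_READ_PRICE = 0.30  # USD per 1M tokens

def cache_savings(cached_tokens: int) -> float:
    """Estimated USD saved by reading tokens from cache instead of resending them."""
    return cached_tokens * (INPUT_PRICE - CACHE_READ_PRICE) / 1_000_000

print(f"${cache_savings(38_100):.2f}")
```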

Next Steps

Conversation Control

Use /compact to reduce tokens and costs.

One-Shot Mode

Monitor costs in automated pipelines.