ChatCLI’s Cost Tracking monitors token consumption and estimates costs in real time during your sessions, with real API usage data when available. You can track how much each conversation is costing, view costs by model/provider, and configure spending limits.

The /cost Command

The /cost command displays a comprehensive summary of token consumption and estimated costs for the current session:
/cost
Session Cost Summary
====================

Tokens Used:
  Input:    45.2K tokens
  Output:   12.8K tokens
  Cache:    38.1K tokens (cached)
  Total:    58.0K tokens

Estimated Cost:
  Input:    $0.1356
  Output:   $0.1920
  Cache:    -$0.0914 (savings)
  ─────────────────────
  Total:    $0.2362

Model: claude-sonnet-4-6 (Anthropic)
Requests: 14
Duration: 23m 15s
When the provider returns real token usage data (Usage.IsReal = true), costs are computed from the exact token counts. For providers that do not return usage data, ChatCLI estimates token counts from the length of the text.

Token Tracking by Mode

ChatCLI tracks tokens across all interaction modes:
In chat mode, tracking counts:
  • System prompt tokens (bootstrap + memory + contexts)
  • Tokens for each user message
  • Tokens for each assistant response
  • Cache savings (when applicable)
[chat] claude-sonnet-4-6> /cost
Chat mode: 12 messages, 23.4K tokens, ~$0.08

Pricing Tables

ChatCLI knows the prices of the most common models for calculating estimates:

Anthropic

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Cache Read (per 1M tokens) |
|---|---|---|---|
| claude-opus-4-8 | $5.00 | $25.00 | $0.50 |
| claude-opus-4-8 (ANTHROPIC_SPEED=fast) | $10.00 | $50.00 | $1.00 |
| claude-opus-4-7 | $5.00 | $25.00 | $0.50 |
| claude-sonnet-4-6 | $3.00 | $15.00 | $0.30 |
| claude-opus-4 | $15.00 | $75.00 | $1.50 |
| claude-haiku-3.5 | $0.80 | $4.00 | $0.08 |

OpenAI

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Cache Read (per 1M tokens) |
|---|---|---|---|
| gpt-4o | $2.50 | $10.00 | $1.25 |
| gpt-4o-mini | $0.15 | $0.60 | $0.075 |
| o3-mini | $1.10 | $4.40 | $0.55 |

GPT models surface usage in the chat envelope too, with the same N↑ M↓ arrows the Claude flow has used since launch. ChatCLI sends stream_options: {include_usage: true} on streaming Chat Completions and parses response.completed on the Responses API, so input/output (and cache-hit) counts appear on the envelope regardless of provider. Cached prompt tokens reported under prompt_tokens_details.cached_tokens (Chat Completions) / input_tokens_details.cached_tokens (Responses) map to CacheReadInputTokens, the same field Anthropic prompt caching feeds. Reasoning tokens (o-series / GPT-5) are surfaced under a separate ReasoningTokens informational field — they’re already counted in CompletionTokens and billed as output.
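
The field mapping described above can be sketched as follows. This is a minimal illustration, not ChatCLI's actual code: the `NormalizedUsage` type and `normalize_openai_usage` function are hypothetical names, while the payload keys are the ones the OpenAI Chat Completions and Responses APIs use in their `usage` objects.

```python
from dataclasses import dataclass


@dataclass
class NormalizedUsage:
    # Hypothetical provider-neutral record; the text above only names
    # CacheReadInputTokens and ReasoningTokens.
    input_tokens: int
    output_tokens: int
    cache_read_input_tokens: int
    reasoning_tokens: int  # informational only, already part of output_tokens


def normalize_openai_usage(usage: dict) -> NormalizedUsage:
    """Map an OpenAI usage payload (either API shape) to one record."""
    if "prompt_tokens" in usage:
        # Chat Completions shape.
        in_details = usage.get("prompt_tokens_details") or {}
        out_details = usage.get("completion_tokens_details") or {}
        return NormalizedUsage(
            input_tokens=usage["prompt_tokens"],
            output_tokens=usage.get("completion_tokens", 0),
            cache_read_input_tokens=in_details.get("cached_tokens", 0),
            reasoning_tokens=out_details.get("reasoning_tokens", 0),
        )
    # Responses API shape.
    in_details = usage.get("input_tokens_details") or {}
    out_details = usage.get("output_tokens_details") or {}
    return NormalizedUsage(
        input_tokens=usage.get("input_tokens", 0),
        output_tokens=usage.get("output_tokens", 0),
        cache_read_input_tokens=in_details.get("cached_tokens", 0),
        reasoning_tokens=out_details.get("reasoning_tokens", 0),
    )
```

Reasoning tokens are copied into their own field and never added to the output count, since they are already included in it.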

Google

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| gemini-2.0-flash | $0.10 | $0.40 |
| gemini-2.5-pro | $1.25 | $10.00 |

ZAI (Zhipu AI)

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| glm-5 | $1.00 | $4.00 |
| glm-4.5 | $1.00 | $4.00 |
| glm-4.5-flash | $1.00 | $4.00 |

MiniMax

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| MiniMax-M2.7 | $0.30 | $1.20 |
| MiniMax-M2.5 | $0.30 | $1.20 |
| MiniMax-Text-01 | $0.30 | $1.20 |

Moonshot (Kimi)

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Cache Read (per 1M tokens) |
|---|---|---|---|
| kimi-k2.6 | $0.95 | $4.00 | $0.16 |
| kimi-k2.5 | $0.95 | $4.00 | $0.16 |
| moonshot-v1-* | $0.95 | $4.00 | $0.16 |

The ChatCLI cost tracker bills the cache-miss price ($0.95/M input) to stay conservative. The cache hit ($0.16/M) is an automatic API saving when the same prefix is reused within the provider’s window; it is not tracked at runtime.

DeepSeek

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Cache Read (per 1M tokens) |
|---|---|---|---|
| deepseek-chat | $0.27 | $1.10 | $0.07 |
| deepseek-reasoner | $0.55 | $2.19 | $0.14 |

OpenRouter

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| openai/gpt-4o | $2.50 | $10.00 |
| openai/gpt-4o-mini | $0.15 | $0.60 |
| anthropic/claude-sonnet-4 | $3.00 | $15.00 |
| google/gemini-2.5-flash | $0.15 | $0.60 |
| deepseek/deepseek-r1 | $0.55 | $2.19 |

Prices are updated periodically in ChatCLI releases. For unlisted models or custom providers (connected through an OpenAI-compatible endpoint), the cost shows as “N/A”. OpenRouter provides prices via its API; ChatCLI uses its pricing ConfigMap to estimate costs for the most popular models.
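
To make the arithmetic behind these tables concrete, here is a minimal sketch of a price lookup. It assumes costs are simply token counts multiplied by the per-1M price; the `PRICES` dictionary and both function names are illustrative and not taken from ChatCLI.

```python
from typing import Optional

# (input, output) prices in USD per 1M tokens, copied from the tables above.
PRICES = {
    "claude-sonnet-4-6": (3.00, 15.00),
    "gpt-4o-mini": (0.15, 0.60),
    "gemini-2.5-pro": (1.25, 10.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Optional[float]:
    """Return the estimated cost in USD, or None when the model is not listed."""
    price = PRICES.get(model)
    if price is None:
        return None
    input_price, output_price = price
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def format_cost(cost: Optional[float]) -> str:
    """Render a cost with four decimals, or "N/A" for unlisted models."""
    return "N/A" if cost is None else f"${cost:.4f}"
```

With the token counts from the /cost sample (45.2K input, 12.8K output on claude-sonnet-4-6), the input side alone comes to $0.1356 and the output side to $0.1920, matching the figures shown there before cache savings are applied.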

Visual Display

ChatCLI uses visual indicators for easy tracking:

Token Format

Tokens are displayed with K/M suffixes for readability:
| Value | Display |
|---|---|
| 1,234 | 1.2K |
| 45,678 | 45.7K |
| 1,234,567 | 1.2M |
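
A formatter that reproduces the rows above might look like the sketch below. The function name and the choice to leave values under 1,000 unsuffixed are assumptions made for illustration.

```python
def format_tokens(n: int) -> str:
    """Format a token count with a K or M suffix and one decimal place."""
    if n >= 1_000_000:
        return f"{n / 1_000_000:.1f}M"
    if n >= 1_000:
        return f"{n / 1_000:.1f}K"
    # Assumption: small counts are shown as plain integers.
    return str(n)
```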

Context Progress Bar

When context approaches the model limit, a progress bar indicates utilization:
Context: ████████░░░░░░░░ 52% (104K / 200K tokens)
When context usage exceeds 80%, ChatCLI automatically suggests running /compact to free up space and reduce costs.
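
The bar and the 80% suggestion can be approximated with the sketch below. The 16-cell width is read off the example line; rounding the filled cells down and both function names are assumptions, not ChatCLI internals.

```python
def render_context_bar(used: int, limit: int, width: int = 16) -> str:
    """Render a fixed-width utilization bar for the context window."""
    ratio = min(used / limit, 1.0)
    filled = int(ratio * width)  # assumption: partially filled cells round down
    bar = "█" * filled + "░" * (width - filled)
    return f"Context: {bar} {ratio:.0%} ({used // 1000}K / {limit // 1000}K tokens)"


def should_suggest_compact(used: int, limit: int, threshold: float = 0.80) -> bool:
    """True once context usage exceeds the threshold (80% per the text above)."""
    return used / limit > threshold
```

Calling `render_context_bar(104_000, 200_000)` yields the same line as the example above, and `should_suggest_compact` only returns true past the 80% mark.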

Cache Savings

ChatCLI optimizes costs using prompt caching when the provider supports it:
Cache Savings:
  System prompt cached: 38.1K tokens
  Savings this session: $0.09 (38% reduction)
  Cache hit rate: 92%

Cache Tokens (Anthropic)

ChatCLI tracks Anthropic-specific cache tokens:
| Metric | Description |
|---|---|
| CacheCreationTokens | Tokens used to create cache entries |
| CacheReadTokens | Tokens read from cache (reduced cost) |

The cache read cost is typically 10% of the normal input cost, resulting in significant savings in long conversations with large system prompts.
Cache savings are calculated based on the difference between the normal input price and the cache read price. See Bootstrap and Memory for details on context optimization.
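
The savings rule stated above reduces to one multiplication. The sketch below uses a hypothetical function name and treats prices as USD per 1M tokens, as in the pricing tables; it ignores any cost of creating cache entries.

```python
def cache_savings(cached_tokens: int, input_price: float, cache_read_price: float) -> float:
    """Savings in USD from reading tokens out of the cache.

    Prices are in USD per 1M tokens. The saving is the difference between
    paying the normal input price and paying the cache read price.
    """
    return cached_tokens * (input_price - cache_read_price) / 1_000_000
```

For example, 100,000 cached tokens at an input price of $3.00 and a cache read price of $0.30 save about $0.27 compared with sending them uncached.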

Real API Data

The CostTracker supports two data sources:
| Source | Precision | When Used |
|---|---|---|
| Real API data | High | Provider returns Usage in response |
| Character-based estimate | Approximate | Provider does not return usage data |

When the provider returns real data (HasRealData = true), tracking uses exact token counts. This is supported by 13+ providers including Anthropic, OpenAI, ZAI, MiniMax, DeepSeek, and OpenRouter.
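
Choosing between the two sources can be sketched as below. The ratio of four characters per token is an assumption picked for illustration (the text does not state the ratio ChatCLI uses), and `TokenCount` and `count_tokens` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Optional

# Assumption: a rough average; the real ratio varies by language and tokenizer.
CHARS_PER_TOKEN = 4


@dataclass
class TokenCount:
    tokens: int
    is_real: bool  # True when the count came from the provider


def count_tokens(text: str, reported_tokens: Optional[int]) -> TokenCount:
    """Prefer the provider-reported count; otherwise estimate from text length."""
    if reported_tokens is not None:
        return TokenCount(tokens=reported_tokens, is_real=True)
    estimate = math.ceil(len(text) / CHARS_PER_TOKEN)
    return TokenCount(tokens=estimate, is_real=False)
```

Carrying the `is_real` flag alongside the count lets later displays mark each figure as real or estimated.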

Per-Model Cost

In sessions with multiple models (e.g., fallback chain), /cost shows a breakdown by model:
Per-Model Breakdown:
  anthropic:claude-sonnet-4-6    32 reqs   $0.1845  (real data)
  openai:gpt-4o-mini              5 reqs   $0.0023  (real data)
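
A breakdown like this amounts to grouping per-request costs by a provider:model key. The sketch below assumes each request is recorded as a `(key, cost)` pair; the record shape and function name are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def per_model_breakdown(requests: Iterable[Tuple[str, float]]) -> Dict[str, dict]:
    """Group (model_key, cost_usd) records into request counts and total cost."""
    totals: Dict[str, dict] = defaultdict(lambda: {"reqs": 0, "cost": 0.0})
    for model_key, cost_usd in requests:
        totals[model_key]["reqs"] += 1
        totals[model_key]["cost"] += cost_usd
    return dict(totals)
```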

Session Persistence

Cost data is persisted to disk so it can be consulted after the session ends:
~/.chatcli/sessions/<session_id>/cost.json
The file contains the complete snapshot (SessionCostData) with per-model usage, costs, and timestamps.

Session Budget

Configure a spending limit per session to avoid unexpected costs:
| Environment Variable | Description | Default |
|---|---|---|
| CHATCLI_SESSION_BUDGET_USD | Maximum spending limit per session in USD | 0 (no limit) |
| CHATCLI_BUDGET_WARNING_PCT | Percentage threshold for budget proximity warning | 0.80 (80%) |

Budget Levels

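| Level | Condition | Behavior |
|---|---|---|
| BudgetOK | Spending below the warning threshold (80% of the limit by default) | Normal |
| BudgetWarning | Spending between the warning threshold and 100% of the limit | Warning displayed |
| BudgetExceeded | Spending above the limit | Session may be limited |
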
# Example: limit session to $5.00
export CHATCLI_SESSION_BUDGET_USD=5.00

# Warn when reaching 70% of the limit
export CHATCLI_BUDGET_WARNING_PCT=0.70
When the budget is exceeded, ChatCLI displays a warning but does not automatically terminate the session. The user can decide to continue or end it.
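
The three budget levels can be expressed as a small classifier. This sketch assumes the boundaries are inclusive at the warning threshold and exclusive at the limit; the function names are hypothetical, while the environment variables and level names come from the tables above.

```python
import os
from typing import Mapping, Tuple


def budget_from_env(env: Mapping[str, str] = os.environ) -> Tuple[float, float]:
    """Read the budget (USD) and warning threshold, using the documented defaults."""
    budget = float(env.get("CHATCLI_SESSION_BUDGET_USD", "0"))
    warning_pct = float(env.get("CHATCLI_BUDGET_WARNING_PCT", "0.80"))
    return budget, warning_pct


def budget_level(spent_usd: float, budget_usd: float, warning_pct: float = 0.80) -> str:
    """Classify current spending against the session budget."""
    if budget_usd <= 0:
        return "BudgetOK"  # 0 means no limit
    if spent_usd > budget_usd:
        return "BudgetExceeded"
    if spent_usd >= budget_usd * warning_pct:
        return "BudgetWarning"
    return "BudgetOK"
```

With a $5.00 budget and the default threshold, spending of $3.00 is BudgetOK, $4.50 is BudgetWarning, and $5.50 is BudgetExceeded.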

Next Steps

Conversation Control

Use /compact to reduce tokens and costs.

One-Shot Mode

Monitor costs in automated pipelines.

Tool Results

Tool result budgets that impact token usage.

Context Recovery

Automatic strategies when context overflows.