Cost Tracking - ChatCLI

ChatCLI’s Cost Tracking monitors token consumption and estimates costs in real time during your sessions, with real API usage data when available. You can track how much each conversation is costing, view costs by model/provider, and configure spending limits.

The /cost Command

The /cost command displays a comprehensive summary of token consumption and estimated costs for the current session:

/cost

Session Cost Summary
====================

Tokens Used:
  Input:    45.2K tokens
  Output:   12.8K tokens
  Cache:    38.1K tokens (cached)
  Total:    58.0K tokens

Estimated Cost:
  Input:    $0.1356
  Output:   $0.1920
  Cache:    -$0.0914 (savings)
  ─────────────────────
  Total:    $0.2362

Model: claude-sonnet-4-6 (Anthropic)
Requests: 14
Duration: 23m 15s

When the provider returns real token usage data (Usage.IsReal = true), costs are calculated from the exact token counts it reports. For providers that do not return usage data, ChatCLI estimates token counts from the size of the text.
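The two paths described above can be sketched roughly as follows. This is an illustrative Python sketch, not ChatCLI's implementation: the `Usage` dataclass, the `estimate_tokens` and `usage_for` helpers, and the 4-characters-per-token ratio are all assumptions made for the example. The prices are the claude-sonnet-4-6 rates listed in the pricing tables, and cache adjustments are left out.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed heuristic for the fallback path; the ratio ChatCLI actually uses is not stated here.
CHARS_PER_TOKEN = 4


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int
    is_real: bool  # True when the counts came from the provider's response


def estimate_tokens(text: str) -> int:
    """Rough character-based token estimate (hypothetical helper)."""
    if not text:
        return 0
    return max(1, len(text) // CHARS_PER_TOKEN)


def usage_for(prompt: str, reply: str, reported: Optional[Usage] = None) -> Usage:
    """Prefer provider-reported usage; otherwise estimate from text size."""
    if reported is not None:
        return reported
    return Usage(estimate_tokens(prompt), estimate_tokens(reply), is_real=False)


def cost_usd(usage: Usage, input_per_m: float, output_per_m: float) -> float:
    """Cost in USD, given prices per 1M tokens."""
    total = usage.input_tokens * input_per_m + usage.output_tokens * output_per_m
    return total / 1_000_000


# 45.2K input and 12.8K output tokens at $3.00 / $15.00 per 1M tokens.
real = Usage(input_tokens=45_200, output_tokens=12_800, is_real=True)
print(f"${cost_usd(real, 3.00, 15.00):.4f}")  # $0.3276 before any cache savings
```

The input and output terms here match the Input ($0.1356) and Output ($0.1920) lines of the sample summary.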

Token Tracking by Mode

ChatCLI tracks tokens across all interaction modes:

Chat Mode
Agent Mode
Coder Mode

In chat mode, tracking counts:

System prompt tokens (bootstrap + memory + contexts)
Tokens for each user message
Tokens for each assistant response
Cache savings (when applicable)

[chat] claude-sonnet-4-6> /cost
Chat mode: 12 messages, 23.4K tokens, ~$0.08

In agent mode, tracking additionally includes:

Tokens for each tool call (request + response)
ReAct loop tokens (reasoning + action + observation)
Dispatched worker tokens (multi-agent)
Web tool tokens (webfetch/websearch)

[agent] claude-sonnet-4-6> /cost
Agent mode: 8 turns, 45.2K tokens, ~$0.24
  Tool calls: 23 (read: 12, exec: 5, write: 4, search: 2)

In coder mode, tracking is similar to agent with additional details:

Tokens per edited file
Tokens for applied patches
Tokens for test execution

[coder] claude-sonnet-4-6> /cost
Coder mode: 5 turns, 67.8K tokens, ~$0.35
  Files modified: 8
  Tests executed: 3

Pricing Tables

ChatCLI includes prices for the most common models and uses them to calculate estimates:

Anthropic

Model	Input (per 1M tokens)	Output (per 1M tokens)	Cache Read
claude-opus-4-8	$5.00	$25.00	$0.50
claude-opus-4-8 (`ANTHROPIC_SPEED=fast`)	$10.00	$50.00	$1.00
claude-opus-4-7	$5.00	$25.00	$0.50
claude-sonnet-4-6	$3.00	$15.00	$0.30
claude-opus-4	$15.00	$75.00	$1.50
claude-haiku-3.5	$0.80	$4.00	$0.08

OpenAI

Model	Input (per 1M tokens)	Output (per 1M tokens)	Cache Read
gpt-4o	$2.50	$10.00	$1.25
gpt-4o-mini	$0.15	$0.60	$0.075
o3-mini	$1.10	$4.40	$0.55

GPT models surface usage in the chat envelope too, with the same N↑ M↓ arrows the Claude flow has used since launch. ChatCLI sends stream_options: {include_usage: true} on streaming Chat Completions and parses response.completed on the Responses API, so input/output (and cache-hit) counts appear on the envelope regardless of provider. Cached prompt tokens reported under prompt_tokens_details.cached_tokens (Chat Completions) or input_tokens_details.cached_tokens (Responses) map to CacheReadInputTokens, the same field that Anthropic prompt caching feeds. Reasoning tokens (o-series / GPT-5) are surfaced under a separate, informational ReasoningTokens field; they are already counted in CompletionTokens and billed as output.
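The field mapping described above can be sketched as a small normalization step. The Python below is an illustration only: `NormalizedUsage`, `from_chat_completions`, and `from_responses` are hypothetical names, and the snake_case attributes merely mirror the CacheReadInputTokens / CompletionTokens / ReasoningTokens fields mentioned in the text. The dictionary keys are the ones the OpenAI APIs use in their `usage` objects.

```python
from dataclasses import dataclass


@dataclass
class NormalizedUsage:
    prompt_tokens: int
    completion_tokens: int
    cache_read_input_tokens: int
    reasoning_tokens: int  # informational only; already included in completion_tokens


def _nested(usage: dict, details_key: str, field: str) -> int:
    """Read usage[details_key][field], tolerating a missing or null details object."""
    return (usage.get(details_key) or {}).get(field, 0)


def from_chat_completions(usage: dict) -> NormalizedUsage:
    """Map a Chat Completions `usage` object to the common shape."""
    return NormalizedUsage(
        prompt_tokens=usage.get("prompt_tokens", 0),
        completion_tokens=usage.get("completion_tokens", 0),
        cache_read_input_tokens=_nested(usage, "prompt_tokens_details", "cached_tokens"),
        reasoning_tokens=_nested(usage, "completion_tokens_details", "reasoning_tokens"),
    )


def from_responses(usage: dict) -> NormalizedUsage:
    """Map a Responses API `usage` object (from response.completed) to the common shape."""
    return NormalizedUsage(
        prompt_tokens=usage.get("input_tokens", 0),
        completion_tokens=usage.get("output_tokens", 0),
        cache_read_input_tokens=_nested(usage, "input_tokens_details", "cached_tokens"),
        reasoning_tokens=_nested(usage, "output_tokens_details", "reasoning_tokens"),
    )


chat_usage = {
    "prompt_tokens": 1200,
    "completion_tokens": 300,
    "prompt_tokens_details": {"cached_tokens": 1024},
    "completion_tokens_details": {"reasoning_tokens": 128},
}
print(from_chat_completions(chat_usage))
```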

Google

Model	Input (per 1M tokens)	Output (per 1M tokens)
gemini-2.0-flash	$0.10	$0.40
gemini-2.5-pro	$1.25	$10.00

ZAI (Zhipu AI)

Model	Input (per 1M tokens)	Output (per 1M tokens)
glm-5	$1.00	$4.00
glm-4.5	$1.00	$4.00
glm-4.5-flash	$1.00	$4.00

MiniMax

Model	Input (per 1M tokens)	Output (per 1M tokens)
MiniMax-M2.7	$0.30	$1.20
MiniMax-M2.5	$0.30	$1.20
MiniMax-Text-01	$0.30	$1.20

Moonshot (Kimi)

Model	Input (per 1M tokens)	Output (per 1M tokens)	Cache Read
kimi-k2.6	$0.95	$4.00	$0.16
kimi-k2.5	$0.95	$4.00	$0.16
moonshot-v1-*	$0.95	$4.00	$0.16

The ChatCLI cost tracker bills the cache-miss price ($0.95/M input) to stay conservative. The cache hit ($0.16/M) is an automatic API saving when the same prefix is reused within the provider’s window; it is not tracked at runtime.

DeepSeek

Model	Input (per 1M tokens)	Output (per 1M tokens)	Cache Read
deepseek-chat	$0.27	$1.10	$0.07
deepseek-reasoner	$0.55	$2.19	$0.14

OpenRouter

Model	Input (per 1M tokens)	Output (per 1M tokens)
openai/gpt-4o	$2.50	$10.00
openai/gpt-4o-mini	$0.15	$0.60
anthropic/claude-sonnet-4	$3.00	$15.00
google/gemini-2.5-flash	$0.15	$0.60
deepseek/deepseek-r1	$0.55	$2.19

Prices are updated periodically in ChatCLI releases. For unlisted models or custom providers (connected through an OpenAI-compatible endpoint), the cost shows as “N/A”. OpenRouter provides prices via its API; ChatCLI uses its pricing ConfigMap to estimate costs for the most popular models.

Visual Display

ChatCLI uses visual indicators for easy tracking:

Token Format

Tokens are displayed with K/M suffixes for readability:

Value	Display
1,234	`1.2K`
45,678	`45.7K`
1,234,567	`1.2M`
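A formatter that reproduces the table above could look like the following Python sketch. The function name `format_tokens` and the decision to leave values below 1,000 unsuffixed are assumptions for illustration; ChatCLI's own rounding rules may differ at the boundaries.

```python
def format_tokens(count: int) -> str:
    """Format a token count with a K or M suffix and one decimal place."""
    if count >= 1_000_000:
        return f"{count / 1_000_000:.1f}M"
    if count >= 1_000:
        return f"{count / 1_000:.1f}K"
    return str(count)  # assumption: small counts are shown as-is


for value in (1_234, 45_678, 1_234_567):
    print(f"{value:>9,} -> {format_tokens(value)}")
```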

Context Progress Bar

When context approaches the model limit, a progress bar indicates utilization:

Context: ████████░░░░░░░░ 52% (104K / 200K tokens)

When context usage exceeds 80%, ChatCLI automatically suggests running /compact to free up space and reduce costs.
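As a rough illustration of the progress bar and the 80% suggestion threshold, here is a Python sketch. The function names, the 16-cell width (taken from the sample line above), and the rounding choices are assumptions, not ChatCLI's actual rendering code.

```python
def render_context_bar(used: int, limit: int, width: int = 16) -> str:
    """Render a fixed-width utilization bar for the context window."""
    ratio = min(max(used / limit, 0.0), 1.0)  # clamp to [0, 1]
    filled = round(ratio * width)
    bar = "█" * filled + "░" * (width - filled)
    return (
        f"Context: {bar} {round(ratio * 100)}% "
        f"({used // 1000}K / {limit // 1000}K tokens)"
    )


def should_suggest_compact(used: int, limit: int, threshold: float = 0.80) -> bool:
    """True once utilization exceeds the threshold (80% in the text above)."""
    return used / limit > threshold


print(render_context_bar(104_000, 200_000))
if should_suggest_compact(170_000, 200_000):
    print("Context usage is above 80%; consider running /compact")
```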

Cache Savings

ChatCLI optimizes costs using prompt caching when the provider supports it:

Cache Savings:
  System prompt cached: 38.1K tokens
  Savings this session: $0.09 (38% reduction)
  Cache hit rate: 92%

Cache Tokens (Anthropic)

ChatCLI tracks Anthropic-specific cache tokens:

Metric	Description
`CacheCreationTokens`	Tokens used to create cache entries
`CacheReadTokens`	Tokens read from cache (reduced cost)

The cache read cost is typically 10% of the normal input cost, resulting in significant savings in long conversations with large system prompts.

Cache savings are calculated based on the difference between the normal input price and the cache read price. See Bootstrap and Memory for details on context optimization.
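The savings rule stated above (normal input price minus cache read price, applied to the cached tokens) can be written as a one-line formula. The Python sketch below uses a hypothetical `cache_savings_usd` helper and a round token count chosen for the example; the prices are the claude-sonnet-4-6 rates from the Anthropic table.

```python
def cache_savings_usd(cached_tokens: int, input_per_m: float, cache_read_per_m: float) -> float:
    """Savings from reading tokens from cache instead of paying the full input price."""
    return cached_tokens * (input_per_m - cache_read_per_m) / 1_000_000


# 100K cached tokens at $3.00 input vs. $0.30 cache read per 1M tokens.
savings = cache_savings_usd(100_000, input_per_m=3.00, cache_read_per_m=0.30)
print(f"${savings:.2f}")  # $0.27
```

With the cache read price at 10% of the input price, each cached token saves 90% of what it would otherwise cost as input.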

Real API Data

The CostTracker supports two data sources:

Source	Precision	When Used
Real API data	High	Provider returns `Usage` in response
Character-based estimate	Approximate	Provider does not return usage data

When the provider returns real data (HasRealData = true), tracking uses exact token counts. This is supported by 13+ providers including Anthropic, OpenAI, ZAI, MiniMax, DeepSeek, and OpenRouter.

Per-Model Cost

In sessions with multiple models (e.g., fallback chain), /cost shows a breakdown by model:

Per-Model Breakdown:
  anthropic:claude-sonnet-4-6    32 reqs   $0.1845  (real data)
  openai:gpt-4o-mini              5 reqs   $0.0023  (real data)
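One way to picture the per-model breakdown is an accumulator keyed by `provider:model`, as in the Python sketch below. The `PerModelCosts` class, its method names, and the sort order are hypothetical; they are not the CostTracker API.

```python
from collections import defaultdict


class PerModelCosts:
    """Accumulates request counts and cost per provider:model key."""

    def __init__(self):
        self.requests = defaultdict(int)
        self.cost_usd = defaultdict(float)

    def record(self, provider, model, cost_usd):
        """Register one request and its cost under the provider:model key."""
        key = f"{provider}:{model}"
        self.requests[key] += 1
        self.cost_usd[key] += cost_usd

    def breakdown(self):
        """Return one formatted line per model, most expensive first."""
        keys = sorted(self.requests, key=lambda k: self.cost_usd[k], reverse=True)
        return [
            f"{key:<32}{self.requests[key]:>3} reqs   ${self.cost_usd[key]:.4f}"
            for key in keys
        ]


tracker = PerModelCosts()
tracker.record("anthropic", "claude-sonnet-4-6", 0.10)
tracker.record("anthropic", "claude-sonnet-4-6", 0.05)
tracker.record("openai", "gpt-4o-mini", 0.002)
print("\n".join(tracker.breakdown()))
```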

Session Persistence

Cost data is persisted to disk so it can be consulted after the session ends:

~/.chatcli/sessions/<session_id>/cost.json

The file contains the complete snapshot (SessionCostData) with per-model usage, costs, and timestamps.

Session Budget

Configure a spending limit per session to avoid unexpected costs:

Environment Variable	Description	Default
`CHATCLI_SESSION_BUDGET_USD`	Maximum spending limit per session in USD	0 (no limit)
`CHATCLI_BUDGET_WARNING_PCT`	Percentage threshold for budget proximity warning	0.80 (80%)

Budget Levels

Level	Condition	Behavior
`BudgetOK`	Spending below the warning threshold (80% of the limit by default)	Normal
`BudgetWarning`	Spending between the warning threshold and 100% of the limit	Warning displayed
`BudgetExceeded`	Spending above the limit	Session may be limited

# Example: limit session to $5.00
export CHATCLI_SESSION_BUDGET_USD=5.00

# Warn when reaching 70% of the limit
export CHATCLI_BUDGET_WARNING_PCT=0.70

When the budget is exceeded, ChatCLI displays a warning but does not automatically terminate the session. The user can decide to continue or end it.
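The budget levels can be summarized as a small classification function. The Python sketch below is an assumption-laden illustration: `BudgetLevel`, `budget_level`, and `budget_from_env` are hypothetical names, and treating a spend exactly at the threshold as a warning is a guess at the boundary behavior. The environment variable names and defaults come from the table above.

```python
import os
from enum import Enum


class BudgetLevel(Enum):
    OK = "BudgetOK"
    WARNING = "BudgetWarning"
    EXCEEDED = "BudgetExceeded"


def budget_level(spent_usd: float, budget_usd: float, warning_pct: float = 0.80) -> BudgetLevel:
    """Classify session spending against the configured budget."""
    if budget_usd <= 0:  # 0 means no limit
        return BudgetLevel.OK
    if spent_usd > budget_usd:
        return BudgetLevel.EXCEEDED
    if spent_usd >= budget_usd * warning_pct:  # assumed inclusive boundary
        return BudgetLevel.WARNING
    return BudgetLevel.OK


def budget_from_env(env=os.environ):
    """Read the budget and warning threshold, falling back to the documented defaults."""
    budget = float(env.get("CHATCLI_SESSION_BUDGET_USD", "0"))
    warning_pct = float(env.get("CHATCLI_BUDGET_WARNING_PCT", "0.80"))
    return budget, warning_pct


budget, pct = budget_from_env({"CHATCLI_SESSION_BUDGET_USD": "5.00", "CHATCLI_BUDGET_WARNING_PCT": "0.70"})
for spent in (3.00, 3.60, 5.01):
    print(f"${spent:.2f} -> {budget_level(spent, budget, pct).value}")
```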

Next Steps

Conversation Control

Use /compact to reduce tokens and costs.

One-Shot Mode

Monitor costs in automated pipelines.

Tool Results

Tool result budgets that impact token usage.

Context Recovery

Automatic strategies when context overflows.
