Track token costs and price estimates per session across all modes, with real API data and configurable budgets
ChatCLI’s Cost Tracking monitors token consumption and estimates costs in real time during your sessions, with real API usage data when available. You can track how much each conversation is costing, view costs by model/provider, and configure spending limits.
When the provider returns real token usage data (Usage.IsReal = true), costs are calculated from the exact token counts it reports. For providers that do not return real usage, ChatCLI estimates tokens from text size, so the resulting cost is an approximation.
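The fallback between reported and estimated usage can be sketched as follows. This is a minimal illustration, not ChatCLI's actual code: the `Usage` struct is a hypothetical stand-in that borrows only the `IsReal` name from the text, and the four-characters-per-token ratio is an assumed heuristic, since the estimation rule ChatCLI applies is not stated here.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Usage is a hypothetical stand-in for ChatCLI's usage record.
type Usage struct {
	PromptTokens     int
	CompletionTokens int
	IsReal           bool // true when the counts came from the provider
}

// charsPerToken is an assumed heuristic (roughly 4 characters per token),
// not a value taken from ChatCLI.
const charsPerToken = 4

// estimateTokens approximates a token count from text length, rounding up.
func estimateTokens(text string) int {
	n := utf8.RuneCountInString(text)
	return (n + charsPerToken - 1) / charsPerToken
}

// resolveUsage prefers provider-reported counts and falls back to a
// text-size estimate when no real usage is available.
func resolveUsage(reported *Usage, prompt, completion string) Usage {
	if reported != nil && reported.IsReal {
		return *reported
	}
	return Usage{
		PromptTokens:     estimateTokens(prompt),
		CompletionTokens: estimateTokens(completion),
		IsReal:           false,
	}
}

func main() {
	est := resolveUsage(nil, "Explain Go channels briefly.", "Channels are typed conduits.")
	fmt.Printf("estimated: %d in, %d out (real=%v)\n", est.PromptTokens, est.CompletionTokens, est.IsReal)

	rep := resolveUsage(&Usage{PromptTokens: 12, CompletionTokens: 7, IsReal: true}, "", "")
	fmt.Printf("reported: %d in, %d out (real=%v)\n", rep.PromptTokens, rep.CompletionTokens, rep.IsReal)
}
```

Keeping the `IsReal` flag on the result lets a display layer mark estimated figures differently from exact ones.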
GPT models surface usage in the chat envelope too, with the same N↑ M↓ arrows the Claude flow has used since launch. ChatCLI sends stream_options: {include_usage: true} on streaming Chat Completions and parses response.completed on the Responses API, so input/output (and cache-hit) counts appear on the envelope regardless of provider. Cached prompt tokens reported under prompt_tokens_details.cached_tokens (Chat Completions) / input_tokens_details.cached_tokens (Responses) map to CacheReadInputTokens, the same field Anthropic prompt caching feeds. Reasoning tokens (o-series / GPT-5) are surfaced under a separate ReasoningTokens informational field — they’re already counted in CompletionTokens and billed as output.
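The field mapping described above can be sketched for the Chat Completions case. The JSON keys are the ones OpenAI reports in its `usage` object (reasoning tokens arrive under `completion_tokens_details.reasoning_tokens`); the normalized `Usage` struct and the `parseChatUsage` helper are hypothetical and only reuse the field names mentioned in the text.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatUsage mirrors the usage object of a Chat Completions response.
type chatUsage struct {
	PromptTokens        int `json:"prompt_tokens"`
	CompletionTokens    int `json:"completion_tokens"`
	PromptTokensDetails struct {
		CachedTokens int `json:"cached_tokens"`
	} `json:"prompt_tokens_details"`
	CompletionTokensDetails struct {
		ReasoningTokens int `json:"reasoning_tokens"`
	} `json:"completion_tokens_details"`
}

// Usage is a hypothetical normalized record; field names follow the text.
type Usage struct {
	PromptTokens         int
	CompletionTokens     int
	CacheReadInputTokens int
	ReasoningTokens      int // informational: already included in CompletionTokens
	IsReal               bool
}

// parseChatUsage decodes a usage object and maps it to the normalized record.
func parseChatUsage(raw []byte) (Usage, error) {
	var u chatUsage
	if err := json.Unmarshal(raw, &u); err != nil {
		return Usage{}, err
	}
	return Usage{
		PromptTokens:         u.PromptTokens,
		CompletionTokens:     u.CompletionTokens,
		CacheReadInputTokens: u.PromptTokensDetails.CachedTokens,
		ReasoningTokens:      u.CompletionTokensDetails.ReasoningTokens,
		IsReal:               true,
	}, nil
}

func main() {
	// Illustrative payload, not captured from a real response.
	sample := []byte(`{
		"prompt_tokens": 1200,
		"completion_tokens": 300,
		"prompt_tokens_details": {"cached_tokens": 1024},
		"completion_tokens_details": {"reasoning_tokens": 128}
	}`)
	u, err := parseChatUsage(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%d in (%d cached), %d out (%d reasoning)\n",
		u.PromptTokens, u.CacheReadInputTokens, u.CompletionTokens, u.ReasoningTokens)
}
```

A Responses API variant would follow the same shape, reading `input_tokens`, `output_tokens`, and `input_tokens_details.cached_tokens` instead.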
The ChatCLI cost tracker bills the cache-miss price ($0.95/M input) to stay conservative. The cache hit ($0.16/M) is an automatic API saving when the same prefix is reused within the provider’s window; it is not tracked at runtime.
Prices are updated periodically in ChatCLI releases. For unlisted models or custom providers (connected through an OpenAI-compatible endpoint), the cost shows as “N/A”. OpenRouter provides prices via API; ChatCLI uses its pricing ConfigMap to estimate costs for the most popular models.
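The lookup behavior can be sketched as a price table keyed by model name, with “N/A” returned when the model is not listed. Everything here is assumed for illustration: the `Price` type, the `prices` map, the `formatCost` helper, the model name, and the per-million rates are hypothetical and do not reflect ChatCLI's real pricing data.

```go
package main

import "fmt"

// Price holds per-million-token rates in USD (hypothetical structure).
type Price struct {
	InputPerM  float64
	OutputPerM float64
}

// prices is an illustrative table; the model name and rates are made up.
var prices = map[string]Price{
	"example-model": {InputPerM: 1.00, OutputPerM: 2.00},
}

// formatCost returns a dollar string for listed models and "N/A" otherwise.
func formatCost(model string, inputTokens, outputTokens int) string {
	p, ok := prices[model]
	if !ok {
		return "N/A"
	}
	cost := float64(inputTokens)/1e6*p.InputPerM + float64(outputTokens)/1e6*p.OutputPerM
	return fmt.Sprintf("$%.4f", cost)
}

func main() {
	fmt.Println(formatCost("example-model", 1_000_000, 500_000))
	fmt.Println(formatCost("my-custom-model", 1_000_000, 500_000))
}
```

Returning “N/A” rather than a zero cost keeps an unpriced model from looking free in a session total.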
The cache read cost is typically 10% of the normal input cost, resulting in significant savings in long conversations with large system prompts.
Cache savings are calculated based on the difference between the normal input price and the cache read price. See Bootstrap and Memory for details on context optimization.
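As a worked sketch of that calculation, the saving is the number of cache-read tokens multiplied by the gap between the normal input rate and the cache-read rate. The `cacheSavings` function and the example rates are assumptions for illustration; the 10% ratio is the typical figure quoted above, not a rule for every provider.

```go
package main

import "fmt"

// cacheSavings returns the USD saved by reading tokens from cache instead of
// paying the normal input rate. Rates are per million tokens.
func cacheSavings(cacheReadTokens int, inputPerM, cacheReadPerM float64) float64 {
	return float64(cacheReadTokens) / 1e6 * (inputPerM - cacheReadPerM)
}

func main() {
	// Illustrative rates: cache reads priced at 10% of normal input.
	inputPerM := 3.00
	cacheReadPerM := 0.10 * inputPerM

	saved := cacheSavings(200_000, inputPerM, cacheReadPerM)
	fmt.Printf("saved $%.2f on 200k cached tokens\n", saved)
}
```

With a large system prompt resent on every turn, the cached token count grows with each request, which is why the saving compounds over long conversations.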
When the provider returns real data (HasRealData = true), tracking uses exact token counts. This is supported by 13+ providers including Anthropic, OpenAI, ZAI, MiniMax, DeepSeek, and OpenRouter.