The /ratelimit command (alias /limits) shows the current state of your LLM provider’s rate limits without making any extra API call: ChatCLI reads the x-ratelimit-* headers that the provider already returns with every response and keeps a snapshot per provider.
Capture is passive and centralized: an observer on the auth path intercepts the HTTP responses of every provider that sends the x-ratelimit-* header family (OpenAI, OpenRouter, Anthropic-compatible endpoints, etc.). No environment variable is required.
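The following Python sketch illustrates the idea of passive capture: a hook receives the headers of a response the client was going to get anyway and updates a per-provider snapshot, without issuing any request of its own. It is not ChatCLI's code. The header names assume the OpenAI-style x-ratelimit-{limit,remaining,reset}-{requests,tokens} pattern with reset values such as "12s" or "6m0s"; observe, Bucket, parse_reset and snapshots are hypothetical names chosen for illustration.

```python
import re
import time
from dataclasses import dataclass
from typing import Dict, Mapping, Optional

# Assumed OpenAI-style header names: x-ratelimit-limit-requests,
# x-ratelimit-remaining-tokens, x-ratelimit-reset-requests, and so on.
BUCKETS = ("requests", "tokens")

# Reset values such as "12s", "6m0s" or "20ms"; "ms" must be tried before "m".
_DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


def parse_reset(value: str) -> float:
    """Convert a reset duration string into seconds."""
    return sum(float(num) * _UNIT_SECONDS[unit]
               for num, unit in _DURATION_PART.findall(value))


@dataclass
class Bucket:
    limit: int
    remaining: int
    reset_seconds: float  # time until reset, as reported at capture time
    captured_at: float    # time.monotonic() when the response was observed


# Hypothetical per-provider snapshot: provider -> bucket name -> latest values.
snapshots: Dict[str, Dict[str, Bucket]] = {}


def observe(provider: str, headers: Mapping[str, str],
            now: Optional[float] = None) -> None:
    """Record rate-limit headers from a response the client already received."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    now = time.monotonic() if now is None else now
    for bucket in BUCKETS:
        limit = h.get(f"x-ratelimit-limit-{bucket}")
        remaining = h.get(f"x-ratelimit-remaining-{bucket}")
        if limit is None or remaining is None:
            continue  # this provider did not report the bucket
        reset = parse_reset(h.get(f"x-ratelimit-reset-{bucket}", "0s"))
        snapshots.setdefault(provider, {})[bucket] = Bucket(
            int(limit), int(remaining), reset, now)


# Example: headers taken from a response that arrived anyway.
observe("openai", {
    "X-RateLimit-Limit-Requests": "5000",
    "X-RateLimit-Remaining-Requests": "4987",
    "X-RateLimit-Reset-Requests": "12s",
})
print(snapshots["openai"]["requests"].remaining)  # 4987
```

Because the hook only reads what is already on the response, the snapshot costs no extra quota. A provider that reports reset times in a different format (for example, as a timestamp) would need its own parser.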

Usage

> /ratelimit
  Rate Limits (from provider headers)
  ─────────────────────────────────────────
  OPENAI
    requests  4987 / 5000   (0% used, resets in 12s)
    tokens    789012 / 800000   (1% used, resets in 48s)
Before the first request, there is no data yet:
> /ratelimit
  Rate Limits (from provider headers)
  ─────────────────────────────────────────
  No rate-limit data captured yet. Make a request first.

What is shown

For each provider that reported limits, ChatCLI shows up to two buckets:
Bucket     Fields
requests   remaining / limit, % used, resets in N s
tokens     remaining / limit, % used, resets in N s
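As a rough sketch, the % used figure can be derived from the two counts the provider reports. This assumes truncation to a whole percent; rounding to the nearest percent would give the same values for the sample output under Usage. percent_used and format_bucket are illustrative names, not part of ChatCLI.

```python
def percent_used(remaining: int, limit: int) -> int:
    """Share of the bucket already consumed, truncated to a whole percent."""
    if limit <= 0:
        return 0  # avoid dividing by zero on a malformed header
    used = max(0, limit - remaining)
    return used * 100 // limit


def format_bucket(name: str, remaining: int, limit: int, resets_in: int) -> str:
    """Render one bucket in the same layout as the /ratelimit output."""
    pct = percent_used(remaining, limit)
    return f"{name:<10}{remaining} / {limit}   ({pct}% used, resets in {resets_in}s)"


print(format_bucket("requests", 4987, 5000, 12))
print(format_bucket("tokens", 789012, 800000, 48))
```

With 13 of 5000 requests used the share is 0.26%, and with 10988 of 800000 tokens used it is about 1.37%, which is why the example shows 0% and 1%.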
The “resets in” value is adjusted for the time elapsed since capture, so it reflects the actual time remaining when you run the command.
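A minimal sketch of that adjustment, assuming the snapshot stores the reset duration the provider reported together with a monotonic timestamp of when it was captured. The function name resets_in and the choice to round up to whole seconds are assumptions for illustration.

```python
import math
import time
from typing import Optional


def resets_in(reset_seconds: float, captured_at: float,
              now: Optional[float] = None) -> int:
    """Seconds until reset measured from now, not from when the headers arrived.

    reset_seconds is the value the provider reported; captured_at and now are
    readings of the same monotonic clock.
    """
    now = time.monotonic() if now is None else now
    elapsed = now - captured_at
    return max(0, math.ceil(reset_seconds - elapsed))


# Captured at t=100 with 48 s to go; the command runs 10 s later.
print(resets_in(48.0, captured_at=100.0, now=110.0))  # 38
```

A monotonic clock is used so that changes to the system's wall-clock time cannot distort the countdown, and the result is clamped at zero once the window has already reset.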
This is useful in pipelines and long sessions for anticipating throttling: if the requests or tokens bucket is close to 100% used (almost nothing remaining), wait for the reset, or switch provider with /switch, before launching a heavy /agent run.
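One way a script could apply this advice, sketched under stated assumptions: the snapshot is passed in as (remaining, limit, seconds until reset) for each bucket, and a bucket counts as nearly exhausted when less than 5% of its limit remains. Both the data shape and the threshold are illustrative, and seconds_to_wait is a hypothetical helper, not a ChatCLI API.

```python
from typing import Dict, Tuple

# Hypothetical snapshot shape: bucket name -> (remaining, limit, seconds until reset).
Snapshot = Dict[str, Tuple[int, int, int]]


def seconds_to_wait(snapshot: Snapshot, min_fraction: float = 0.05) -> int:
    """Return how long to wait before a heavy job, or 0 if every bucket has headroom.

    A bucket counts as nearly exhausted when less than min_fraction of its
    limit remains; the 5% default is an arbitrary example threshold.
    """
    wait = 0
    for remaining, limit, reset_in in snapshot.values():
        if limit > 0 and remaining / limit < min_fraction:
            wait = max(wait, reset_in)  # wait for the slowest exhausted bucket
    return wait


print(seconds_to_wait({"requests": (3, 5000, 12), "tokens": (789012, 800000, 48)}))  # 12
```

Taking the maximum over exhausted buckets means the script waits until every bucket that was running low has reset, not just the first one.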

See also