Configure automatic failover between LLM providers with intelligent error classification, exponential cooldown, and health monitoring.
ChatCLI supports an automatic failover chain between LLM providers. When the primary provider fails (rate limit, timeout, or server error), ChatCLI automatically tries the next provider in the chain, with no action required from the caller.
# Ordered list of providers (first = highest priority)
export CHATCLI_FALLBACK_PROVIDERS="OPENAI,CLAUDEAI,GOOGLEAI,ZAI,MINIMAX,MOONSHOT,OPENROUTER,COPILOT"

# Specific model per provider (optional)
export CHATCLI_FALLBACK_MODEL_OPENAI="gpt-5.4"
export CHATCLI_FALLBACK_MODEL_CLAUDEAI="claude-sonnet-4-6"
export CHATCLI_FALLBACK_MODEL_GOOGLEAI="gemini-2.5-flash"
export CHATCLI_FALLBACK_MODEL_ZAI="glm-5"
export CHATCLI_FALLBACK_MODEL_MINIMAX="MiniMax-M2.7"
export CHATCLI_FALLBACK_MODEL_OPENROUTER="anthropic/claude-sonnet-4"
export CHATCLI_FALLBACK_MODEL_COPILOT="gpt-4o"

# MiniMax in Anthropic-compatible mode (optional)
export MINIMAX_API_COMPAT="anthropic"

# Retry and cooldown control
export CHATCLI_FALLBACK_MAX_RETRIES="2"      # attempts per provider
export CHATCLI_FALLBACK_COOLDOWN_BASE="30s"  # base cooldown
export CHATCLI_FALLBACK_COOLDOWN_MAX="5m"    # maximum cooldown
After consecutive failures, the provider enters cooldown with exponential backoff:
Consecutive Failures    Cooldown
1                       30s
2                       60s
3                       120s
4                       240s
5+                      300s (max)
In interactive CLI mode, an authentication error (401) automatically triggers an OAuth token refresh, after which the request is retried. In server mode (fallback chain), an authentication error puts the provider straight into the maximum cooldown (5m). A successful request clears the provider's cooldown. Use ResetCooldowns() to clear cooldowns manually (e.g., after updating credentials).
The fallback chain also supports SendPromptWithTools for providers that implement the ToolAwareClient interface. Providers without native tool use support are automatically skipped in the tool call chain.
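Skipping providers without tool support maps naturally onto a Go interface check. The sketch below shows the general technique with a type assertion; the `Client` and `ToolClient` interfaces and their method signatures are assumptions made for illustration, since the real ToolAwareClient signature is not shown here.

```go
package main

import "fmt"

// Client is a hypothetical base interface for any provider in the chain.
type Client interface {
	Name() string
	SendPrompt(prompt string) (string, error)
}

// ToolClient stands in for a tool-aware interface; its signature is assumed.
type ToolClient interface {
	Client
	SendPromptWithTools(prompt string, tools []string) (string, error)
}

// basic is a provider without native tool use.
type basic struct{ name string }

func (b basic) Name() string                        { return b.name }
func (b basic) SendPrompt(p string) (string, error) { return b.name + ": " + p, nil }

// withTools adds tool support on top of basic.
type withTools struct{ basic }

func (w withTools) SendPromptWithTools(p string, tools []string) (string, error) {
	return fmt.Sprintf("%s answered using %d tools", w.name, len(tools)), nil
}

// toolCapable keeps only the chain members that implement ToolClient,
// preserving their priority order.
func toolCapable(chain []Client) []ToolClient {
	var out []ToolClient
	for _, c := range chain {
		if tc, ok := c.(ToolClient); ok {
			out = append(out, tc)
		}
	}
	return out
}

func main() {
	chain := []Client{basic{"A"}, withTools{basic{"B"}}, withTools{basic{"C"}}}
	for _, tc := range toolCapable(chain) {
		fmt.Println(tc.Name()) // A is skipped because it lacks tool support
	}
}
```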
Place the cheapest or fastest provider first in the chain.
Diversify providers: mix providers from different companies for real resilience.
Configure models per provider: use models with equivalent capabilities to maintain quality.
Monitor health: regularly check whether any provider is in persistent cooldown.
OpenRouter native fallback: In addition to ChatCLI’s provider fallback chain, OpenRouter itself supports server-side fallback routing via OPENROUTER_FALLBACK_MODELS. When set (e.g., anthropic/claude-sonnet-4,google/gemini-2.5-flash), OpenRouter automatically routes to the next model if the primary is unavailable — before ChatCLI’s own chain advances. These two mechanisms are complementary: OpenRouter handles model-level failover within its gateway, while ChatCLI handles provider-level failover across different APIs.
Each provider in the chain needs its own API key. Make sure a key is configured for every provider listed in CHATCLI_FALLBACK_PROVIDERS.