ChatCLI supports an automatic failover chain between LLM providers. When the primary provider fails (rate limit, timeout, server error), the system automatically tries the next provider in the chain, transparently to the user.

How It Works

The fallback chain is an ordered list of providers. Each request traverses the list until it succeeds or all options are exhausted:
Request -> OpenAI (primary)
              | failed (rate limit)
              v
           Claude (secondary)
              | failed (timeout)
              v
           Google AI (tertiary)
              | success
              v
           Response returned to user

Configuration

# Ordered list of providers (first = highest priority)
export CHATCLI_FALLBACK_PROVIDERS="OPENAI,CLAUDEAI,GOOGLEAI,COPILOT"

# Specific model per provider (optional)
export CHATCLI_FALLBACK_MODEL_OPENAI="gpt-4o"
export CHATCLI_FALLBACK_MODEL_CLAUDEAI="claude-sonnet-4-20250514"
export CHATCLI_FALLBACK_MODEL_GOOGLEAI="gemini-2.0-flash"
export CHATCLI_FALLBACK_MODEL_COPILOT="gpt-4o"

# Retry and cooldown control
export CHATCLI_FALLBACK_MAX_RETRIES="2"        # attempts per provider
export CHATCLI_FALLBACK_COOLDOWN_BASE="30s"    # base cooldown
export CHATCLI_FALLBACK_COOLDOWN_MAX="5m"      # maximum cooldown
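Since `CHATCLI_FALLBACK_PROVIDERS` is a comma-separated, ordered list, parsing it is straightforward. The helper below is a hypothetical sketch of that step (the function name and normalization choices are assumptions, not ChatCLI's actual parser):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseProviders splits a CHATCLI_FALLBACK_PROVIDERS value into an ordered
// slice (first entry = highest priority). Hypothetical helper for
// illustration; trims whitespace and normalizes to upper case.
func parseProviders(raw string) []string {
	var out []string
	for _, p := range strings.Split(raw, ",") {
		if p = strings.TrimSpace(p); p != "" {
			out = append(out, strings.ToUpper(p))
		}
	}
	return out
}

func main() {
	os.Setenv("CHATCLI_FALLBACK_PROVIDERS", "OPENAI, claudeai ,GOOGLEAI")
	fmt.Println(parseProviders(os.Getenv("CHATCLI_FALLBACK_PROVIDERS")))
}
```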

Error Classification

The system automatically classifies each failure to decide the strategy:
Class             Behavior                                Examples
rate_limit        Waits for backoff, then retries         HTTP 429, "too many requests"
timeout           Retries up to maxRetries                Deadline exceeded, connection timeout
server_error      Retries up to maxRetries                HTTP 500, 502, 503
auth_error        Does not retry; advances in the chain   HTTP 401, 403, "invalid api key"
model_not_found   Does not retry; advances in the chain   HTTP 404, "model not found"
context_too_long  Does not retry; advances in the chain   "context length exceeded"
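A classifier like the one in the table can be sketched as a switch over the HTTP status and error message. This is a simplified illustration, not ChatCLI's actual code; the real implementation also inspects provider-specific error payloads:

```go
package main

import (
	"fmt"
	"strings"
)

// classify maps an HTTP status and error message to an error class,
// following the table above. Hypothetical sketch: match order and
// substrings are assumptions.
func classify(status int, msg string) string {
	msg = strings.ToLower(msg)
	switch {
	case status == 429 || strings.Contains(msg, "too many requests"):
		return "rate_limit" // retried after backoff
	case status == 401 || status == 403 || strings.Contains(msg, "invalid api key"):
		return "auth_error" // no retry: chain advances
	case status == 404 || strings.Contains(msg, "model not found"):
		return "model_not_found" // no retry: chain advances
	case strings.Contains(msg, "context length exceeded"):
		return "context_too_long" // no retry: chain advances
	case status >= 500:
		return "server_error" // retried up to maxRetries
	case strings.Contains(msg, "timeout") || strings.Contains(msg, "deadline exceeded"):
		return "timeout" // retried up to maxRetries
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(classify(429, ""))                // rate_limit
	fmt.Println(classify(401, "invalid api key")) // auth_error
}
```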

Exponential Cooldown

After consecutive failures, the provider enters cooldown with exponential backoff:
Consecutive Failures   Cooldown
1                      30s
2                      60s
3                      120s
4                      240s
5+                     300s (max)
Authentication errors receive immediate maximum cooldown (5m). A successful request clears all cooldown for the provider. Use ResetCooldowns() to clear manually (e.g., after updating credentials).

Health Monitoring

The chain tracks the state of each provider in real time:
health := chain.GetHealth()
for _, h := range health {
    fmt.Printf("Provider: %s, Available: %v, Fails: %d, Cooldown: %v\n",
        h.Name, h.Available, h.ConsecutiveFails, h.CooldownUntil)
}
Fields tracked per provider:
Field             Description
Available         Whether the provider is available for requests
ConsecutiveFails  Number of consecutive failures
LastErrorClass    Type of the last failure
CooldownUntil     When the cooldown expires
LastErrorAt       Timestamp of the last failure

Tool Use with Fallback

The fallback chain also supports SendPromptWithTools for providers that implement the ToolAwareClient interface. Providers without native tool use support are automatically skipped in the tool call chain.
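In Go, skipping providers without tool support naturally maps to a type assertion against the interface. The sketch below illustrates that pattern; the `ToolAwareClient` signature shown here is simplified and assumed, not ChatCLI's exact interface:

```go
package main

import "fmt"

// ToolAwareClient sketches the interface gate described above; the real
// ChatCLI interface carries more context (history, tool schemas, etc.).
type ToolAwareClient interface {
	SendPromptWithTools(prompt string) (string, error)
}

type basicClient struct{} // no tool support
type toolClient struct{}  // implements ToolAwareClient

func (toolClient) SendPromptWithTools(p string) (string, error) {
	return "handled with tools: " + p, nil
}

// supportsTools shows how a chain can decide, via a type assertion,
// whether a provider participates in the tool call chain.
func supportsTools(c any) bool {
	_, ok := c.(ToolAwareClient)
	return ok
}

func main() {
	fmt.Println(supportsTools(toolClient{}), supportsTools(basicClient{})) // true false
}
```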

Best Practices

Order by cost-effectiveness

Place the cheapest/fastest provider first in the chain.

Diversify providers

Mix providers from different companies for real resilience.

Configure models per provider

Use models with equivalent capabilities to maintain quality.

Monitor health

Regularly check if any provider is in persistent cooldown.
Each provider in the chain needs its own configured API key. Make sure to configure the keys for all providers listed in CHATCLI_FALLBACK_PROVIDERS.