How It Works
The fallback chain is an ordered list of providers. Each request traverses the list until it succeeds or all options are exhausted.
Configuration
- Environment Variables
- Server Flags
- Helm Chart
Error Classification
The system automatically classifies each failure to decide the strategy:

| Class | Behavior | Examples |
|---|---|---|
| rate_limit | Waits for backoff, then retries | HTTP 429, “too many requests” |
| timeout | Retries up to maxRetries | Deadline exceeded, connection timeout |
| server_error | Retries up to maxRetries | HTTP 500, 502, 503 |
| auth_error | Does not retry; moves to the next provider in the chain | HTTP 401, 403, “invalid api key” |
| model_not_found | Does not retry; moves to the next provider in the chain | HTTP 404, “model not found” |
| context_too_long | Does not retry; moves to the next provider in the chain | “context length exceeded” |
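As a rough illustration of how the classes in the table above could drive the retry-or-advance decision, here is a minimal Go sketch. It is not the project's actual implementation: `classify`, `retryable`, `Provider`, `send`, and the `unknown` catch-all class are hypothetical names introduced for this example, and treating unknown errors as non-retryable is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// ErrorClass mirrors the class names from the table above.
type ErrorClass string

const (
	RateLimit      ErrorClass = "rate_limit"
	Timeout        ErrorClass = "timeout"
	ServerError    ErrorClass = "server_error"
	AuthError      ErrorClass = "auth_error"
	ModelNotFound  ErrorClass = "model_not_found"
	ContextTooLong ErrorClass = "context_too_long"
	Unknown        ErrorClass = "unknown" // hypothetical catch-all, not in the table
)

// classify is a hypothetical helper that maps an HTTP status code and an
// error message to one of the classes above, using the table's examples.
func classify(status int, msg string) ErrorClass {
	m := strings.ToLower(msg)
	switch {
	case status == 429 || strings.Contains(m, "too many requests"):
		return RateLimit
	case status == 401 || status == 403 || strings.Contains(m, "invalid api key"):
		return AuthError
	case status == 404 || strings.Contains(m, "model not found"):
		return ModelNotFound
	case strings.Contains(m, "context length exceeded"):
		return ContextTooLong
	case status == 500 || status == 502 || status == 503:
		return ServerError
	case strings.Contains(m, "deadline exceeded") || strings.Contains(m, "timeout"):
		return Timeout
	}
	return Unknown
}

// retryable reports whether a class is retried on the same provider.
// Unknown errors are treated as non-retryable here (an assumption).
func retryable(c ErrorClass) bool {
	switch c {
	case RateLimit, Timeout, ServerError:
		return true
	}
	return false
}

// Provider is a hypothetical stand-in for one entry in the chain.
type Provider struct {
	Name string
	Call func() (status int, msg string, ok bool)
}

// send walks the chain in order. Retryable failures are retried on the same
// provider up to maxRetries; any other failure advances to the next provider.
func send(chain []Provider, maxRetries int) (string, bool) {
	for _, p := range chain {
		for attempt := 0; attempt <= maxRetries; attempt++ {
			status, msg, ok := p.Call()
			if ok {
				return p.Name, true
			}
			if !retryable(classify(status, msg)) {
				break // non-retryable: advance in the chain
			}
			// A real implementation would wait for backoff here on rate_limit.
		}
	}
	return "", false
}

func main() {
	chain := []Provider{
		{Name: "primary", Call: func() (int, string, bool) { return 401, "invalid api key", false }},
		{Name: "secondary", Call: func() (int, string, bool) { return 200, "", true }},
	}
	name, ok := send(chain, 2)
	fmt.Println(name, ok)
}
```

Matching on message substrings is fragile; a production classifier would more likely inspect typed errors or structured provider responses where they are available.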
Exponential Cooldown
After consecutive failures, the provider enters cooldown with exponential backoff:

| Consecutive Failures | Cooldown |
|---|---|
| 1 | 30s |
| 2 | 60s |
| 3 | 120s |
| 4 | 240s |
| 5+ | 300s (max) |
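The schedule in the table above is consistent with a 30-second base that doubles on each consecutive failure and is capped at 300 seconds. The following Go sketch reproduces those values; `cooldownFor`, `baseCooldown`, and `maxCooldown` are hypothetical names, and the doubling formula is inferred from the table rather than taken from the implementation.

```go
package main

import (
	"fmt"
	"time"
)

const (
	baseCooldown = 30 * time.Second  // cooldown after the first failure (from the table)
	maxCooldown  = 300 * time.Second // upper bound (from the table)
)

// cooldownFor returns the cooldown after n consecutive failures:
// baseCooldown * 2^(n-1), capped at maxCooldown.
func cooldownFor(n int) time.Duration {
	if n <= 0 {
		return 0
	}
	d := baseCooldown
	for i := 1; i < n; i++ {
		d *= 2
		if d >= maxCooldown {
			return maxCooldown
		}
	}
	return d
}

func main() {
	for n := 1; n <= 6; n++ {
		fmt.Printf("%d consecutive failures -> %s\n", n, cooldownFor(n))
	}
}
```

Capping inside the loop avoids overflow for large failure counts, since the duration stops doubling once it reaches the maximum.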
Authentication errors receive the maximum cooldown (5m) immediately. A successful request clears all cooldown for the provider. Use ResetCooldowns() to clear cooldowns manually (e.g., after updating credentials).
Health Monitoring
The chain tracks the state of each provider in real time:

| Field | Description |
|---|---|
| Available | Whether the provider is available for requests |
| ConsecutiveFails | Number of consecutive failures |
| LastErrorClass | Type of the last failure |
| CooldownUntil | When the cooldown expires |
| LastErrorAt | Timestamp of the last failure |
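To show how these fields might be consumed, here is a small Go sketch that lists the providers still in cooldown at a given moment. The field names come from the table above; the Go types, the `Name` field, the `ProviderHealth` type name, and the `inCooldown` helper are assumptions made for this example and may not match the real health API.

```go
package main

import (
	"fmt"
	"time"
)

// ProviderHealth uses the field names from the table above.
// The Go types and the Name field are assumptions.
type ProviderHealth struct {
	Name             string
	Available        bool
	ConsecutiveFails int
	LastErrorClass   string
	CooldownUntil    time.Time
	LastErrorAt      time.Time
}

// inCooldown returns the names of providers whose cooldown has not yet
// expired at the given time.
func inCooldown(hs []ProviderHealth, now time.Time) []string {
	var out []string
	for _, h := range hs {
		if now.Before(h.CooldownUntil) {
			out = append(out, h.Name)
		}
	}
	return out
}

func main() {
	now := time.Now()
	snapshot := []ProviderHealth{
		{Name: "primary", ConsecutiveFails: 3, LastErrorClass: "server_error",
			CooldownUntil: now.Add(2 * time.Minute), LastErrorAt: now},
		{Name: "secondary", Available: true},
	}
	fmt.Println("in cooldown:", inCooldown(snapshot, now))
}
```

A periodic check like this is one way to notice a provider that stays in cooldown across many snapshots.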
Tool Use with Fallback
The fallback chain also supports SendPromptWithTools for providers that implement the ToolAwareClient interface. Providers without native tool use support are automatically skipped in the tool call chain.
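The skipping behavior can be pictured as a Go type assertion on each provider in the chain. The sketch below is illustrative only: the `Client` interface, the method signature of `SendPromptWithTools`, the `plain` and `tooled` types, and the `sendWithTools` helper are all assumptions, since the real interface definitions are not shown here.

```go
package main

import "fmt"

// Client is a simplified, hypothetical base interface for a provider.
type Client interface {
	Name() string
}

// ToolAwareClient is named in the text; this method signature is an assumption.
type ToolAwareClient interface {
	Client
	SendPromptWithTools(prompt string, tools []string) (string, error)
}

// plain is a provider without native tool use support.
type plain struct{ name string }

func (p plain) Name() string { return p.name }

// tooled is a provider that implements ToolAwareClient.
type tooled struct{ name string }

func (t tooled) Name() string { return t.name }

func (t tooled) SendPromptWithTools(prompt string, tools []string) (string, error) {
	return t.name + " handled: " + prompt, nil
}

// sendWithTools walks the chain in order and skips any provider that does
// not implement ToolAwareClient.
func sendWithTools(chain []Client, prompt string, tools []string) (string, error) {
	for _, c := range chain {
		tc, ok := c.(ToolAwareClient)
		if !ok {
			continue // no native tool use: skip this provider
		}
		out, err := tc.SendPromptWithTools(prompt, tools)
		if err == nil {
			return out, nil
		}
	}
	return "", fmt.Errorf("no tool-capable provider succeeded")
}

func main() {
	chain := []Client{plain{"basic"}, tooled{"advanced"}}
	out, err := sendWithTools(chain, "list files", []string{"ls"})
	fmt.Println(out, err)
}
```

One consequence of this design is that a chain with no tool-capable provider fails immediately for tool calls, so it is worth including at least two such providers if tool use matters.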
Best Practices
Order by cost-effectiveness
Place the cheapest/fastest provider first in the chain.
Diversify providers
Mix providers from different companies for real resilience.
Configure models per provider
Use models with equivalent capabilities to maintain quality.
Monitor health
Regularly check if any provider is in persistent cooldown.