ChatCLI supports a wide range of models from major AI providers. Switch models at any time with /switch --model <name>.
Capabilities legend:
- Vision: accepts images as input
- Tools: native tool use (function calling)
- JSON Mode: guaranteed structured JSON output
- Code Exec: native code execution on the provider
All providers support streaming via SSE (Server-Sent Events). ChatCLI enables streaming automatically.
Models ideal for code generation and complex reasoning. They support both the Chat Completions API and the Responses API.

| Model (ID) | Aliases | Context | Max Output | Capabilities |
|---|---|---|---|---|
| gpt-5.4 | - | 200K tokens | 200K tokens | Tools |
| gpt-5.3-codex | - | 200K tokens | 200K tokens | Tools |
| gpt-5.2 | - | 100K tokens | 100K tokens | Tools |
| gpt-5 | gpt-5-mini, gpt-5-nano | 50K tokens | 50K tokens | Tools |
| gpt-4o | - | 50K tokens | 50K tokens | Vision, Tools |
| gpt-4o-mini | - | 50K tokens | 50K tokens | Vision, Tools |
| gpt-4 | gpt-4.1, gpt-4.1-mini, gpt-4.1-nano | 50K tokens | 50K tokens | Tools |

ChatCLI uses the Chat Completions API by default. The Responses API can be configured via OPENAI_API_VERSION. Streaming is enabled for all models.
Large context windows and excellent ability to follow complex instructions. All models support streaming via SSE.

| Model (ID) | Aliases | Context | Max Output | Capabilities |
|---|---|---|---|---|
| claude-fable-5 | fable-5, fable | 1M tokens | 128K tokens | Vision, Tools, Adaptive thinking, Mid-conv system |
| claude-opus-4-8 | opus-4-8 | 1M tokens | 128K tokens | Vision, Tools, Adaptive thinking, Fast mode, Mid-conv system, 1K-token cache floor |
| claude-opus-4-7 | opus-4-7 | 1M tokens | 128K tokens | Vision, Tools, Adaptive thinking |
| claude-sonnet-4-7 | claude-4-7-sonnet, sonnet-4-7 | 200K tokens | 128K tokens | Vision, Tools |
| claude-opus-4-6 | opus-4-6 | 400K tokens | 64K tokens | Vision, Tools |
| claude-sonnet-4-6 | sonnet-4-6 | 200K tokens | 128K tokens | Vision, Tools |
| claude-opus-4-5 | opus-4-5 | 200K tokens | 64K tokens | Vision, Tools |
| claude-sonnet-4-5 | claude-4-5-sonnet, sonnet-4-5 | 200K tokens | 128K tokens | Vision, Tools |
| claude-opus-4-1-20250805 | claude-opus-4-1, opus-4-1 | 20K tokens | 20K tokens | Tools |
| claude-opus-4-20250514 | opus-4 | 20K tokens | 20K tokens | Tools |
| claude-sonnet-4 | claude-4-sonnet, sonnet-4-20250514 | 50K tokens | 20K tokens | Vision, Tools |
| claude-sonnet-3-7-20250219 | claude-3-7-sonnet | 50K tokens | 20K tokens | Tools |
| claude-sonnet-3-5-20241022 | claude-3-5-sonnet | 50K tokens | 20K tokens | Tools |
| claude-opus-3 | claude-3-opus | 32K tokens | 20K tokens | Tools |
| claude-haiku-3 | claude-3-haiku | 42K tokens | 20K tokens | Tools |

Claude Fable 5 (claude-fable-5) is Anthropic's most capable model, a tier above Opus ($10/$50 per MTok). It has the same API surface as Opus 4.7/4.8 (adaptive thinking only, no temperature/top_p/top_k) with one extra constraint: an explicit thinking:{type:"disabled"} returns HTTP 400, so the field must be omitted to run without thinking (ChatCLI's client already does this). Shortcut: /model fable. There is no Bedrock entry yet because AWS has not published the dated inference profile.

claude-opus-4-8 and claude-opus-4-7 ship with 1M native context (no extra flag). claude-opus-4-6 can also use 1M context by setting ANTHROPIC_1MTOKENS_SONNET=true. Different models may use distinct anthropic-version headers, which the catalog manages automatically.

Catalog order: the 4.x entries are declared newest-first in the registry. This prevents a silent alias collision in which opus-4-5, opus-4-6, opus-4-7 and opus-4-8 (typed as shortcuts) would resolve to claude-opus-4-20250514 (a mere 20K context) because the 4.0 entry's opus-4 alias is a prefix of all of them. If you add a Claude 4.9 / 5.x entry in the future, keep this newest-first order.
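
The collision described above can be illustrated with a small sketch. It assumes a first-match, prefix-based lookup; the registry layout and the resolve function are hypothetical and only mirror the behaviour the paragraph implies, not ChatCLI's actual code.

```python
from typing import List, Optional, Tuple

# Hypothetical registry of (canonical ID, aliases). The IDs come from the table
# above; the prefix-matching rule is an assumption made for illustration.
NEWEST_FIRST: List[Tuple[str, List[str]]] = [
    ("claude-opus-4-8", ["opus-4-8"]),
    ("claude-opus-4-7", ["opus-4-7"]),
    ("claude-opus-4-6", ["opus-4-6"]),
    ("claude-opus-4-5", ["opus-4-5"]),
    ("claude-opus-4-20250514", ["opus-4"]),
]
OLDEST_FIRST = list(reversed(NEWEST_FIRST))


def resolve(name: str, registry: List[Tuple[str, List[str]]]) -> Optional[str]:
    """Return the first entry whose ID or alias is a prefix of the typed name."""
    for canonical, aliases in registry:
        for candidate in [canonical, *aliases]:
            if name.startswith(candidate):
                return canonical
    return None
```

With NEWEST_FIRST, resolve("opus-4-8", NEWEST_FIRST) returns claude-opus-4-8. With OLDEST_FIRST the same input returns claude-opus-4-20250514, because the opus-4 alias matches first; this is the silent collision the ordering rule guards against.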
Claude Opus 4.8: what's new
Released May 28, 2026. Same default 1M / 128K profile as Opus 4.7 but with four new launch capabilities the catalog tracks as feature flags:

| Capability | What it means |
|---|---|
| adaptive_thinking | Only thinking mode accepted by 4.7+. ChatCLI emits thinking:{type:"adaptive"} when a skill provides an effort: hint; the model decides per turn whether to reason. Sending budget_tokens returns HTTP 400. |
| fast_mode | Research-preview faster output (~2.5× tokens/sec) at premium pricing. Opt in with ANTHROPIC_SPEED=fast. |
| mid_conversation_system | Server accepts role:"system" after the first user turn, preserving prompt-cache hits across instruction updates. ChatCLI's message builder already passes structured system blocks through unchanged. |
| low_cache_minimum | Minimum cacheable prompt drops from previous models' floor to 1,024 tokens. Prompts that didn't qualify on 4.7 now create cache entries with no code change. |

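
A minimal sketch of how the adaptive_thinking rule might translate into a request field, together with the budgeted fallback ChatCLI applies to older models. The function name, the set of adaptive-only IDs, and the budget numbers are all assumptions for illustration; the document does not state the real effort-to-budget mapping.

```python
from typing import Optional

# Assumption: these IDs stand in for "Opus 4.7 and 4.8"; a real catalog would
# consult an adaptive_thinking feature flag instead of a hard-coded set.
ADAPTIVE_ONLY = {"claude-opus-4-7", "claude-opus-4-8"}

# Hypothetical effort-to-budget mapping; the real numbers are not documented here.
BUDGETS = {"medium": 8_000, "high": 16_000, "max": 32_000}


def thinking_block(model: str, effort: Optional[str]) -> Optional[dict]:
    """Return the thinking field for a request, or None to omit it."""
    if effort is None:
        return None  # no effort hint: leave the field out entirely
    if model in ADAPTIVE_ONLY:
        # budget_tokens must not be sent here; the server would answer HTTP 400
        return {"type": "adaptive"}
    return {"type": "enabled", "budget_tokens": BUDGETS[effort]}
```

Returning None rather than a "disabled" block keeps the sketch compatible with models that reject an explicit disabled value.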
Skill effort: medium|high|max continues to work. On Opus 4.7 and 4.8 it maps to adaptive thinking automatically; on older 4.x / 3.7 it falls back to budgeted extended thinking (thinking:{type:"enabled", budget_tokens:N}).

Full AWS Bedrock catalog: Anthropic, OpenAI, Llama, Nova, Mistral, Cohere, AI21, DeepSeek, Moonshot Kimi, MiniMax, Qwen, Z.AI/GLM, Gemma, Nemotron, TwelveLabs, and any provider AWS adds. Auth uses the AWS SDK's default credentials chain (IAM role, ~/.aws/credentials, env vars), so no API key from the original providers is needed.

Modern models (Claude 3.7+/4.x/4.5/4.6/4.7 and equivalents from other providers) do not accept direct on-demand invocation by base ID; they require an inference profile ID (prefixes global., us., eu., apac.). ChatCLI automatically filters non-invokable base IDs from /switch --model, so only what works appears. See AWS Bedrock for details.

| Model (ID) | Aliases | Context | Max Output | Capabilities |
|---|---|---|---|---|
| global.anthropic.claude-opus-4-8-20260528-v1:0 | bedrock-opus-4-8, claude-opus-4-8 | 1M tokens | 128K tokens | Vision, Tools, Adaptive thinking, Mid-conv system, 1K-token cache floor |
| global.anthropic.claude-opus-4-7-20260401-v1:0 | bedrock-opus-4-7, claude-opus-4-7 | 1M tokens | 128K tokens | Vision, Tools, Adaptive thinking |
| global.anthropic.claude-sonnet-4-7-20260401-v1:0 | bedrock-sonnet-4-7, claude-sonnet-4-7 | 200K tokens | 128K tokens | Vision, Tools |
| global.anthropic.claude-sonnet-4-6-20260115-v1:0 | bedrock-sonnet-4-6, claude-sonnet-4-6 | 400K tokens | 128K tokens | Vision, Tools |
| global.anthropic.claude-opus-4-6-20260115-v1:0 | bedrock-opus-4-6, claude-opus-4-6 | 400K tokens | 64K tokens | Vision, Tools |
| global.anthropic.claude-haiku-4-5-20251001-v1:0 | bedrock-haiku-4-5, claude-haiku-4-5 | 200K tokens | 64K tokens | Vision, Tools |
| global.anthropic.claude-sonnet-4-5-20250929-v1:0 | bedrock-sonnet-4-5, claude-sonnet-4-5 | 200K tokens | 64K tokens | Vision, Tools |
| us.anthropic.claude-sonnet-4-5-20250929-v1:0 | bedrock-sonnet-4-5-us | 200K tokens | 64K tokens | Vision, Tools |
| global.anthropic.claude-opus-4-5-20251001-v1:0 | bedrock-opus-4-5, claude-opus-4-5 | 200K tokens | 64K tokens | Vision, Tools |
| us.anthropic.claude-sonnet-4-20250514-v1:0 | bedrock-sonnet-4, claude-sonnet-4 | 200K tokens | 64K tokens | Vision, Tools |
| eu.anthropic.claude-sonnet-4-20250514-v1:0 | bedrock-sonnet-4-eu | 200K tokens | 64K tokens | Vision, Tools |
| us.anthropic.claude-opus-4-20250514-v1:0 | bedrock-opus-4, claude-opus-4 | 200K tokens | 32K tokens | Vision, Tools |
| us.anthropic.claude-opus-4-1-20250805-v1:0 | bedrock-opus-4-1, claude-opus-4-1 | 200K tokens | 32K tokens | Vision, Tools |
| us.anthropic.claude-3-7-sonnet-20250219-v1:0 | bedrock-sonnet-3-7, claude-3-7-sonnet | 200K tokens | 64K tokens | Vision, Tools |
| eu.anthropic.claude-3-7-sonnet-20250219-v1:0 | bedrock-sonnet-3-7-eu | 200K tokens | 64K tokens | Vision, Tools |
| anthropic.claude-3-5-sonnet-20241022-v2:0 | bedrock-sonnet-3-5-v2 | 200K tokens | 8K tokens | Vision, Tools |
| anthropic.claude-3-5-sonnet-20240620-v1:0 | bedrock-sonnet-3-5-v1 | 200K tokens | 8K tokens | Vision, Tools |
| anthropic.claude-3-5-haiku-20241022-v1:0 | bedrock-haiku-3-5, claude-3-5-haiku | 200K tokens | 8K tokens | Tools |
| anthropic.claude-3-opus-20240229-v1:0 | bedrock-opus-3, claude-3-opus | 200K tokens | 4K tokens | Vision, Tools |
| anthropic.claude-3-haiku-20240307-v1:0 | bedrock-haiku-3, claude-3-haiku | 200K tokens | 4K tokens | Vision, Tools |

OpenAI GPT-OSS (open-weights): OpenAI models hosted on Bedrock. They use the OpenAI Chat Completions schema (auto-detected by the openai.* prefix, or forced via BEDROCK_PROVIDER=openai).

| Model (ID) | Aliases | Context | Max Output | Capabilities |
|---|---|---|---|---|
| openai.gpt-oss-120b-1:0 | bedrock-gpt-oss-120b, gpt-oss-120b | 128K tokens | 16K tokens | Tools, JSON |
| openai.gpt-oss-20b-1:0 | bedrock-gpt-oss-20b, gpt-oss-20b | 128K tokens | 16K tokens | Tools, JSON |

Other providers via Converse API: Llama, Nova, Mistral, Cohere, AI21, DeepSeek, Moonshot Kimi, MiniMax, Qwen, Z.AI/GLM, Gemma, Nemotron, TwelveLabs, etc. are not hardcoded in the catalog. They appear dynamically in /switch --model based on what your AWS account has access to. ChatCLI routes these models through the AWS Converse API (unified schema), so adding a new provider doesn't require a release.

Examples of IDs seen via ListFoundationModels (your actual list depends on account + region):

| Provider | Example Model ID |
|---|---|
| Moonshot AI | moonshotai.kimi-k2.5, moonshotai.kimi-k2-thinking |
| MiniMax | minimax.m-2-5, minimax.m-2 |
| Z.AI | zai.glm-4-7, zai.glm-4-7-flash |
| Qwen | qwen.qwen3-32b, qwen.qwen3-coder-480b |
| Meta Llama | meta.llama3-70b-instruct-v1:0, us.meta.llama3-1-70b-... |
| Amazon Nova | amazon.nova-pro-v1:0, amazon.nova-lite-v1:0 |
| Mistral | mistral.mistral-large-2407-v1:0 |
| DeepSeek | us.deepseek.r1-v1:0 |
| Google | google.gemma-3-27b-pt |
| NVIDIA | nvidia.nemotron-nano-9b-v2 |
| TwelveLabs | twelvelabs.pegasus-v1.2 |

The dynamic listing (/switch --model) merges bedrock:ListFoundationModels (filtered by ByOutputModality: TEXT + InferenceTypesSupported: ON_DEMAND) and bedrock:ListInferenceProfiles with the static catalog above. There is no allowlist: any Bedrock provider your account can access shows up automatically. Use the command to see what your AWS account can actually invoke in the configured region.
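
The merge described above could look roughly like the following sketch. It works on plain lists of IDs rather than calling AWS; the function name, the catalog-first precedence, and the origin labels attached to each ID are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def merge_model_lists(
    static_catalog: Iterable[str],
    foundation_models: Iterable[str],
    inference_profiles: Iterable[str],
) -> List[Tuple[str, str]]:
    """Merge three ID lists, keeping first-seen order and dropping duplicates."""
    merged: List[Tuple[str, str]] = []
    seen = set()
    sources = (
        ("catalog", static_catalog),   # assumption: catalog entries win on conflict
        ("API", foundation_models),
        ("API", inference_profiles),
    )
    for origin, ids in sources:
        for model_id in ids:
            if model_id not in seen:
                seen.add(model_id)
                merged.append((model_id, origin))
    return merged
```

Because nothing is filtered by provider name, an ID from a provider the catalog has never seen still ends up in the merged list, which matches the no-allowlist behaviour.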
Embeddings via Bedrock: amazon.titan-embed-text-v2:0 (default, 1024-dim, configurable 256/512/1024), amazon.titan-embed-text-v1 (1536-dim), and Cohere cohere.embed-english-v3 / cohere.embed-multilingual-v3 (1024-dim). Enable with CHATCLI_EMBED_PROVIDER=bedrock. See RAG + HyDE.

Advanced multimodal capabilities and massive context windows. All models support streaming via SSE.

| Model (ID) | Aliases | Context | Max Output | Capabilities |
|---|---|---|---|---|
| gemini-3 | gemini-3-pro, gemini-3-pro-preview | 2M tokens | 2M tokens | Vision, Tools, JSON, Code Exec |
| gemini-2.5-pro | gemini-2.5-pro-latest | 2M tokens | 2M tokens | Vision, Tools, JSON, Code Exec |
| gemini-2.5-flash | - | 1M tokens | 1M tokens | Vision, Tools, JSON |
| gemini-2.5-flash-lite | - | 1M tokens | 1M tokens | - |
| gemini-2.0-flash | - | 1M tokens | 1M tokens | Vision, Tools, JSON |
| gemini-2.0-flash-lite | - | 1M tokens | 1M tokens | - |

Gemini 3 also supports Multimodal Live for real-time interactions. Models with JSON Mode can return structured output via response_mime_type.
Real-time information integration and large context windows. All models support streaming.

| Model (ID) | Aliases | Context | Max Output | Capabilities |
|---|---|---|---|---|
| grok-4-1 | grok-4-1-fast | 2M tokens | - | - |
| grok-4-fast | grok-4-fast-reasoning-latest, grok-4-0709 | 2M tokens | - | - |
| grok-3 | - | 128K tokens | - | - |
| grok-3-mini | - | 128K tokens | - | - |
| grok-code-fast-1 | - | 200K tokens | - | - |

Grok models use the OpenAI-compatible API. Output limits are managed by the provider.
Use models from the Copilot platform with your subscription (Individual, Business, Enterprise). Authenticate via /auth login github-copilot.

The table below shows models registered in the static catalog. With dynamic listing, ChatCLI queries the Copilot API and automatically discovers all models available for your account.

| Model (ID) | Context |
|---|---|
| gpt-4o | 128K tokens |
| gpt-4o-mini | 128K tokens |
| claude-sonnet-4 | 128K tokens |
| gemini-2.0-flash | 128K tokens |
| + dynamic models | via API |

Available models vary depending on your plan and region. Use /switch --model to see the full list fetched directly from the Copilot API.
Chinese AI models from Zhipu AI (z.ai) with strong multilingual and coding capabilities. OpenAI-compatible API with native tool calling support.

| Model (ID) | Context | Max Output | Capabilities |
|---|---|---|---|
| glm-5 | 128K tokens | 128K tokens | Vision, Tools |
| glm-4.7 | 202K tokens | 65K tokens | Tools |
| glm-4.5 | 128K tokens | 98K tokens | Tools |
| glm-4.5-flash | 128K tokens | 16K tokens | Tools |
| glm-5v-turbo | 128K tokens | - | Vision, Tools |
| glm-4.5v | 128K tokens | - | Vision |
| codegeex-4 | 128K tokens | - | Tools |

ZAI uses an OpenAI-compatible API at https://api.z.ai/api/paas/v4/chat/completions. Authentication is via ZAI_API_KEY Bearer token. Model IDs are case-sensitive.
Automatic JWT authentication: Keys in id.secret format automatically enable JWT token rotation (HMAC-SHA256), cached for 30 minutes. Keys without a "." work as traditional Bearer tokens. No additional configuration needed.
High-performance models from MiniMax with large context windows and native tool calling. OpenAI-compatible API.

| Model (ID) | Context | Max Output | Capabilities |
|---|---|---|---|
| MiniMax-M2.7 | 204K tokens | 131K tokens | Vision, Tools |
| MiniMax-M2.7-highspeed | 204K tokens | - | Vision, Tools |
| MiniMax-M2.5 | 196K tokens | 65K tokens | Vision, Tools |
| MiniMax-M2.5-highspeed | 196K tokens | - | Vision, Tools |
| MiniMax-Text-01 | 128K tokens | 2K tokens | JSON Mode |

MiniMax model IDs are case-sensitive (e.g., MiniMax-M2.7, not minimax-m2.7). The API uses a base_resp field for error handling with status_code and status_msg.
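
A minimal sketch of checking the base_resp field on a decoded response body. The exception class and function name are hypothetical, and treating status_code 0 as success is an assumption rather than something this page states.

```python
class MiniMaxAPIError(Exception):
    """Raised when a response body carries a non-zero base_resp status."""


def check_base_resp(body: dict) -> dict:
    """Return the body unchanged, or raise if base_resp reports an error."""
    base = body.get("base_resp") or {}
    code = base.get("status_code", 0)
    if code != 0:  # assumption: 0 means success
        message = base.get("status_msg", "")
        raise MiniMaxAPIError(f"MiniMax error {code}: {message}")
    return body
```

Checking this field matters because an error reported this way may arrive inside an otherwise successful HTTP response, so relying on the HTTP status alone could miss it.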
Anthropic-compatible endpoint: Set MINIMAX_API_COMPAT=anthropic to use https://api.minimax.io/anthropic/v1/messages with Anthropic Messages format. Native tool calling is disabled in this mode (falls back to XML). Model listing always uses the native endpoint.
Moonshot AI's Kimi family: a 1T-parameter MoE with 32B activated in the K2.6 flagship, a 256K context window, a native MoonViT vision encoder, and an explicit "thinking" mode. OpenAI-compatible API at https://api.moonshot.ai/v1/chat/completions.

| Model (ID) | Aliases | Context | Max Output | Capabilities |
|---|---|---|---|---|
| kimi-k2.6 | kimi-k2-6, k2.6, k2-6 | 256K tokens | 131K tokens | Tools, Vision, Thinking, JSON Mode |
| kimi-k2.5 | kimi-k2-5, k2.5, k2-5 | 256K tokens | 96K tokens | Tools, Vision, Thinking, JSON Mode |
| kimi-latest | - | 256K tokens | 131K tokens | Tools, Vision, Thinking, JSON Mode |
| kimi-k2-turbo-preview | kimi-k2-turbo | 256K tokens | 65K tokens | Tools, JSON Mode |
| kimi-thinking-preview | - | 128K tokens | 65K tokens | Tools, Thinking, JSON Mode |
| moonshot-v1-128k | - | 128K tokens | 32K tokens | Tools, JSON Mode |
| moonshot-v1-32k | - | 32K tokens | 16K tokens | Tools, JSON Mode |
| moonshot-v1-8k | - | 8K tokens | 4K tokens | Tools, JSON Mode |

Thinking mode: Set MOONSHOT_THINKING=enabled|disabled|auto to toggle between Thinking (explicit reasoning, default for K2.6/K2.5) and Instant (direct response, cheaper). The default auto lets the model choose; models without the thinking capability ignore the flag.
Public pricing (May 2026): kimi-k2.6 is $0.95/M input tokens (cache miss) and $4.00/M output tokens. Cache-hit input drops to $0.16/M, but the ChatCLI cost tracker bills the cache-miss price for conservative accounting.

StackSpot accepts all compatible models on the StackSpotAI platform; the model is selected during Agent creation.
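
As a worked example of the kimi-k2.6 prices quoted above, the conservative accounting can be sketched like this. The function name is hypothetical, and the sketch ignores the cache-hit rate on purpose, mirroring the cost tracker's choice to bill every input token at the cache-miss price.

```python
# Prices in USD per million tokens, as quoted for kimi-k2.6.
INPUT_PER_M = 0.95   # cache-miss input price
OUTPUT_PER_M = 4.00  # output price


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Conservative estimate: all input is billed at the cache-miss rate."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000
```

For a session with 200,000 input tokens and 10,000 output tokens this gives about $0.19 + $0.04 = $0.23. The real bill can only be lower, since any cache hits are charged at the cheaper rate.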
Multi-provider API gateway: access 200+ models from OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, and more through a single API key. Uses an OpenAI-compatible API at https://openrouter.ai/api/v1/chat/completions.

Models use the provider/model-name format:

| Model (ID) | Provider | Capabilities |
|---|---|---|
| openai/gpt-4o | OpenAI | Vision, Tools |
| openai/gpt-4o-mini | OpenAI | Vision, Tools |
| anthropic/claude-sonnet-4 | Anthropic | Vision, Tools |
| anthropic/claude-opus-4 | Anthropic | Vision, Tools |
| google/gemini-2.5-pro | Google | Vision, Tools, JSON |
| google/gemini-2.5-flash | Google | Vision, Tools, JSON |
| meta-llama/llama-4-maverick | Meta | Tools |
| deepseek/deepseek-r1 | DeepSeek | Tools |
| mistralai/mistral-large | Mistral | Tools |

The table above shows popular defaults. OpenRouter provides 200+ models, and ChatCLI discovers them dynamically via the /api/v1/models endpoint. Use /switch --model to browse the full list.
OpenRouter supports native fallback routing via OPENROUTER_FALLBACK_MODELS. If your primary model is unavailable, OpenRouter automatically routes to the next model in the list. This is handled server-side, before ChatCLI's own fallback chain kicks in.
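
A rough sketch of how the fallback list might be turned into a request body. The comma-separated format of OPENROUTER_FALLBACK_MODELS, the build_payload helper, and the use of a models array alongside model are assumptions for illustration; check the OpenRouter documentation for the exact request shape.

```python
import os
from typing import List, Mapping


def build_payload(primary: str, messages: List[dict],
                  env: Mapping[str, str] = os.environ) -> dict:
    """Build a chat request body, adding fallbacks when the variable is set."""
    raw = env.get("OPENROUTER_FALLBACK_MODELS", "")
    # Assumption: the variable holds a comma-separated list of model IDs.
    fallbacks = [item.strip() for item in raw.split(",") if item.strip()]
    payload = {"model": primary, "messages": messages}
    if fallbacks:
        # Assumption: fallbacks travel in a `models` array next to `model`.
        payload["models"] = fallbacks
    return payload
```

Passing env explicitly keeps the function easy to test without touching the real process environment.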
Supports any local model via Ollama. Configure in .env:

OLLAMA_ENABLED=true
OLLAMA_MODEL="llama3"

Or switch interactively: /switch --model llama3

Use ollama pull <model> to download new models.
How model selection works
ChatCLI determines which model to use with the following priority (highest to lowest):
1. --model flag on the command line: chatcli --model gpt-5.4
2. /switch command during a session: /switch --model claude-sonnet-4-6
3. MODEL environment variable: sets the default model
4. LLM_PROVIDER environment variable: determines the provider (openai, anthropic, google, xai, etc.)
5. Provider's default model: each provider has a default model defined in the catalog

# Example: set provider and model via .env
LLM_PROVIDER=anthropic
MODEL=claude-sonnet-4-6
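
The priority order listed above can be sketched as a simple first-match walk. The function name and the PROVIDER_DEFAULTS values are hypothetical; the real defaults live in the catalog and are not listed on this page.

```python
import os
from typing import Mapping, Optional

# Hypothetical per-provider defaults, for illustration only.
PROVIDER_DEFAULTS = {"openai": "gpt-5.4", "anthropic": "claude-sonnet-4-6"}


def select_model(cli_flag: Optional[str] = None,
                 session_switch: Optional[str] = None,
                 env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first model found, walking the priority list top to bottom."""
    for candidate in (cli_flag, session_switch, env.get("MODEL")):
        if candidate:
            return candidate
    # Nothing explicit was given: fall back to the provider's default model.
    return PROVIDER_DEFAULTS.get(env.get("LLM_PROVIDER", ""))
```

With the .env example above (LLM_PROVIDER=anthropic, MODEL=claude-sonnet-4-6) and no flag or /switch, the MODEL variable decides the result.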
Model aliases
Each model has aliases for easier typing. ChatCLI automatically resolves aliases to the canonical model ID. For example:

| Alias typed | Resolved model |
|---|---|
| claude-4-5-sonnet | claude-sonnet-4-5 |
| sonnet-4-5 | claude-sonnet-4-5 |
| opus-4-6 | claude-opus-4-6 |
| opus-4-7 | claude-opus-4-7 |
| opus-4-8 | claude-opus-4-8 |
| sonnet-4-7 | claude-sonnet-4-7 |
| gpt-5-mini | gpt-5 (mini variant) |
| gemini-3-pro | gemini-3 |

Aliases are defined in the model catalog and accepted in all contexts: --model, /switch, and the MODEL variable.
Catalog system
Models are registered in the llm/catalog package with complete metadata. ChatCLI uses the catalog to automatically determine:
- API version: which endpoint and protocol version to use for each model
- Max tokens: context and output limits for managing prompts and responses
- Capabilities: which features are available (vision, tools, JSON mode, etc.)
- Provider-specific headers: for example, the anthropic-version header varies per model
This means that when switching models, ChatCLI automatically adjusts all request parameters without manual configuration.
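
To make the idea concrete, here is a hypothetical sketch of what one catalog record and its use might look like. The real llm/catalog package is not shown on this page, so every field and function name below is an assumption.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical shape of one catalog record; field names are illustrative."""
    model_id: str
    context_tokens: int
    max_output_tokens: int
    capabilities: frozenset = field(default_factory=frozenset)
    headers: dict = field(default_factory=dict)


def request_params(entry: CatalogEntry, wanted_output: int) -> dict:
    """Derive request settings from catalog metadata instead of user config."""
    return {
        "model": entry.model_id,
        # Never ask for more output than the model is registered to produce.
        "max_tokens": min(wanted_output, entry.max_output_tokens),
        "tools_enabled": "tools" in entry.capabilities,
        "headers": dict(entry.headers),
    }
```

Switching models then amounts to looking up a different entry and calling request_params again, with no per-model settings kept by the user.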
Dynamic model listing
ChatCLI fetches available models directly from each provider's API, using the configured token or API key. This ensures you see exactly which models your account has access to, including new models not yet in the static catalog.
How it works
- When ChatCLI starts or when you switch providers (via /switch, /auth login, etc.), a background request queries the active provider's models endpoint
- Discovered models are cached for use in the /switch --model autocomplete
- Each suggestion indicates its origin: [API] (dynamic) or [catalog] (static)
Endpoints per provider
| Provider | Endpoint | Auth |
|---|---|---|
| OpenAI | GET /v1/models | API Key or OAuth |
| Anthropic | GET /v1/models | API Key or OAuth |
| Google AI | GET /v1beta/models | API Key |
| xAI | GET /v1/models | API Key |
| GitHub Copilot | GET /models | OAuth (Device Flow) |
| Ollama | GET /api/tags | No auth (local) |
| ZAI (Zhipu AI) | GET /models | API Key |
| MiniMax | GET /models | API Key |
| Moonshot (Kimi) | GET /v1/models | API Key (Bearer) |
| OpenRouter | GET /api/v1/models | API Key |
| StackSpot | - | Not supported (model fixed per agent) |
Smart autocomplete
When typing /switch --model and pressing Tab, ChatCLI suggests available models:
> /switch --model [Tab]
gpt-4o GPT-4o (Copilot) [API]
claude-sonnet-4 Claude Sonnet 4 (Copilot) [API]
o4-mini o4-mini (Copilot) [API]
If the API is not reachable, it falls back to the static catalog:
> /switch --model [Tab]
gpt-4o GPT-4o (Copilot) [catalog]
gpt-4o-mini GPT-4o mini (Copilot) [catalog]
Pressing Enter with /switch --model (no value) lists all available models with source indication (API or catalog).
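
The fallback behaviour shown in the two examples above can be sketched as follows. The function name is hypothetical, and passing None to signal an unreachable API is an assumption about how the cache reports failure.

```python
from typing import Iterable, List, Optional, Tuple


def suggestions(prefix: str,
                api_models: Optional[Iterable[str]],
                catalog_models: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (model, origin) pairs; use the catalog only if the API list is missing."""
    if api_models is not None:
        pool, origin = api_models, "API"
    else:
        # The API was not reachable, so fall back to the static catalog.
        pool, origin = catalog_models, "catalog"
    return [(model, origin) for model in pool if model.startswith(prefix)]
```

An empty prefix matches every model, which corresponds to pressing Tab right after /switch --model.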
OAuth and dynamic listing
Dynamic listing works with both API key and OAuth:
- Anthropic OAuth: uses ?beta=true and Chrome-like headers, with automatic gzip decompression
- OpenAI OAuth: queries the ChatGPT backend (/backend-api/models) instead of the standard endpoint
- GitHub Copilot OAuth: uses the Device Flow token to query api.githubcopilot.com/models
After an /auth login, the model cache is automatically refreshed to reflect the new provider.
Anthropic API versioning
Claude models may use different anthropic-version header values in API requests. The catalog manages this automatically:
- Newer models (claude-opus-4-8, claude-opus-4-7, claude-sonnet-4-6) use the latest API version
- Legacy models (claude-3-opus, claude-3-haiku) may use older versions for compatibility
- ChatCLI sends the correct header for each model without any user intervention