Supported AI Models

ChatCLI supports a wide range of models from major AI providers. Switch models at any time with /switch --model <name>. Capabilities legend:

👁 Vision — accepts images as input
🔧 Tools — native tool use (function calling)
📋 JSON Mode — guaranteed structured JSON output
💻 Code Exec — native code execution on the provider

All providers support streaming via SSE (Server-Sent Events). ChatCLI enables streaming automatically.

Models ideal for code generation and complex reasoning. Support both Chat Completions API and Responses API.

Model (ID)	Aliases	Context	Max Output	Capabilities
`gpt-5.6-sol`	`gpt-5.6`	1.05M tokens	128K tokens	👁 Vision, 🔧 Tools, 📋 JSON Mode
`gpt-5.6-terra`	—	1.05M tokens	128K tokens	👁 Vision, 🔧 Tools, 📋 JSON Mode
`gpt-5.6-luna`	—	1.05M tokens	128K tokens	👁 Vision, 🔧 Tools, 📋 JSON Mode
`gpt-5.5`	—	1.05M tokens	128K tokens	👁 Vision, 🔧 Tools, 📋 JSON Mode
`gpt-5.5-pro`	—	1.05M tokens	128K tokens	👁 Vision, 🔧 Tools, 📋 JSON Mode
`gpt-5.4`	`gpt-5.4-mini`, `gpt-5.4-nano`	200K tokens	100K tokens	👁 Vision, 🔧 Tools, 📋 JSON Mode
`gpt-5.3-codex`	`gpt-5.3`, `gpt-5.3-mini`, `gpt-5.3-nano`	200K tokens	100K tokens	🔧 Tools, 📋 JSON Mode
`gpt-5.2`	`gpt-5.2-mini`, `gpt-5.2-nano`	200K tokens	100K tokens	👁 Vision, 🔧 Tools, 📋 JSON Mode
`gpt-5`	`gpt-5.1`, `gpt-5-mini`, `gpt-5-nano`, `gpt-5-pro`	400K tokens	128K tokens	👁 Vision, 🔧 Tools, 📋 JSON Mode
`o3` / `o3-mini` / `o4-mini`	—	200K tokens	100K tokens	🔧 Tools, 📋 JSON Mode, 🧠 Reasoning
`gpt-4.1`	`gpt-4.1-mini`, `gpt-4.1-nano`	1.05M tokens	32K tokens	👁 Vision, 🔧 Tools, 📋 JSON Mode
`gpt-4o`	`gpt-4o-mini`	128K tokens	16K tokens	👁 Vision, 🔧 Tools, 📋 JSON Mode

GPT-5.6 (GA Jul 9, 2026) ships in three named tiers: Sol (flagship), Terra (balanced everyday) and Luna (fast and affordable) —

5/

30,

2.50/

15 and

1/

6 per MTok respectively. All three work with an API key and with ChatGPT OAuth (/auth login openai-codex); on the Codex backend ChatCLI sends the required client-identification headers automatically (without them the backend returns 404 for Luna).

Routing between Chat Completions and the Responses API is automatic per model via the catalog (gpt-5.x, gpt-4.1 and o-series prefer Responses; gpt-4o stays on Chat Completions). Force Responses for every model with OPENAI_USE_RESPONSES=true. OAuth sessions always use the Responses API. Streaming is enabled for all models.

Large context windows and excellent ability to follow complex instructions. All models support streaming via SSE.

Model (ID)	Aliases	Context	Max Output	Capabilities
`claude-fable-5`	`fable-5`, `fable`	1M tokens	128K tokens	👁 Vision, 🔧 Tools, 🧠 Adaptive thinking, ✉️ Mid-conv system
`claude-opus-5`	`opus-5`, `claude-5-opus`	1M tokens	128K tokens	👁 Vision, 🔧 Tools, 🧠 Adaptive thinking
`claude-sonnet-5`	`sonnet-5`, `claude-5-sonnet`	1M tokens	128K tokens	👁 Vision, 🔧 Tools, 🧠 Adaptive thinking
`claude-opus-4-8`	`opus-4-8`	1M tokens	128K tokens	👁 Vision, 🔧 Tools, 🧠 Adaptive thinking, ⚡ Fast mode, ✉️ Mid-conv system, 💾 1K-token cache floor
`claude-opus-4-7`	`opus-4-7`	1M tokens	128K tokens	👁 Vision, 🔧 Tools, 🧠 Adaptive thinking
`claude-sonnet-4-7`	`claude-4-7-sonnet`, `sonnet-4-7`	1M tokens	64K tokens	👁 Vision, 🔧 Tools
`claude-opus-4-6`	`opus-4-6`	1M tokens	128K tokens	👁 Vision, 🔧 Tools
`claude-sonnet-4-6`	`sonnet-4-6`	1M tokens	64K tokens	👁 Vision, 🔧 Tools
`claude-haiku-4-5-20251001`	`claude-haiku-4-5`, `haiku-4-5`	200K tokens	64K tokens	👁 Vision, 🔧 Tools
`claude-opus-4-5`	`opus-4-5`	200K tokens	64K tokens	👁 Vision, 🔧 Tools
`claude-sonnet-4-5`	`claude-4-5-sonnet`, `sonnet-4-5`	200K tokens	64K tokens	👁 Vision, 🔧 Tools
`claude-opus-4-1-20250805`	`claude-opus-4-1`, `opus-4-1`	200K tokens	32K tokens	🔧 Tools
`claude-opus-4-20250514`	`opus-4`	200K tokens	32K tokens	🔧 Tools
`claude-sonnet-4`	`claude-4-sonnet`, `sonnet-4-20250514`	200K tokens	64K tokens	👁 Vision, 🔧 Tools
`claude-sonnet-3-7-20250219`	`claude-3-7-sonnet`	200K tokens	8K tokens	🔧 Tools
`claude-sonnet-3-5-20241022`	`claude-3-5-sonnet`	200K tokens	8K tokens	🔧 Tools
`claude-opus-3`	`claude-3-opus`	200K tokens	4K tokens	🔧 Tools
`claude-haiku-3`	`claude-3-haiku`	200K tokens	4K tokens	🔧 Tools

Claude Fable 5 (claude-fable-5) is Anthropic’s most capable model — a tier above Opus (

10/

50 per MTok). Same API surface as Opus 4.7/4.8 (adaptive thinking only, no temperature/top_p/top_k) with one extra constraint: an explicit thinking:{type:"disabled"} returns 400 — the field must be omitted to run without thinking (ChatCLI’s client already does). Shortcut: /model fable. Also available on Bedrock as anthropic.claude-fable-5 (dateless ID — the new generation has no ARN-versioned IDs).Claude Sonnet 5 (claude-sonnet-5) is the Sonnet-tier successor (Anthropic skipped 4.7/4.8 for Sonnet): 1M context, 128K output, adaptive thinking,

3/

15 per MTok (introductory pricing

2/

10 through Aug 31, 2026). On Bedrock it is served exclusively by the Messages endpoint (anthropic.claude-sonnet-5) — ChatCLI routes it automatically; see the AWS Bedrock tab.Claude Opus 5 (claude-opus-5, Jul 2026) succeeds Opus 4.8 for complex agentic coding and enterprise work: 1M context, 128K output, adaptive thinking (effort defaults to high server-side),

5/

25 per MTok — the same price as Opus 4.5-4.8. Shortcut: /model opus-5. On Bedrock it is served through the Messages endpoint (anthropic.claude-opus-5), routed automatically like Sonnet 5; on OpenRouter the slug is anthropic/claude-opus-5.claude-opus-4-8 and claude-opus-4-7 ship with 1M native context (no extra flag). claude-opus-4-6 can also use 1M context by setting ANTHROPIC_1MTOKENS_SONNET=true. Different models may use distinct anthropic-version headers, managed automatically by the catalog.

Catalog order: the 4.x entries are declared newest-first in the registry. This prevents a silent alias collision where opus-4-5, opus-4-6, opus-4-7 and opus-4-8 (typed as shortcuts) would resolve to claude-opus-4-20250514 (a mere 20K context) because the 4.0 entry’s opus-4 alias is a prefix of all of them. If you add a Claude 4.9 / 5.x in the future, keep this newest-first order.

Claude Opus 4.8 — what’s new

Released May 28, 2026. Same default 1M / 128K profile as Opus 4.7 but with four new launch capabilities the catalog tracks as feature flags:

Capability	What it means
`adaptive_thinking`	Only thinking mode accepted by 4.7+. ChatCLI emits `thinking:{type:"adaptive"}` when a skill provides an `effort:` hint — the model decides per turn whether to reason. Sending `budget_tokens` returns HTTP 400.
`fast_mode`	Research-preview faster output (~2.5× tokens/sec) at premium pricing. Opt in with `ANTHROPIC_SPEED=fast`.
`mid_conversation_system`	Server accepts `role:"system"` after the first user turn, preserving prompt-cache hits across instruction updates. ChatCLI’s message builder already passes structured system blocks through unchanged.
`low_cache_minimum`	Minimum cacheable prompt drops from previous models’ floor to 1,024 tokens. Prompts that didn’t qualify on 4.7 now create cache entries with no code change.

Skill effort: medium|high|max continues to work — on Opus 4.7 and 4.8 it maps to adaptive thinking automatically; on older 4.x / 3.7 it falls back to budgeted extended thinking (thinking:{type:"enabled", budget_tokens:N}).

Full AWS Bedrock catalog — Anthropic, OpenAI, Llama, Nova, Mistral, Cohere, AI21, DeepSeek, Moonshot Kimi, MiniMax, Qwen, Z.AI/GLM, Gemma, Nemotron, TwelveLabs, and any provider AWS adds. Auth uses the AWS SDK’s default credentials chain (IAM role, ~/.aws/credentials, env vars) — no API key from the original providers is needed.

Modern models (Claude 3.7+/4.x/4.5/4.6/4.7 and equivalents from other providers) do not accept direct on-demand invocation by base ID — they require an inference profile ID (prefixes global., us., eu., apac.). ChatCLI automatically filters non-invokable base IDs from /switch --model, so only what works appears. See AWS Bedrock for details.

New generation = dateless IDs. Fable 5, Opus 5, Sonnet 5, Opus 4.8 and Opus 4.7 have no ARN-versioned IDs on Bedrock — the IDs are anthropic.claude-fable-5, anthropic.claude-opus-5, anthropic.claude-sonnet-5, anthropic.claude-opus-4-8 and anthropic.claude-opus-4-7. Opus 4.8/4.7 are invoked through the global. inference profile (the bare ID is not on-demand invokable on InvokeModel). The old dated IDs (e.g. global.anthropic.claude-opus-4-8-20260528-v1:0) keep resolving as aliases. Opus 5 and Sonnet 5 are exclusive to the Messages endpoint (bedrock-mantle.{region}.api.aws/anthropic/v1/messages) — ChatCLI detects and routes it automatically (SigV4 bedrock-mantle or AWS_BEARER_TOKEN_BEDROCK); see AWS Bedrock.

Model (ID)	Aliases	Context	Max Output	Capabilities
`anthropic.claude-fable-5`	`bedrock-fable-5`, `claude-fable-5`, `fable-5`	1M tokens	128K tokens	👁 Vision, 🔧 Tools, 🧠 Adaptive thinking
`anthropic.claude-opus-5`	`bedrock-opus-5`, `claude-opus-5`, `opus-5`	1M tokens	128K tokens	👁 Vision, 🔧 Tools, 🧠 Adaptive thinking (Messages endpoint)
`anthropic.claude-sonnet-5`	`bedrock-sonnet-5`, `claude-sonnet-5`, `sonnet-5`	1M tokens	128K tokens	👁 Vision, 🔧 Tools, 🧠 Adaptive thinking (Messages endpoint)
`global.anthropic.claude-opus-4-8`	`bedrock-opus-4-8`, `claude-opus-4-8`	1M tokens	128K tokens	👁 Vision, 🔧 Tools, 🧠 Adaptive thinking, 💾 1K-token cache floor
`global.anthropic.claude-opus-4-7`	`bedrock-opus-4-7`, `claude-opus-4-7`	1M tokens	128K tokens	👁 Vision, 🔧 Tools, 🧠 Adaptive thinking
`global.anthropic.claude-sonnet-4-7-20260401-v1:0`	`bedrock-sonnet-4-7`, `claude-sonnet-4-7`	1M tokens	64K tokens	👁 Vision, 🔧 Tools
`global.anthropic.claude-sonnet-4-6`	`bedrock-sonnet-4-6`, `claude-sonnet-4-6`	1M tokens	64K tokens	👁 Vision, 🔧 Tools
`global.anthropic.claude-opus-4-6-v1`	`bedrock-opus-4-6`, `claude-opus-4-6`	1M tokens	128K tokens	👁 Vision, 🔧 Tools
`global.anthropic.claude-haiku-4-5-20251001-v1:0`	`bedrock-haiku-4-5`, `claude-haiku-4-5`	200K tokens	64K tokens	👁 Vision, 🔧 Tools
`global.anthropic.claude-sonnet-4-5-20250929-v1:0`	`bedrock-sonnet-4-5`, `claude-sonnet-4-5`	200K tokens	64K tokens	👁 Vision, 🔧 Tools
`us.anthropic.claude-sonnet-4-5-20250929-v1:0`	`bedrock-sonnet-4-5-us`	200K tokens	64K tokens	👁 Vision, 🔧 Tools
`global.anthropic.claude-opus-4-5-20251001-v1:0`	`bedrock-opus-4-5`, `claude-opus-4-5`	200K tokens	64K tokens	👁 Vision, 🔧 Tools
`global.anthropic.claude-sonnet-4-20250514-v1:0`	`bedrock-sonnet-4`, `claude-sonnet-4`
`us.anthropic.claude-sonnet-4-20250514-v1:0`	`bedrock-sonnet-4-us`	200K tokens	64K tokens	👁 Vision, 🔧 Tools
`eu.anthropic.claude-sonnet-4-20250514-v1:0`	`bedrock-sonnet-4-eu`	200K tokens	64K tokens	👁 Vision, 🔧 Tools
`us.anthropic.claude-opus-4-20250514-v1:0`	`bedrock-opus-4`, `claude-opus-4`	200K tokens	32K tokens	👁 Vision, 🔧 Tools
`us.anthropic.claude-opus-4-1-20250805-v1:0`	`bedrock-opus-4-1`, `claude-opus-4-1`	200K tokens	32K tokens	👁 Vision, 🔧 Tools
`us.anthropic.claude-3-7-sonnet-20250219-v1:0`	`bedrock-sonnet-3-7`, `claude-3-7-sonnet`	200K tokens	64K tokens	👁 Vision, 🔧 Tools
`eu.anthropic.claude-3-7-sonnet-20250219-v1:0`	`bedrock-sonnet-3-7-eu`	200K tokens	64K tokens	👁 Vision, 🔧 Tools
`anthropic.claude-3-5-sonnet-20241022-v2:0`	`bedrock-sonnet-3-5-v2`	200K tokens	8K tokens	👁 Vision, 🔧 Tools
`anthropic.claude-3-5-sonnet-20240620-v1:0`	`bedrock-sonnet-3-5-v1`	200K tokens	8K tokens	👁 Vision, 🔧 Tools
`anthropic.claude-3-5-haiku-20241022-v1:0`	`bedrock-haiku-3-5`, `claude-3-5-haiku`	200K tokens	8K tokens	🔧 Tools
`anthropic.claude-3-opus-20240229-v1:0`	`bedrock-opus-3`, `claude-3-opus`	200K tokens	4K tokens	👁 Vision, 🔧 Tools
`anthropic.claude-3-haiku-20240307-v1:0`	`bedrock-haiku-3`, `claude-3-haiku`	200K tokens	4K tokens	👁 Vision, 🔧 Tools

OpenAI GPT-OSS (open-weights) — OpenAI models hosted on Bedrock. Use the OpenAI Chat Completions schema (auto-detected by openai.* prefix, or forced via BEDROCK_PROVIDER=openai).

Model (ID)	Aliases	Context	Max Output	Capabilities
`openai.gpt-oss-120b-1:0`	`bedrock-gpt-oss-120b`, `gpt-oss-120b`	128K tokens	16K tokens	🔧 Tools, 📋 JSON
`openai.gpt-oss-20b-1:0`	`bedrock-gpt-oss-20b`, `gpt-oss-20b`	128K tokens	16K tokens	🔧 Tools, 📋 JSON

Other providers via Converse API — Llama, Nova, Mistral, Cohere, AI21, DeepSeek, Moonshot Kimi, MiniMax, Qwen, Z.AI/GLM, Gemma, Nemotron, TwelveLabs, etc. are not hardcoded in the catalog — they appear dynamically in /switch --model based on what your AWS account has access to. ChatCLI routes these models through the AWS Converse API (unified schema), so adding a new provider doesn’t require a release.Examples of IDs seen via ListFoundationModels (your actual list depends on account + region):

Provider	Example Model ID
Moonshot AI	`moonshotai.kimi-k2.5`, `moonshotai.kimi-k2-thinking`
MiniMax	`minimax.m-2-5`, `minimax.m-2`
Z.AI	`zai.glm-4-7`, `zai.glm-4-7-flash`
Qwen	`qwen.qwen3-32b`, `qwen.qwen3-coder-480b`
Meta Llama	`meta.llama3-70b-instruct-v1:0`, `us.meta.llama3-1-70b-...`
Amazon Nova	`amazon.nova-pro-v1:0`, `amazon.nova-lite-v1:0`
Mistral	`mistral.mistral-large-2407-v1:0`
DeepSeek	`us.deepseek.r1-v1:0`
Google	`google.gemma-3-27b-pt`
NVIDIA	`nvidia.nemotron-nano-9b-v2`
TwelveLabs	`twelvelabs.pegasus-v1.2`

The dynamic listing (/switch --model) merges bedrock:ListFoundationModels (filtered by ByOutputModality: TEXT + InferenceTypesSupported: ON_DEMAND) and bedrock:ListInferenceProfiles with the static catalog above. No allowlist — any Bedrock provider your account can access shows up automatically. Use the command to see what your AWS account can actually invoke in the configured region.

Embeddings via Bedrock — amazon.titan-embed-text-v2:0 (default, 1024-dim, configurable 256/512/1024), amazon.titan-embed-text-v1 (1536-dim), and Cohere cohere.embed-english-v3 / cohere.embed-multilingual-v3 (1024-dim). Enable with CHATCLI_EMBED_PROVIDER=bedrock. See RAG + HyDE.

Advanced multimodal capabilities and massive context windows. Support streaming via SSE.

Model (ID)	Aliases	Context	Max Output	Capabilities
`gemini-3`	`gemini-3-pro`, `gemini-3-pro-preview`	2M tokens	2M tokens	👁 Vision, 🔧 Tools, 📋 JSON, 💻 Code Exec
`gemini-2.5-pro`	`gemini-2.5-pro-latest`	2M tokens	2M tokens	👁 Vision, 🔧 Tools, 📋 JSON, 💻 Code Exec
`gemini-2.5-flash`	—	1M tokens	1M tokens	👁 Vision, 🔧 Tools, 📋 JSON
`gemini-2.5-flash-lite`	—	1M tokens	1M tokens	—
`gemini-2.0-flash`	—	1M tokens	1M tokens	👁 Vision, 🔧 Tools, 📋 JSON
`gemini-2.0-flash-lite`	—	1M tokens	1M tokens	—

Gemini 3 also supports Multimodal Live for real-time interactions. Models with JSON Mode can return structured output via response_mime_type.

Real-time information integration and large context windows. Support streaming.

Model (ID)	Aliases	Context	Max Output	Capabilities
`grok-4-1`	`grok-4-1-fast`	2M tokens	—	—
`grok-4-fast`	`grok-4-fast-reasoning-latest`, `grok-4-0709`	2M tokens	—	—
`grok-3`	—	128K tokens	—	—
`grok-3-mini`	—	128K tokens	—	—
`grok-code-fast-1`	—	200K tokens	—	—

Grok models use the OpenAI-compatible API. Output limits are managed by the provider.

Use models from the Copilot platform with your subscription (Individual, Business, Enterprise). Authenticate via /auth login github-copilot.The table below shows models registered in the static catalog. With dynamic listing, ChatCLI queries the Copilot API and automatically discovers all models available for your account.

Model (ID)	Context
`gpt-4o`	128K tokens
`gpt-4o-mini`	128K tokens
`claude-sonnet-4`	128K tokens
`gemini-2.0-flash`	128K tokens
+ dynamic models	via API

Available models vary depending on your plan and region. Use /switch --model to see the full list fetched directly from the Copilot API.

Chinese AI models from Zhipu AI (z.ai) with strong multilingual and coding capabilities. OpenAI-compatible API with native tool calling support.

Model (ID)	Context	Max Output	Capabilities
`glm-5.2`	1M tokens	128K tokens	🔧 Tools, 📋 JSON
`glm-5.1`	200K tokens	128K tokens	👁 Vision, 🔧 Tools
`glm-5-turbo`	200K tokens	128K tokens	👁 Vision, 🔧 Tools
`glm-5`	200K tokens	128K tokens	👁 Vision, 🔧 Tools
`glm-4.7`	200K tokens	128K tokens	🔧 Tools
`glm-4.6`	200K tokens	128K tokens	🔧 Tools
`glm-4.5`	128K tokens	96K tokens	🔧 Tools
`glm-4.5-flash`	128K tokens	16K tokens	🔧 Tools
`glm-5v-turbo`	128K tokens	16K tokens	👁 Vision, 🔧 Tools
`glm-4.5v`	128K tokens	16K tokens	👁 Vision
`codegeex-4`	128K tokens	16K tokens	🔧 Tools

GLM-5.2 (released Jun 13, 2026) is Zhipu’s open-weight flagship: 1M-token context, 128K output, MIT license, tuned for coding and agentic workloads (thinking mode, function calling, structured output). List price:

1.40/

4.40 per MTok (GLM-5:

1.00/

3.20) — ChatCLI’s cost tracker uses these rates. The provider default model remains glm-5; use /switch --model glm-5.2 to switch.

ZAI uses an OpenAI-compatible API at https://api.z.ai/api/paas/v4/chat/completions. Authentication is via ZAI_API_KEY Bearer token. Model IDs are case-sensitive.

Automatic JWT authentication: Keys in id.secret format automatically enable JWT token rotation (HMAC-SHA256), cached for 30 minutes. Keys without ”.” work as traditional Bearer tokens. No additional configuration needed.

High-performance models from MiniMax with large context windows and native tool calling. OpenAI-compatible API.

Model (ID)	Context	Max Output	Capabilities
`MiniMax-M2.7`	204K tokens	131K tokens	👁 Vision, 🔧 Tools
`MiniMax-M2.7-highspeed`	204K tokens	—	👁 Vision, 🔧 Tools
`MiniMax-M2.5`	196K tokens	65K tokens	👁 Vision, 🔧 Tools
`MiniMax-M2.5-highspeed`	196K tokens	—	👁 Vision, 🔧 Tools
`MiniMax-Text-01`	128K tokens	2K tokens	📋 JSON Mode

MiniMax model IDs are case-sensitive (e.g., MiniMax-M2.7, not minimax-m2.7). The API uses a base_resp field for error handling with status_code and status_msg.

Anthropic-compatible endpoint: Set MINIMAX_API_COMPAT=anthropic to use https://api.minimax.io/anthropic/v1/messages with Anthropic Messages format. Native tool calling is disabled in this mode (falls back to XML). Model listing always uses the native endpoint.

Moonshot AI’s Kimi family — 1T-parameter MoE with 32B activated in the K2.6 flagship, 256K context window, native MoonViT vision encoder and explicit “thinking” mode. OpenAI-compatible API at https://api.moonshot.ai/v1/chat/completions.

Model (ID)	Aliases	Context	Max Output	Capabilities
`kimi-k2.6`	`kimi-k2-6`, `k2.6`, `k2-6`	256K tokens	131K tokens	🔧 Tools, 👁 Vision, 🧠 Thinking, 📋 JSON Mode
`kimi-k2.5`	`kimi-k2-5`, `k2.5`, `k2-5`	256K tokens	96K tokens	🔧 Tools, 👁 Vision, 🧠 Thinking, 📋 JSON Mode
`kimi-latest`	—	256K tokens	131K tokens	🔧 Tools, 👁 Vision, 🧠 Thinking, 📋 JSON Mode
`kimi-k2-turbo-preview`	`kimi-k2-turbo`	256K tokens	65K tokens	🔧 Tools, 📋 JSON Mode
`kimi-thinking-preview`	—	128K tokens	65K tokens	🔧 Tools, 🧠 Thinking, 📋 JSON Mode
`moonshot-v1-128k`	—	128K tokens	32K tokens	🔧 Tools, 📋 JSON Mode
`moonshot-v1-32k`	—	32K tokens	16K tokens	🔧 Tools, 📋 JSON Mode
`moonshot-v1-8k`	—	8K tokens	4K tokens	🔧 Tools, 📋 JSON Mode

Thinking mode: Set MOONSHOT_THINKING=enabled|disabled|auto to toggle between Thinking (explicit reasoning, default for K2.6/K2.5) and Instant (direct response, cheaper). The default auto lets the model choose; models without the thinking capability ignore the flag.

Public pricing (May 2026): kimi-k2.6 is

0.95/M input tokens (cache miss) and

4.00/M output. Cache-hit input drops to $0.16/M, but the ChatCLI cost tracker bills the cache-miss price for a conservative accounting.

Multi-provider API gateway — access 200+ models from OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, and more through a single API key. Uses an OpenAI-compatible API at https://openrouter.ai/api/v1/chat/completions.Models use the provider/model-name format:

Model (ID)	Provider	Capabilities
`openai/gpt-4o`	OpenAI	👁 Vision, 🔧 Tools
`openai/gpt-4o-mini`	OpenAI	👁 Vision, 🔧 Tools
`anthropic/claude-opus-5`	Anthropic	👁 Vision, 🔧 Tools, 📋 JSON
`anthropic/claude-sonnet-5`	Anthropic	👁 Vision, 🔧 Tools, 📋 JSON
`anthropic/claude-fable-5`	Anthropic	👁 Vision, 🔧 Tools, 📋 JSON
`anthropic/claude-sonnet-4`	Anthropic	👁 Vision, 🔧 Tools
`anthropic/claude-opus-4`	Anthropic	👁 Vision, 🔧 Tools
`google/gemini-2.5-pro`	Google	👁 Vision, 🔧 Tools, 📋 JSON
`google/gemini-2.5-flash`	Google	👁 Vision, 🔧 Tools, 📋 JSON
`meta-llama/llama-4-maverick`	Meta	🔧 Tools
`deepseek/deepseek-r1`	DeepSeek	🔧 Tools
`mistralai/mistral-large`	Mistral	🔧 Tools

The table above shows popular defaults. OpenRouter provides 200+ models — ChatCLI discovers them dynamically via the /api/v1/models endpoint. Use /switch --model to browse the full list.

OpenRouter supports native fallback routing via OPENROUTER_FALLBACK_MODELS. If your primary model is unavailable, OpenRouter automatically routes to the next model in the list — handled server-side before ChatCLI’s own fallback chain kicks in.

Served through the local Devin CLI wrapper — ChatCLI keeps the whole conversation and harness; Devin is only the transport. See Devin Provider. Slugs use dots (claude-sonnet-4.6, not 4-6).

Family	Models
Anthropic	`claude-opus-5` · `claude-sonnet-5` · `claude-opus-4.8` / `4.7` / `4.6` / `4.5` · `claude-sonnet-4.6` / `4.5` / `4` · `claude-haiku-4.5`
OpenAI	`gpt-5.6-sol` / `-terra` / `-luna` · `gpt-5.5` · `gpt-5.4` / `-mini` · `gpt-5.3-codex` · `gpt-5.2`
Google	`gemini-3.5-flash` · `gemini-3.1-pro` · `gemini-3-flash`
Others	`glm-5.2` · `kimi-k2.7` / `k2.6` · `deepseek-v4-pro`
Cognition (SWE)	`swe-1.7-lightning` · `swe-1.7` · `swe-1.6-fast` · `swe-1.6` · `swe-1.5`

Auth belongs to the binary (devin auth login, corporate SSO) — no key in ChatCLI. Default model: claude-sonnet-4.6 (DEVIN_MODEL). The CLI reports no token usage, so cost tracking shows zero — cost lives in the Cognition subscription.

Supports any local model via Ollama. Configure in .env:

OLLAMA_ENABLED=true
OLLAMA_MODEL="llama3"

Or switch interactively: /switch --model llama3Use ollama pull <model> to download new models.

How model selection works

ChatCLI determines which model to use with the following priority (highest to lowest):

--model flag on the command line: chatcli --model gpt-5.4
/switch command during a session: /switch --model claude-sonnet-4-6
MODEL environment variable: sets the default model
LLM_PROVIDER environment variable: determines the provider (openai, anthropic, google, xai, etc.)
Provider’s default model: each provider has a default model defined in the catalog

# Example: set provider and model via .env
LLM_PROVIDER=anthropic
MODEL=claude-sonnet-4-6

Model aliases

Each model has aliases for easier typing. ChatCLI automatically resolves aliases to the canonical model ID. For example:

Alias typed	Resolved model
`claude-4-5-sonnet`	`claude-sonnet-4-5`
`sonnet-4-5`	`claude-sonnet-4-5`
`opus-4-6`	`claude-opus-4-6`
`opus-4-7`	`claude-opus-4-7`
`opus-4-8`	`claude-opus-4-8`
`sonnet-4-7`	`claude-sonnet-4-7`
`sonnet-5`	`claude-sonnet-5`
`opus-5`	`claude-opus-5`
`fable`	`claude-fable-5`
`glm-5-2`	`glm-5.2`
`gpt-5.6`	`gpt-5.6-sol` (family flagship)
`gpt-5-mini`	`gpt-5` (mini variant)
`gemini-3-pro`	`gemini-3`

Aliases are defined in the model catalog and accepted in all contexts: --model, /switch, and the MODEL variable.

Catalog system

Models are registered in the llm/catalog package with complete metadata. ChatCLI uses the catalog to automatically determine:

API version — which endpoint and protocol version to use for each model
Max tokens — context and output limits for managing prompts and responses
Capabilities — which features are available (vision, tools, JSON mode, etc.)
Provider-specific headers — for example, the anthropic-version header varies per model

This means that when switching models, ChatCLI automatically adjusts all request parameters without manual configuration.

Dynamic model listing

ChatCLI fetches available models directly from each provider’s API, using the configured token or API key. This ensures you see exactly which models your account has access to — including new models not yet in the static catalog.

How it works

When ChatCLI starts or when you switch providers (via /switch, /auth login, etc.), a background request queries the active provider’s models endpoint
Discovered models are cached for use in the /switch --model autocomplete
Each suggestion indicates its origin: [API] (dynamic) or [catalog] (static)

Endpoints per provider

Provider	Endpoint	Auth
OpenAI	`GET /v1/models`	API Key or OAuth
Anthropic	`GET /v1/models`	API Key or OAuth
Google AI	`GET /v1beta/models`	API Key
xAI	`GET /v1/models`	API Key
GitHub Copilot	`GET /models`	OAuth (Device Flow)
Ollama	`GET /api/tags`	No auth (local)
ZAI (Zhipu AI)	`GET /models`	API Key
MiniMax	`GET /models`	API Key
Moonshot (Kimi)	`GET /v1/models`	API Key (Bearer)
OpenRouter	`GET /api/v1/models`	API Key
StackSpot	—	Not supported (model fixed per agent)

Smart autocomplete

When typing /switch --model and pressing Tab, ChatCLI suggests available models:

> /switch --model [Tab]
  gpt-4o           GPT-4o (Copilot) [API]
  claude-sonnet-4  Claude Sonnet 4 (Copilot) [API]
  o4-mini          o4-mini (Copilot) [API]

If the API is not reachable, it falls back to the static catalog:

> /switch --model [Tab]
  gpt-4o           GPT-4o (Copilot) [catalog]
  gpt-4o-mini      GPT-4o mini (Copilot) [catalog]

Pressing Enter with /switch --model (no value) lists all available models with source indication (API or catalog).

OAuth and dynamic listing

Dynamic listing works with both API key and OAuth:

Anthropic OAuth: uses ?beta=true and Chrome-like headers, with automatic gzip decompression
OpenAI OAuth: queries the ChatGPT backend (/backend-api/models) instead of the standard endpoint
GitHub Copilot OAuth: uses the Device Flow token to query api.githubcopilot.com/models

After an /auth login, the model cache is automatically refreshed to reflect the new provider.

Anthropic API versioning

Claude models may use different anthropic-version header values in API requests. The catalog manages this automatically:

Newer models (claude-fable-5, claude-sonnet-5, claude-opus-4-8, claude-opus-4-7, claude-sonnet-4-6) use the latest API version
Legacy models (claude-3-opus, claude-3-haiku) may use older versions for compatibility
ChatCLI sends the correct header for each model without any user intervention

​Claude Opus 4.8 — what’s new

​How model selection works

​Model aliases

​Catalog system

​Dynamic model listing

​How it works

​Endpoints per provider

​Smart autocomplete

​OAuth and dynamic listing

​Anthropic API versioning

Claude Opus 4.8 — what’s new

How model selection works

Model aliases

Catalog system

Dynamic model listing

How it works

Endpoints per provider

Smart autocomplete

OAuth and dynamic listing

Anthropic API versioning