Updated 2026-02-19 — Pricing, latency, launch posts, arenas, coming soon
Most providers discount cached/repeated input by 75–90%. Arena = LMArena ELO (Feb 2026). All prices are USD per 1M tokens.
| Model | Provider | Input | Cached | Output | Arena | Cache Mechanism |
|---|---|---|---|---|---|---|
| Gemini 2.5 Flash Lite | Google | $0.10 | $0.01 | $0.40 | — | Explicit, 90% off |
| Grok 4.1 Fast | xAI | $0.20 | $0.05 | $0.50 | 1475 | Automatic (~99% hit rate), 75% off |
| GPT-5-mini | OpenAI | $0.25 | $0.025 | $2.00 | <1400 | Automatic, 90% off |
| Gemini 3 Flash | Google | $0.50 | free* | $2.00 | 1473 | Implicit; *free during preview |
| Haiku 4.5 | Anthropic | $1.00 | $0.10 | $5.00 | 1404 | Explicit cache_control, min 4096 |
| GPT-5.2 | OpenAI | $1.75 | $0.18 | $14.00 | 1438 | Automatic, 90% off |
| Gemini 3 Pro | Google | $2.00 | $0.20 | $12.00 | 1486 | Explicit, 90% off |
| Sonnet 4.6 | Anthropic | $3.00 | $0.30 | $15.00 | TBD | Explicit cache_control, min 1024 |
| Opus 4.6 | Anthropic | $5.00 | $0.50 | $25.00 | 1502 | Explicit cache_control, min 4096 |
| GPT-5.2 Pro | OpenAI | $21.00 | $2.10 | $168.00 | — | Automatic, 90% off |
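Cache discounts dominate cost on long, repeated prompts. A minimal sketch of per-request cost using the table's per-1M-token rates (the `RATES` dict and `request_cost` helper are illustrative, not part of any SDK):

```python
# Rates from the pricing table above, $ per 1M tokens:
# (fresh input, cached input, output).
RATES = {
    "gpt-5.2": (1.75, 0.18, 14.00),
    "haiku-4.5": (1.00, 0.10, 5.00),
    "gemini-3-pro": (2.00, 0.20, 12.00),
}

def request_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """USD cost of one request; cached_tokens is the cache-hit portion of input."""
    fresh_rate, cached_rate, out_rate = RATES[model]
    fresh = input_tokens - cached_tokens
    return (fresh * fresh_rate + cached_tokens * cached_rate
            + output_tokens * out_rate) / 1_000_000

# A 50k-token prompt with a 40k-token cache hit vs. cold, on GPT-5.2:
cold = request_cost("gpt-5.2", 50_000, 1_000)                      # $0.1015
warm = request_cost("gpt-5.2", 50_000, 1_000, cached_tokens=40_000)  # $0.0387
print(f"cold=${cold:.4f} warm=${warm:.4f}")
```

With an 80% cache hit on the prompt, the request drops to roughly 38% of the cold price; output tokens are never discounted.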
Source: Artificial Analysis (rolling averages, Feb 2026)
| Model | TTFT | Output (tok/s) | Notes |
|---|---|---|---|
| Haiku 4.5 | 0.49s | 109 | Fastest Anthropic |
| GPT-5.2 (non-reasoning) | 0.55s | ~37 | Great TTFT, low throughput |
| Grok 4.1 Fast (non-reasoning) | 0.71s | 129 | Strong all-round |
| Sonnet 4.6 | 0.85s | 59 | New |
| Opus 4.6 | 1.1–1.7s | 59–71 | Varies by provider |
| Opus 4.6 (adaptive) | 1.78s | 73 | Reasoning, still low TTFT |
| Grok 4.1 Fast (reasoning) | ~3s | ~100 | Moderate reasoning overhead |
| Gemini 3 Flash (reasoning) | 11.4s | 211 | #2 throughput of 113 models |
| Gemini 3 Pro (high) | 30.3s | 125 | Heavy reasoning overhead |
| GPT-5-mini (high/reasoning) | 73.7s | 111 | High TTFT from reasoning |
| GPT-5 (high/reasoning) | 107s | 98 | Massive TTFT from thinking |
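TTFT and throughput can be measured locally against any streaming endpoint. A sketch that times an arbitrary token iterator (the `measure_stream` helper and `fake_stream` stand-in are illustrative; pass a real SDK stream in practice):

```python
import time

def measure_stream(token_iter):
    """Return (ttft_seconds, tokens_per_second) for a streaming response.

    token_iter can be any iterable of tokens/chunks, e.g. an SDK stream."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in token_iter:
        if ttft is None:
            ttft = time.perf_counter() - start  # time to first token
        count += 1
    elapsed = time.perf_counter() - start
    tps = count / elapsed if elapsed > 0 else float("inf")
    return ttft, tps

# Stand-in for an API stream: 50 tokens, ~1ms apart.
def fake_stream(n=50, delay=0.001):
    for _ in range(n):
        time.sleep(delay)
        yield "tok"

ttft, tps = measure_stream(fake_stream())
print(f"TTFT={ttft * 1000:.1f}ms, {tps:.0f} tok/s")
```

Note that for reasoning models, TTFT measured this way includes the whole thinking phase unless the stream surfaces thinking deltas as chunks.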
| Alias | LiteLLM ID | Direct API ID | Cache Min |
|---|---|---|---|
| opus | anthropic/claude-opus-4-6 | claude-opus-4-6 | 4096 |
| sonnet | anthropic/claude-sonnet-4-6 | claude-sonnet-4-6 | 1024 |
| haiku | anthropic/claude-haiku-4-5-20251001 | claude-haiku-4-5-20251001 | 4096 |
| gpt | openai/gpt-5.2 | gpt-5.2 | ~1024 |
| gpt-mini | openai/gpt-5-mini | gpt-5-mini | ~1024 |
| gemini | gemini/gemini-3-pro-preview | gemini-3-pro-preview | 1024 |
| gemini-flash | gemini/gemini-3-flash-preview | gemini-3-flash-preview | 1024 |
| grok | xai/grok-4-1-fast-reasoning | grok-4-1-fast-reasoning | auto |
| grok-code | xai/grok-code-fast-1 | grok-code-fast-1 | auto |
| codex | openai/gpt-5.2-codex | gpt-5.2-codex | ~1024 |
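The alias table maps directly onto a lookup for routing code. A minimal sketch (the `ALIASES` dict and `resolve` helper are illustrative, not part of LiteLLM or any provider SDK); `"auto"` marks providers that cache without a stated minimum prefix length:

```python
# Alias -> (LiteLLM ID, direct API ID, cache minimum), from the table above.
ALIASES = {
    "opus": ("anthropic/claude-opus-4-6", "claude-opus-4-6", 4096),
    "sonnet": ("anthropic/claude-sonnet-4-6", "claude-sonnet-4-6", 1024),
    "haiku": ("anthropic/claude-haiku-4-5-20251001",
              "claude-haiku-4-5-20251001", 4096),
    "gpt": ("openai/gpt-5.2", "gpt-5.2", 1024),
    "gpt-mini": ("openai/gpt-5-mini", "gpt-5-mini", 1024),
    "gemini": ("gemini/gemini-3-pro-preview", "gemini-3-pro-preview", 1024),
    "grok": ("xai/grok-4-1-fast-reasoning", "grok-4-1-fast-reasoning", "auto"),
}

def resolve(alias, via_litellm=True):
    """Map a short alias to the model ID expected by the chosen route."""
    litellm_id, direct_id, _cache_min = ALIASES[alias]
    return litellm_id if via_litellm else direct_id

print(resolve("opus"))                    # anthropic/claude-opus-4-6
print(resolve("gpt", via_litellm=False))  # gpt-5.2
```

Keeping both IDs in one table avoids drift when a call site switches between the LiteLLM proxy and a provider's native endpoint.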
Models announced by providers but not yet accessible via our API routes. Check timestamps for freshness.
| Model | Provider | Status | Replaces | Pricing (in/out) | Key Benchmarks | Try Now | Last Checked |
|---|---|---|---|---|---|---|---|
| Gemini 3.1 Pro (gemini-3.1-pro-preview) | Google | API works; subscription 404 | gemini-3-pro-preview | $2.00 / $12.00 | ARC-AGI-2: 77.1% (vs 31%); SWE-Bench: 80.6%; GPQA: 94.3% | AI Studio | 2026-02-19 10:09 PT |
| GPT-5.3 Codex (gpt-5.3-codex) | OpenAI | Subscription works; no standard API | gpt-5.2-codex | TBD | 25% faster than 5.2; tested via codex_oauth.py; standard API delayed (security) | Codex app, CLI | 2026-02-19 10:08 PT |
| GPT-5.3 Codex Spark (gpt-5.3-codex-spark) | OpenAI | Pro only; needs $200/mo plan | — | TBD | Smaller, real-time coding; first streaming Codex model; research preview | Codex app | 2026-02-19 09:53 PT |
| Grok Code v2 (grok-code-fast-2?) | xAI | In training | grok-code-fast-1 | TBD | Multimodal inputs; parallel tool calling; extended context | N/A | 2026-02-19 09:53 PT |
| Model | Date | Announcement |
|---|---|---|
| Gemini 3.1 Pro | 2026-02-19 | blog.google/.../gemini-3-1-pro |
| Sonnet 4.6 | 2026-02-17 | anthropic.com/news/claude-sonnet-4-6 |
| Opus 4.6 | 2026-02-05 | anthropic.com/news/claude-opus-4-6 |
| Haiku 4.5 | 2025-10 | anthropic.com/news/claude-haiku-4-5 |
| Sonnet 4.5 | 2025-09 | anthropic.com/news/claude-sonnet-4-5 |
| GPT-5.2 | 2025-12-11 | openai.com/index/introducing-gpt-5-2 |
| GPT-5.2 Codex | 2025-12 | openai.com/index/introducing-gpt-5-2-codex |
| GPT-5.3 Codex | 2026-02-05 | openai.com/index/introducing-gpt-5-3-codex |
| GPT-5.3 Codex Spark | 2026-02 | openai.com/index/introducing-gpt-5-3-codex-spark |
| Gemini 3 Pro | 2025-11 | blog.google/products/gemini/gemini-3 |
| Gemini 3 Flash | 2025-12-17 | blog.google/products/gemini/gemini-3-flash |
| Grok 4.1 | 2025-11-17 | x.ai/news/grok-4-1 |
| Grok 4.1 Fast | 2025-11 | x.ai/news/grok-4-1-fast |
| Grok Code Fast 1 | 2025-08 | x.ai/news/grok-code-fast-1 |
- LMArena: gold standard for human-preference ELO ratings; blind side-by-side comparisons.
- Artificial Analysis: speed, cost, and quality tradeoffs; API latency benchmarks across providers.
- LiveBench: continually updated questions to prevent contamination; monthly refreshes.
- SWE-bench: real-world software engineering; GitHub issue resolution from popular repos.
- Copilot Arena: VS Code extension blind coding comparisons on real developer tasks.
- Aider leaderboard: code editing capability with diff formats; measures practical edit accuracy.
- OpenRouter rankings: real usage and popularity across their routing platform.
- SEAL (Scale AI): expert-driven evaluations across multiple domains.
- WebDev Arena: web development-specific model comparisons.