LLM Model Reference

Updated 2026-02-19 — Pricing, latency, launch posts, arenas, coming soon

Pricing per 1M Tokens

Most providers discount cached/repeated input, typically by 75–90% (see the Cache Mechanism column; a worked cost example follows the table). Arena = LMArena Elo (Feb 2026).

| Model | Provider | Input | Cached | Output | Arena | Cache Mechanism |
|---|---|---|---|---|---|---|
| Gemini 2.5 Flash Lite | Google | $0.10 | $0.01 | $0.40 | — | Explicit, 90% off |
| Grok 4.1 Fast | xAI | $0.20 | $0.05 | $0.50 | 1475 | Auto, 99% hit rate, 75% off |
| GPT-5-mini | OpenAI | $0.25 | $0.025 | $2.00 | <1400 | Automatic, 90% off |
| Gemini 3 Flash | Google | $0.50 | free* | $2.00 | 1473 | Implicit; *free during preview |
| Haiku 4.5 | Anthropic | $1.00 | $0.10 | $5.00 | 1404 | Explicit cache_control, min 4096 |
| GPT-5.2 | OpenAI | $1.75 | $0.18 | $14.00 | 1438 | Automatic, 90% off |
| Gemini 3 Pro | Google | $2.00 | $0.20 | $12.00 | 1486 | Explicit, 90% off |
| Sonnet 4.6 | Anthropic | $3.00 | $0.30 | $15.00 | TBD (new) | Explicit, min 1024 |
| Opus 4.6 | Anthropic | $5.00 | $0.50 | $25.00 | 1502 | Explicit cache_control, min 4096 |
| GPT-5.2 Pro | OpenAI | $21.00 | $2.10 | $168.00 | — | Auto, 90% off |
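
To see what the cached-input discount does to a real bill, here is a minimal cost sketch using the per-1M-token figures above. The PRICES keys are shorthand for this example (not API model IDs), and the numbers should be re-checked against provider pricing pages before being relied on.

```python
# Cost per request given the table's per-1M-token prices.
PRICES = {
    # shorthand key: (input, cached input, output) in USD per 1M tokens
    "opus-4.6":     (5.00, 0.50, 25.00),
    "sonnet-4.6":   (3.00, 0.30, 15.00),
    "gpt-5.2":      (1.75, 0.18, 14.00),
    "gemini-3-pro": (2.00, 0.20, 12.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """USD cost for one request; cached_tokens is the cache-hit share of input."""
    inp, cached, out = PRICES[model]
    fresh = input_tokens - cached_tokens
    return (fresh * inp + cached_tokens * cached + output_tokens * out) / 1_000_000

# 50k-token prompt with 45k cached, 2k output, on Opus 4.6:
# (5,000 * $5 + 45,000 * $0.50 + 2,000 * $25) / 1M = $0.0975
print(f"${request_cost('opus-4.6', 50_000, 2_000, cached_tokens=45_000):.4f}")
```

At a 90% cache-hit rate on a 90%-off mechanism, the effective input price drops by roughly 80%, which is why long-context agent loops lean so heavily on caching.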

Latency Benchmarks

Source: Artificial Analysis (rolling averages, Feb 2026). A sketch for reproducing TTFT locally follows the table.

| Model | TTFT | Output (tok/s) | Notes |
|---|---|---|---|
| Haiku 4.5 | 0.49s | 109 | Fastest Anthropic model |
| GPT-5.2 (non-reasoning) | 0.55s | ~37 | Great TTFT, low throughput |
| Grok 4.1 Fast (non-reasoning) | 0.71s | 129 | Strong all-round |
| Sonnet 4.6 | 0.85s | 59 | New |
| Opus 4.6 | 1.1–1.7s | 59–71 | Varies by provider |
| Opus 4.6 (adaptive) | 1.78s | 73 | Reasoning mode, still low TTFT |
| Grok 4.1 Fast (reasoning) | ~3s | ~100 | Moderate reasoning overhead |
| Gemini 3 Flash (reasoning) | 11.4s | 211 | #2 throughput of 113 models tested |
| Gemini 3 Pro (high) | 30.3s | 125 | Heavy reasoning overhead |
| GPT-5-mini (high/reasoning) | 73.7s | 111 | High TTFT from reasoning |
| GPT-5 (high/reasoning) | 107s | 98 | Massive TTFT from thinking |
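
These numbers can be spot-checked locally: time the gap between sending a streaming request and receiving the first content chunk. A rough sketch, assuming LiteLLM is installed and provider keys are set in the environment; single runs are noisy, and Artificial Analysis averages many samples, so expect variance.

```python
import time
from litellm import completion

def measure_ttft(model: str, prompt: str = "Say hi.") -> float:
    """Seconds from request send to first streamed content token."""
    start = time.perf_counter()
    stream = completion(model=model,
                        messages=[{"role": "user", "content": prompt}],
                        stream=True)
    for chunk in stream:
        # The first chunk carrying actual text marks time-to-first-token;
        # earlier chunks may only carry role/metadata.
        if chunk.choices[0].delta.content:
            return time.perf_counter() - start
    return float("nan")

print(f"TTFT: {measure_ttft('anthropic/claude-haiku-4-5-20251001'):.2f}s")
```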

Model IDs & Aliases

| Alias | LiteLLM ID | Direct API ID | Cache Min (tokens) |
|---|---|---|---|
| opus | anthropic/claude-opus-4-6 | claude-opus-4-6 | 4096 |
| sonnet | anthropic/claude-sonnet-4-6 | claude-sonnet-4-6 | 1024 |
| haiku | anthropic/claude-haiku-4-5-20251001 | claude-haiku-4-5-20251001 | 4096 |
| gpt | openai/gpt-5.2 | gpt-5.2 | ~1024 |
| gpt-mini | openai/gpt-5-mini | gpt-5-mini | ~1024 |
| gemini | gemini/gemini-3-pro-preview | gemini-3-pro-preview | 1024 |
| gemini-flash | gemini/gemini-3-flash-preview | gemini-3-flash-preview | 1024 |
| grok | xai/grok-4-1-fast-reasoning | grok-4-1-fast-reasoning | auto |
| grok-code | xai/grok-code-fast-1 | grok-code-fast-1 | auto |
| codex | openai/gpt-5.2-codex | gpt-5.2-codex | ~1024 |
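
The Cache Min column matters most for Anthropic, where caching is opt-in rather than automatic. A minimal sketch of the explicit cache_control mechanism from the pricing table, using the Anthropic Python SDK and the direct API ID above; reference_doc.txt is a placeholder for any context long enough to clear the minimum.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Context to cache; segments shorter than the model's cache minimum
# (4096 tokens for Opus per the table) are silently left uncached.
long_context = open("reference_doc.txt").read()

response = client.messages.create(
    model="claude-opus-4-6",  # direct API ID from the table above
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": long_context,
        "cache_control": {"type": "ephemeral"},  # cache the prefix up to here
    }],
    messages=[{"role": "user", "content": "Summarize the key points."}],
)

# usage reports cache_creation_input_tokens on the first call and
# cache_read_input_tokens (billed at the discounted rate) on repeats.
print(response.usage)
```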

Coming Soon — Announced, Not Yet in Our API

Models announced by providers but not yet accessible via our API routes. Check the Last Checked timestamps for freshness; a probe sketch for refreshing them follows the table.

| Model | Provider | Status | Replaces | Pricing (in/out) | Key Benchmarks / Notes | Try Now | Last Checked |
|---|---|---|---|---|---|---|---|
| Gemini 3.1 Pro (gemini-3.1-pro-preview) | Google | API works; subscription 404 | gemini-3-pro-preview | $2.00 / $12.00 | ARC-AGI-2: 77.1% (vs 31%); SWE-Bench: 80.6%; GPQA: 94.3% | AI Studio | 2026-02-19 10:09 PT |
| GPT-5.3 Codex (gpt-5.3-codex) | OpenAI | Subscription works; no standard API | gpt-5.2-codex | TBD | 25% faster than 5.2; tested via codex_oauth.py; standard API delayed (security) | Codex app, CLI | 2026-02-19 10:08 PT |
| GPT-5.3 Codex Spark (gpt-5.3-codex-spark) | OpenAI | Pro only; needs $200/mo plan | — | TBD | Smaller, real-time coding; first streaming Codex model; research preview | Codex app | 2026-02-19 09:53 PT |
| Grok Code v2 (grok-code-fast-2?) | xAI | In training | grok-code-fast-1 | TBD | Multimodal inputs; parallel tool calling; extended context | N/A | 2026-02-19 09:53 PT |
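
One way to keep the Last Checked column fresh: probe each candidate ID with a one-token request and log whether the route resolves. This is a hypothetical helper (not the codex_oauth.py script mentioned above), assuming the LiteLLM route naming used elsewhere in this document; a NotFound-style error means the model has not reached that route yet.

```python
from datetime import datetime, timezone
from litellm import completion

# Candidate IDs from the table above; the grok ID is still speculative ("?").
CANDIDATES = [
    "gemini/gemini-3.1-pro-preview",
    "openai/gpt-5.3-codex",
    "openai/gpt-5.3-codex-spark",
    "xai/grok-code-fast-2",
]

for model in CANDIDATES:
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    try:
        completion(model=model,
                   messages=[{"role": "user", "content": "ping"}],
                   max_tokens=1)
        print(f"{stamp}  {model}: available")
    except Exception as exc:  # NotFoundError, AuthenticationError, etc.
        print(f"{stamp}  {model}: unavailable ({type(exc).__name__})")
```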

Launch Posts

| Model | Date | Announcement |
|---|---|---|
| Gemini 3.1 Pro | 2026-02-19 | blog.google/.../gemini-3-1-pro |
| Sonnet 4.6 | 2026-02-17 | anthropic.com/news/claude-sonnet-4-6 |
| Opus 4.6 | 2026-02-05 | anthropic.com/news/claude-opus-4-6 |
| Haiku 4.5 | 2025-10 | anthropic.com/news/claude-haiku-4-5 |
| Sonnet 4.5 | 2025-09 | anthropic.com/news/claude-sonnet-4-5 |
| GPT-5.2 | 2025-12-11 | openai.com/index/introducing-gpt-5-2 |
| GPT-5.2 Codex | 2025-12 | openai.com/index/introducing-gpt-5-2-codex |
| GPT-5.3 Codex | 2026-02-05 | openai.com/index/introducing-gpt-5-3-codex |
| GPT-5.3 Codex Spark | 2026-02 | openai.com/index/introducing-gpt-5-3-codex-spark |
| Gemini 3 Pro | 2025-11 | blog.google/products/gemini/gemini-3 |
| Gemini 3 Flash | 2025-12-17 | blog.google/products/gemini/gemini-3-flash |
| Grok 4.1 | 2025-11-17 | x.ai/news/grok-4-1 |
| Grok 4.1 Fast | 2025-11 | x.ai/news/grok-4-1-fast |
| Grok Code Fast 1 | 2025-08 | x.ai/news/grok-code-fast-1 |

Arenas & Leaderboards

LMArena (Chatbot Arena)

Gold standard for human-preference Elo ratings, based on blind side-by-side comparisons.

Artificial Analysis

Speed, cost, and quality tradeoffs. API latency benchmarks across providers.

LiveBench

Continually updated questions to prevent contamination. Monthly refreshes.

SWE-bench

Real-world software engineering — GitHub issue resolution from popular repos.

Copilot Arena

VS Code extension blind coding comparisons. Real developer tasks.

Aider Code Editing

Code editing capability with diff formats. Measures practical edit accuracy.

OpenRouter Rankings

Real usage and popularity across their routing platform.

SEAL Leaderboards

Scale AI's expert-driven evaluations across multiple domains.

WebDev Arena

Web development-specific model comparisons.