LLM Model Reference

Updated 2026-02-19 — Pricing, latency, launch posts, arenas, coming soon

Pricing per 1M Tokens

Most providers discount cached/repeated input, typically by 75–90% (see the Cache Mechanism column; a worked cost example follows the table). Arena = LMArena Elo (Feb 2026).

| Model | Provider | Input | Cached | Output | Arena | Cache Mechanism |
|---|---|---|---|---|---|---|
| Gemini 2.5 Flash Lite | Google | $0.10 | $0.01 | $0.40 | — | Explicit, 90% off |
| Grok 4.1 Fast | xAI | $0.20 | $0.05 | $0.50 | 1475 | Auto, 99% hit rate, 75% off |
| GPT-5-mini | OpenAI | $0.25 | $0.025 | $2.00 | <1400 | Automatic, 90% off |
| Gemini 3 Flash | Google | $0.50 | free* | $2.00 | 1473 | Implicit; *free during preview |
| Haiku 4.5 | Anthropic | $1.00 | $0.10 | $5.00 | 1404 | Explicit cache_control, min 4096 |
| GPT-5.2 | OpenAI | $1.75 | $0.18 | $14.00 | 1438 | Automatic, 90% off |
| Gemini 3 Pro | Google | $2.00 | $0.20 | $12.00 | 1486 | Explicit, 90% off |
| Sonnet 4.6 | Anthropic | $3.00 | $0.30 | $15.00 | TBD (new) | Explicit, min 1024 |
| Opus 4.6 | Anthropic | $5.00 | $0.50 | $25.00 | 1502 | Explicit cache_control, min 4096 |
| GPT-5.2 Pro | OpenAI | $21.00 | $2.10 | $168.00 | — | Auto, 90% off |
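
To see what the cached-input discount does to a real bill, here is a minimal cost sketch using the per-1M-token figures above. The PRICES keys are shorthand for this example (not API model IDs), and the numbers should be re-checked against provider pricing pages before being relied on.

```python
# Cost per request given the table's per-1M-token prices.
PRICES = {
    # shorthand key: (input, cached input, output) in USD per 1M tokens
    "opus-4.6":     (5.00, 0.50, 25.00),
    "sonnet-4.6":   (3.00, 0.30, 15.00),
    "gpt-5.2":      (1.75, 0.18, 14.00),
    "gemini-3-pro": (2.00, 0.20, 12.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """USD cost for one request; cached_tokens is the cache-hit share of input."""
    inp, cached, out = PRICES[model]
    fresh = input_tokens - cached_tokens
    return (fresh * inp + cached_tokens * cached + output_tokens * out) / 1_000_000

# 50k-token prompt with 45k cached, 2k output, on Opus 4.6:
# (5,000 * $5 + 45,000 * $0.50 + 2,000 * $25) / 1M = $0.0975
print(f"${request_cost('opus-4.6', 50_000, 2_000, cached_tokens=45_000):.4f}")
```

At a 90% cache-hit rate on a 90%-off mechanism, the effective input price drops by roughly 80%, which is why long-context agent loops lean so heavily on caching.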

Latency Benchmarks

Source: Artificial Analysis (rolling averages, Feb 2026). A sketch for reproducing TTFT locally follows the table.

| Model | TTFT | Output (tok/s) | Notes |
|---|---|---|---|
| Haiku 4.5 | 0.49s | 109 | Fastest Anthropic model |
| GPT-5.2 (non-reasoning) | 0.55s | ~37 | Great TTFT, low throughput |
| Grok 4.1 Fast (non-reasoning) | 0.71s | 129 | Strong all-round |
| Sonnet 4.6 | 0.85s | 59 | New |
| Opus 4.6 | 1.1–1.7s | 59–71 | Varies by provider |
| Opus 4.6 (adaptive) | 1.78s | 73 | Reasoning mode, still low TTFT |
| Grok 4.1 Fast (reasoning) | ~3s | ~100 | Moderate reasoning overhead |
| Gemini 3 Flash (reasoning) | 11.4s | 211 | #2 throughput of 113 models tested |
| Gemini 3 Pro (high) | 30.3s | 125 | Heavy reasoning overhead |
| GPT-5-mini (high/reasoning) | 73.7s | 111 | High TTFT from reasoning |
| GPT-5 (high/reasoning) | 107s | 98 | Massive TTFT from thinking |
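
These numbers can be spot-checked locally: time the gap between sending a streaming request and receiving the first content chunk. A rough sketch, assuming LiteLLM is installed and provider keys are set in the environment; single runs are noisy, and Artificial Analysis averages many samples, so expect variance.

```python
import time
from litellm import completion

def measure_ttft(model: str, prompt: str = "Say hi.") -> float:
    """Seconds from request send to first streamed content token."""
    start = time.perf_counter()
    stream = completion(model=model,
                        messages=[{"role": "user", "content": prompt}],
                        stream=True)
    for chunk in stream:
        # The first chunk carrying actual text marks time-to-first-token;
        # earlier chunks may only carry role/metadata.
        if chunk.choices[0].delta.content:
            return time.perf_counter() - start
    return float("nan")

print(f"TTFT: {measure_ttft('anthropic/claude-haiku-4-5-20251001'):.2f}s")
```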

Model IDs & Aliases

| Alias | LiteLLM ID | Direct API ID | Cache Min (tokens) |
|---|---|---|---|
| opus | anthropic/claude-opus-4-6 | claude-opus-4-6 | 4096 |
| sonnet | anthropic/claude-sonnet-4-6 | claude-sonnet-4-6 | 1024 |
| haiku | anthropic/claude-haiku-4-5-20251001 | claude-haiku-4-5-20251001 | 4096 |
| gpt | openai/gpt-5.2 | gpt-5.2 | ~1024 |
| gpt-mini | openai/gpt-5-mini | gpt-5-mini | ~1024 |
| gemini | gemini/gemini-3-pro-preview | gemini-3-pro-preview | 1024 |
| gemini-flash | gemini/gemini-3-flash-preview | gemini-3-flash-preview | 1024 |
| grok | xai/grok-4-1-fast-reasoning | grok-4-1-fast-reasoning | auto |
| grok-code | xai/grok-code-fast-1 | grok-code-fast-1 | auto |
| codex | openai/gpt-5.2-codex | gpt-5.2-codex | ~1024 |
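
The Cache Min column matters most for Anthropic, where caching is opt-in rather than automatic. A minimal sketch of the explicit cache_control mechanism from the pricing table, using the Anthropic Python SDK and the direct API ID above; reference_doc.txt is a placeholder for any context long enough to clear the minimum.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Context to cache; segments shorter than the model's cache minimum
# (4096 tokens for Opus per the table) are silently left uncached.
long_context = open("reference_doc.txt").read()

response = client.messages.create(
    model="claude-opus-4-6",  # direct API ID from the table above
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": long_context,
        "cache_control": {"type": "ephemeral"},  # cache the prefix up to here
    }],
    messages=[{"role": "user", "content": "Summarize the key points."}],
)

# usage reports cache_creation_input_tokens on the first call and
# cache_read_input_tokens (billed at the discounted rate) on repeats.
print(response.usage)
```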

Coming Soon — Announced, Not Yet in Our API

Models announced by providers but not yet accessible via our API routes. Check the Last Checked timestamps for freshness; a probe sketch for refreshing them follows the table.

| Model | Provider | Status | Replaces | Pricing (in/out) | Key Benchmarks / Notes | Try Now | Last Checked |
|---|---|---|---|---|---|---|---|
| Gemini 3.1 Pro (gemini-3.1-pro-preview) | Google | API works; subscription 404 | gemini-3-pro-preview | $2.00 / $12.00 | ARC-AGI-2: 77.1% (vs 31%); SWE-Bench: 80.6%; GPQA: 94.3% | AI Studio | 2026-02-19 10:09 PT |
| GPT-5.3 Codex (gpt-5.3-codex) | OpenAI | Subscription works; no standard API | gpt-5.2-codex | TBD | 25% faster than 5.2; tested via codex_oauth.py; standard API delayed (security) | Codex app, CLI | 2026-02-19 10:08 PT |
| GPT-5.3 Codex Spark (gpt-5.3-codex-spark) | OpenAI | Pro only; needs $200/mo plan | — | TBD | Smaller, real-time coding; first streaming Codex model; research preview | Codex app | 2026-02-19 09:53 PT |
| Grok Code v2 (grok-code-fast-2?) | xAI | In training | grok-code-fast-1 | TBD | Multimodal inputs; parallel tool calling; extended context | N/A | 2026-02-19 09:53 PT |
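
One way to keep the Last Checked column fresh: probe each candidate ID with a one-token request and log whether the route resolves. This is a hypothetical helper (not the codex_oauth.py script mentioned above), assuming the LiteLLM route naming used elsewhere in this document; a NotFound-style error means the model has not reached that route yet.

```python
from datetime import datetime, timezone
from litellm import completion

# Candidate IDs from the table above; the grok ID is still speculative ("?").
CANDIDATES = [
    "gemini/gemini-3.1-pro-preview",
    "openai/gpt-5.3-codex",
    "openai/gpt-5.3-codex-spark",
    "xai/grok-code-fast-2",
]

for model in CANDIDATES:
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    try:
        completion(model=model,
                   messages=[{"role": "user", "content": "ping"}],
                   max_tokens=1)
        print(f"{stamp}  {model}: available")
    except Exception as exc:  # NotFoundError, AuthenticationError, etc.
        print(f"{stamp}  {model}: unavailable ({type(exc).__name__})")
```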

Launch Posts

| Model | Date | Announcement |
|---|---|---|
| Gemini 3.1 Pro | 2026-02-19 | blog.google/.../gemini-3-1-pro |
| Sonnet 4.6 | 2026-02-17 | anthropic.com/news/claude-sonnet-4-6 |
| Opus 4.6 | 2026-02-05 | anthropic.com/news/claude-opus-4-6 |
| Haiku 4.5 | 2025-10 | anthropic.com/news/claude-haiku-4-5 |
| Sonnet 4.5 | 2025-09 | anthropic.com/news/claude-sonnet-4-5 |
| GPT-5.2 | 2025-12-11 | openai.com/index/introducing-gpt-5-2 |
| GPT-5.2 Codex | 2025-12 | openai.com/index/introducing-gpt-5-2-codex |
| GPT-5.3 Codex | 2026-02-05 | openai.com/index/introducing-gpt-5-3-codex |
| GPT-5.3 Codex Spark | 2026-02 | openai.com/index/introducing-gpt-5-3-codex-spark |
| Gemini 3 Pro | 2025-11 | blog.google/products/gemini/gemini-3 |
| Gemini 3 Flash | 2025-12-17 | blog.google/products/gemini/gemini-3-flash |
| Grok 4.1 | 2025-11-17 | x.ai/news/grok-4-1 |
| Grok 4.1 Fast | 2025-11 | x.ai/news/grok-4-1-fast |
| Grok Code Fast 1 | 2025-08 | x.ai/news/grok-code-fast-1 |

Arenas & Leaderboards

LMArena (Chatbot Arena)

Gold standard for human-preference Elo ratings, based on blind side-by-side comparisons.

Artificial Analysis

Speed, cost, and quality tradeoffs. API latency benchmarks across providers.

LiveBench

Continually updated questions to prevent contamination. Monthly refreshes.

SWE-bench

Real-world software engineering — GitHub issue resolution from popular repos.

Copilot Arena

VS Code extension blind coding comparisons. Real developer tasks.

Aider Code Editing

Code editing capability with diff formats. Measures practical edit accuracy.

OpenRouter Rankings

Real usage and popularity across their routing platform.

SEAL Leaderboards

Scale AI's expert-driven evaluations across multiple domains.

WebDev Arena

Web development-specific model comparisons.