# LLM Latency Benchmark Report

**Generated**: 2026-01-27 05:21:44

## Summary

| Model | TTFT (median) | Total (median) | Tokens/sec (generation, mean) | Success |
|-------|---------------|----------------|------------|---------|
| anthropic/claude-haiku-4-5-20251001 | 605ms | 3908ms | 91.1 | 100% |
| gemini/gemini-2.5-flash-lite | 755ms | 2948ms | 133.9 | 100% |
| openai/gpt-5-mini | 5415ms | 5921ms | 608.3 | 100% |
| openai/gpt-5-nano | N/A | N/A | N/A | 0% |
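
The throughput column is consistent with a generation-phase rate: tokens divided by (total − TTFT) per run, averaged across runs. A minimal sketch of that calculation, using the claude-haiku run values from the detailed results below:

```python
# Per-run data for anthropic/claude-haiku-4-5-20251001 (from the detailed results).
runs = [
    {"ttft_ms": 711, "total_ms": 4200, "tokens": 300},  # Run 1
    {"ttft_ms": 499, "total_ms": 3616, "tokens": 300},  # Run 2
]

def gen_tok_per_sec(run):
    """Tokens per second during the generation phase (after the first token)."""
    gen_seconds = (run["total_ms"] - run["ttft_ms"]) / 1000.0
    return run["tokens"] / gen_seconds

rates = [gen_tok_per_sec(r) for r in runs]
mean_rate = sum(rates) / len(rates)
print([round(r, 1) for r in rates], round(mean_rate, 1))  # → [86.0, 96.2] 91.1
```

This reproduces both the per-run figures (86.0 and 96.2 tok/s) and the 91.1 tok/s summary value, which explains the seemingly high 608.3 tok/s for gpt-5-mini: its long TTFT leaves only a short, fast generation window.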

## Detailed Results

### anthropic/claude-haiku-4-5-20251001

**TTFT (Time to First Token)**
- Min: 499ms
- Max: 711ms
- Mean: 605ms
- Median: 605ms
- Stdev: 150ms

**Total Response Time**
- Min: 3616ms
- Max: 4200ms
- Mean: 3908ms
- Median: 3908ms
- Stdev: 413ms

**Individual Runs**

- Run 1: TTFT=711ms, Total=4200ms, Tokens=300, 86.0 tok/s
- Run 2: TTFT=499ms, Total=3616ms, Tokens=300, 96.2 tok/s
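
With only two samples, the reported statistics collapse to simple forms: the median equals the mean, and the sample standard deviation is |x1 − x2|/√2, which is why Stdev is 150 ms here despite a 212 ms spread. A quick check with the standard library:

```python
import statistics

ttft = [711, 499]  # TTFT per run, ms (from the two runs above)

# With n=2, mean and median coincide.
assert statistics.mean(ttft) == statistics.median(ttft) == 605

# Sample stdev of two points reduces to |x1 - x2| / sqrt(2).
print(round(statistics.stdev(ttft)))  # → 150
```

The same identity accounts for every Stdev in this report, so treat those values as rough spread indicators rather than robust estimates.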

### gemini/gemini-2.5-flash-lite

**TTFT (Time to First Token)**
- Min: 521ms
- Max: 989ms
- Mean: 755ms
- Median: 755ms
- Stdev: 330ms

**Total Response Time**
- Min: 2218ms
- Max: 3677ms
- Mean: 2948ms
- Median: 2948ms
- Stdev: 1032ms

**Individual Runs**

- Run 1: TTFT=521ms, Total=2218ms, Tokens=265, 156.2 tok/s
- Run 2: TTFT=989ms, Total=3677ms, Tokens=300, 111.6 tok/s

### openai/gpt-5-mini

**TTFT (Time to First Token)**
- Min: 4307ms
- Max: 6523ms
- Mean: 5415ms
- Median: 5415ms
- Stdev: 1567ms

**Total Response Time**
- Min: 4892ms
- Max: 6950ms
- Mean: 5921ms
- Median: 5921ms
- Stdev: 1455ms

**Individual Runs**

- Run 1: TTFT=4307ms, Total=4892ms, Tokens=300, 512.7 tok/s
- Run 2: TTFT=6523ms, Total=6950ms, Tokens=300, 703.8 tok/s

### openai/gpt-5-nano

**All runs failed**

**Individual Runs**

- Run 1: FAILED - litellm.UnsupportedParamsError: gpt-5 models (including gpt-5-codex) don't support temperature=0.7. Only temperature=1 is supported. For gpt-5.1, temperature is supported when reasoning_effort='none' 
- Run 2: FAILED - litellm.UnsupportedParamsError: gpt-5 models (including gpt-5-codex) don't support temperature=0.7. Only temperature=1 is supported. For gpt-5.1, temperature is supported when reasoning_effort='none' 
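
The gpt-5-nano failures come from passing `temperature=0.7` to a model family that, per the error message, only accepts `temperature=1`. One way to keep a single benchmark config while avoiding the error is to drop the unsupported parameter for affected models before calling `litellm.completion`. A hypothetical helper sketch (the `build_params` name and the model-prefix check are assumptions, not part of the actual harness):

```python
def build_params(model: str, **params) -> dict:
    """Strip sampling params that gpt-5-family models reject.

    Per the litellm error above, gpt-5 models only support temperature=1,
    so drop any other temperature value rather than fail the run.
    """
    if model.split("/")[-1].startswith("gpt-5"):
        if params.get("temperature") not in (None, 1):
            params.pop("temperature")
    return params

# The result would then be spread into the call, e.g.:
#   litellm.completion(model=model, messages=msgs,
#                      **build_params(model, temperature=0.7))
```

litellm also offers a `drop_params` option that silently removes provider-unsupported parameters, which would be a lighter-weight fix if the harness is happy to fall back to the provider default.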

## Configuration

- **Prompt**: "Explain how a CPU cache works in 3 paragraphs."
- **Max Tokens**: 300
- **Timeout**: 60s
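
The timing logic behind the TTFT and total-time columns can be sketched independently of any provider: record the clock when the first streamed chunk arrives, and again when the stream ends. This is a stand-alone sketch with a simulated stream; the real harness presumably iterates a streaming response (e.g. `litellm.completion(..., stream=True)`) instead:

```python
import time

def measure_stream(chunks):
    """Return (ttft_s, total_s) for an iterable of streamed chunks."""
    start = time.perf_counter()
    ttft = None
    for _ in chunks:
        if ttft is None:
            ttft = time.perf_counter() - start  # first token arrived
    total = time.perf_counter() - start
    return ttft, total

def fake_stream():
    """Simulated provider: ~50 ms to first token, then a few quick chunks."""
    time.sleep(0.05)
    for _ in range(4):
        yield "tok"
        time.sleep(0.01)

ttft, total = measure_stream(fake_stream())
print(f"TTFT={ttft * 1000:.0f}ms Total={total * 1000:.0f}ms")
```

Measured this way, TTFT captures queueing plus any server-side reasoning before the first visible token, while (total − TTFT) isolates the generation phase used for the tok/s figures.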
