# LLM Latency Benchmark Report

**Generated**: 2026-01-27 05:26:30

## Summary

| Model | TTFT (median) | Total time (median) | Tokens/sec (mean) | Success rate |
|-------|---------------|----------------|------------|---------|
| openai/gpt-5-nano | 623ms | 4073ms | 88.5 | 100% |
| openai/gpt-5-mini | 937ms | 4902ms | 76.1 | 100% |
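The per-run tokens/sec figures in the detailed results are consistent with measuring throughput over the generation window only (total time minus TTFT), with the summary column being the mean across runs. A minimal sketch reproducing the table from the raw run data (the formula is inferred from the numbers, not stated by the benchmark itself):

```python
# (ttft_ms, total_ms, tokens) per run, copied from "Individual Runs" below.
runs = {
    "openai/gpt-5-nano": [(553, 3546, 300), (694, 4601, 300)],
    "openai/gpt-5-mini": [(1329, 4982, 300), (546, 4822, 300)],
}

def tokens_per_sec(ttft_ms, total_ms, tokens):
    # Throughput over the generation window: first token to last token.
    return tokens / (total_ms - ttft_ms) * 1000

for model, model_runs in runs.items():
    per_run = [tokens_per_sec(*r) for r in model_runs]
    mean_tps = sum(per_run) / len(per_run)
    print(model, [round(v, 1) for v in per_run], round(mean_tps, 1))
```

This yields 100.2 and 76.8 tok/s for the nano runs (mean 88.5) and 82.1 and 70.2 tok/s for the mini runs (mean 76.1), matching the report.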

## Detailed Results

### openai/gpt-5-nano

**TTFT (Time to First Token)**
- Min: 553ms
- Max: 694ms
- Mean: 623ms
- Median: 623ms
- Stdev: 99ms

**Total Response Time**
- Min: 3546ms
- Max: 4601ms
- Mean: 4073ms
- Median: 4073ms
- Stdev: 746ms

**Individual Runs**

- Run 1: TTFT=553ms, Total=3546ms, Tokens=300, 100.2 tok/s
- Run 2: TTFT=694ms, Total=4601ms, Tokens=300, 76.8 tok/s

### openai/gpt-5-mini

**TTFT (Time to First Token)**
- Min: 546ms
- Max: 1329ms
- Mean: 937ms
- Median: 937ms
- Stdev: 553ms

**Total Response Time**
- Min: 4822ms
- Max: 4982ms
- Mean: 4902ms
- Median: 4902ms
- Stdev: 113ms

**Individual Runs**

- Run 1: TTFT=1329ms, Total=4982ms, Tokens=300, 82.1 tok/s
- Run 2: TTFT=546ms, Total=4822ms, Tokens=300, 70.2 tok/s
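The per-model summary statistics above can be reproduced from the individual runs. A minimal sketch, assuming sample standard deviation (n − 1) and millisecond values truncated (not rounded) to whole integers — both of which match the reported figures:

```python
import statistics

# TTFT samples per model, in ms, taken from the "Individual Runs" lists.
ttft_samples = {
    "openai/gpt-5-nano": [553, 694],
    "openai/gpt-5-mini": [1329, 546],
}

def summarize(values):
    # Fractional milliseconds appear truncated to integers in the report
    # (e.g. mean 623.5 ms is shown as 623 ms).
    return {
        "min": min(values),
        "max": max(values),
        "mean": int(statistics.mean(values)),
        "median": int(statistics.median(values)),
        "stdev": int(statistics.stdev(values)),  # sample stdev (n-1)
    }

for model, values in ttft_samples.items():
    print(model, summarize(values))
```

Note that with only two runs per model, mean and median coincide and the stdev reduces to |run1 − run2| / √2, which is why the reported mean and median are identical everywhere.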

## Configuration

- **Prompt**: "Explain how a CPU cache works in 3 paragraphs."
- **Max Tokens**: 300
- **Timeout**: 60s
- **Runs per model**: 2
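How TTFT and total time might be measured against a streaming API: a hedged sketch of the timing logic only (the `measure` helper and its chunk-counting token estimate are illustrative, not the benchmark's actual harness — it works over any iterator of streamed chunks, such as an OpenAI streaming response):

```python
import time

def measure(stream):
    """Time a streaming response.

    `stream` is any iterator yielding token chunks; the clock starts when
    iteration begins. Returns (ttft_ms, total_ms, tokens), where TTFT is
    the delay until the first chunk arrives.
    """
    start = time.monotonic()
    ttft_ms = None
    tokens = 0
    for _chunk in stream:
        if ttft_ms is None:
            ttft_ms = (time.monotonic() - start) * 1000
        tokens += 1  # assumption: one token per streamed chunk
    total_ms = (time.monotonic() - start) * 1000
    return ttft_ms, total_ms, tokens
```

With a real client you would pass the streaming response object directly, e.g. `measure(client.chat.completions.create(model=..., messages=..., stream=True))`, and aggregate the returned tuples across runs.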
