Compare the top AI models by performance and cost. Last updated: May 30, 2026. Sources: LMSYS Chatbot Arena, HuggingFace Open LLM Leaderboard, official benchmarks.

| #  | Model                | Provider  | MMLU  | HumanEval | Arena Elo | Price / 1M In | Category                   |
|----|----------------------|-----------|-------|-----------|-----------|---------------|----------------------------|
| 1  | o3                   | OpenAI    | 96.7% | 97.9%     | 1350      | $10           | reasoning                  |
| 2  | Gemini 2.5 Pro       | Google    | 91.0% | 84.1%     | 1320      | $1.25         | general, vision, reasoning |
| 3  | o4-mini              | OpenAI    | 93.3% | 92.4%     | 1310      | $1.1          | reasoning, coding          |
| 4  | Claude Opus 4.6      | Anthropic | 90.4% | 84.9%     | 1290      | $15           | general, reasoning, vision |
| 5  | GPT-4o               | OpenAI    | 88.7% | 90.2%     | 1285      | $2.5          | general, vision, coding    |
| 6  | Claude Sonnet 4.6    | Anthropic | 88.7% | 73.0%     | 1258      | $3            | general, coding            |
| 7  | Gemini 2.5 Flash     | Google    | 86.2% | 74.3%     | 1240      | $0.075        | general, coding            |
| 8  | DeepSeek V3 (Open)   | DeepSeek  | 87.1% | 89.1%     | 1230      | $0.27         | general, coding            |
| 9  | Llama 3.3 70B (Open) | Meta      | 86.0% | 88.4%     | 1220      | Free          | general, coding            |
| 10 | Qwen2.5 72B (Open)   | Alibaba   | 86.7% | 86.7%     | 1210      | Free          | general, coding            |
| 11 | GPT-4o mini          | OpenAI    | 82.0% | 87.2%     | 1200      | $0.15         | general, coding            |
| 12 | Mistral Large 2      | Mistral   | 84.0% | 92.1%     | 1195      | $2            | general, coding            |
| 13 | Claude Haiku 4.5     | Anthropic | 75.2% | 60.0%     | 1180      | $0.8          | general, coding            |
| 14 | Llama 4 Scout (Open) | Meta      | 79.6% | 72.6%     | 1180      | Free          | general                    |

Metrics Glossary

  • MMLU: Massive Multitask Language Understanding. Measures general knowledge and problem-solving across 57 subjects.
  • HumanEval: Coding benchmark that measures the functional correctness of synthesized programs (pass@1).
  • Arena Elo: LMSYS Chatbot Arena Elo score, based on crowdsourced, blind human preference testing.
  • Pricing: Prices are in USD per 1 million input tokens. Models with no listed price are marked "Free".
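
To make the pricing and Elo columns easier to interpret, here is a minimal Python sketch. The function names `input_cost_usd` and `elo_expected_score` are hypothetical, chosen for illustration. The Elo helper assumes the conventional Elo formula (base 10, scale 400); the leaderboard's own rating procedure may differ, so treat the result as a rough reading of a rating gap and not as the Arena's method.

```python
def input_cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost in USD of `tokens` input tokens at a per-1M-input-token price."""
    return tokens / 1_000_000 * price_per_million


def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the conventional Elo formula.

    Assumption: base-10 logistic curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


if __name__ == "__main__":
    # 250,000 input tokens on GPT-4o ($2.5 per 1M input tokens in the table).
    print(f"${input_cost_usd(250_000, 2.5):.3f}")  # $0.625

    # o3 (1350) versus GPT-4o (1285), ratings taken from the table.
    print(f"{elo_expected_score(1350, 1285):.2f}")
```

Under these assumptions, a 65-point gap corresponds to an expected preference rate only modestly above one half, which is a reminder that small Elo differences between adjacent rows imply near-even head-to-head outcomes.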