AI Model Leaderboard & Pricing

Compare the top AI models by performance and cost. Last updated: May 30, 2026. Sources: LMSYS Chatbot Arena, HuggingFace Open LLM Leaderboard, official benchmarks.

#	Model	Provider	MMLU	HumanEval	Arena Elo	Price / 1M input tokens	Category
1	o3	OpenAI	96.7%	97.9%	1350	$10	reasoning
2	Gemini 2.5 Pro	Google	91.0%	84.1%	1320	$1.25	general, vision, reasoning
3	o4-mini	OpenAI	93.3%	92.4%	1310	$1.10	reasoning, coding
4	Claude Opus 4.6	Anthropic	90.4%	84.9%	1290	$15	general, reasoning, vision
5	GPT-4o	OpenAI	88.7%	90.2%	1285	$2.50	general, vision, coding
6	Claude Sonnet 4.6	Anthropic	88.7%	73.0%	1258	$3	general, coding
7	Gemini 2.5 Flash	Google	86.2%	74.3%	1240	$0.075	general, coding
8	DeepSeek V3 (Open)	DeepSeek	87.1%	89.1%	1230	$0.27	general, coding
9	Llama 3.3 70B (Open)	Meta	86.0%	88.4%	1220	Free	general, coding
10	Qwen2.5 72B (Open)	Alibaba	86.7%	86.7%	1210	Free	general, coding
11	GPT-4o mini	OpenAI	82.0%	87.2%	1200	$0.15	general, coding
12	Mistral Large 2	Mistral	84.0%	92.1%	1195	$2	general, coding
13	Claude Haiku 4.5	Anthropic	75.2%	60.0%	1180	$0.80	general, coding
14	Llama 4 Scout (Open)	Meta	79.6%	72.6%	1180	Free	general
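
To get a feel for what a gap in the Arena Elo column means, the sketch below converts a rating difference into an expected head-to-head preference rate. It assumes the classic Elo logistic curve with a 400-point scale; the leaderboard's sources may compute ratings differently, so treat the result as a rough guide only. The function name `expected_win_rate` is illustrative and not part of any library.

```python
def expected_win_rate(elo_a: float, elo_b: float) -> float:
    """Expected share of blind votes for model A over model B.

    Assumption: standard Elo logistic formula with a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))


# Ratings copied from the table above.
o3_elo, gpt_4o_elo = 1350, 1285

rate = expected_win_rate(o3_elo, gpt_4o_elo)
print(f"o3 vs GPT-4o: {rate:.1%} expected preference for o3")
```

Under this assumption, a 65-point lead corresponds to a preference rate of roughly 59%, so models a few dozen points apart are still close in practice.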

Metrics Glossary

MMLU: Massive Multitask Language Understanding. Measures general knowledge and problem-solving across 57 subjects.
HumanEval: Coding benchmark measuring functional correctness for synthesizing programs (pass@1).
Arena Elo: LMSYS Chatbot Arena Elo score based on crowdsourced, blind human preference testing.
Pricing: Prices shown are in USD per 1 million input tokens. Free models are marked "Free" in the price column.
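
Because prices are quoted per 1 million input tokens, the cost of a request scales linearly with its input size. The following is a minimal sketch of that arithmetic using a few prices copied from the table. The dictionary and the function name `input_cost_usd` are hypothetical, "Free" is treated as $0 (an assumption that ignores any hosting cost), and output-token pricing is not covered because the table does not list it.

```python
# USD per 1M input tokens, copied from the leaderboard table.
# Assumption: "Free" is modeled as 0.0 and hosting costs are ignored.
PRICE_PER_1M_INPUT = {
    "o3": 10.0,
    "GPT-4o": 2.5,
    "Gemini 2.5 Flash": 0.075,
    "Llama 3.3 70B": 0.0,
}


def input_cost_usd(model: str, input_tokens: int) -> float:
    """Return the input-side cost in USD for a given number of tokens."""
    return PRICE_PER_1M_INPUT[model] * input_tokens / 1_000_000


# Example: compare the cost of sending 250,000 input tokens to each model.
for model in PRICE_PER_1M_INPUT:
    print(f"{model}: ${input_cost_usd(model, 250_000):.4f}")
```

For 250,000 input tokens this gives $2.50 for o3 and about $0.019 for Gemini 2.5 Flash, which shows how quickly the price column translates into real differences at volume.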