THE AI RACE
Leaderboard
Compare
Benchmarks
Methodology
Changelog
Movers
Time Machine
HumanEval+
Functional correctness of Python code generated from docstrings (164 problems)
Category: Coding
Unit: %
Max: 100
Source →
Rankings (10 organizations)
Rank  Organization      Score (%)
1     OpenAI            89
2     Alibaba Qwen      87.2
3     DeepSeek          86.6
4     Meta AI           85
5     xAI               83
6     Google DeepMind   79.3
7     Anthropic         77.4
8     Mistral           73.8
9     Cohere            72
10    Zhipu AI          68
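How a percentage score like the ones above is typically computed can be sketched with the unbiased pass@k estimator from the original HumanEval paper (Chen et al., 2021). This is a minimal sketch under assumptions: the page does not state the sampling setup (greedy decoding vs. sampling, or the value of k), and the helper names `pass_at_k` and `benchmark_score` are hypothetical. For HumanEval+, a sample counts as correct only if it passes every test for that problem, including the extra tests HumanEval+ adds.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem (Chen et al., 2021).

    n: number of generated samples, c: how many of them passed all tests,
    k: number of samples the user is allowed to draw.
    Returns the probability that at least one of k samples drawn
    without replacement from the n generations is correct.
    """
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k hits a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: list[tuple[int, int]], k: int = 1) -> float:
    """Mean pass@k over all problems, expressed as a percentage (0 to 100).

    results: one (n, c) pair per problem (hypothetical input format).
    """
    total = sum(pass_at_k(n, c, k) for n, c in results)
    return 100.0 * total / len(results)
```

With a single greedy sample per problem (n = 1), pass@1 reduces to the fraction of problems solved. A hypothetical model that solves 146 of the 164 problems would score `benchmark_score([(1, 1)] * 146 + [(1, 0)] * 18)`, or about 89.0%.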
Other Benchmarks in Coding
SWE-bench Verified
LiveCodeBench
Aider Polyglot
BigCodeBench
All Categories
Language & Knowledge
Coding
Reasoning & Math
Image Generation
Video Generation
Multimodal
Agents & Tools