Methodology
How we assess and rank AI organizations in the global AI race.
Overview
The AI Race scores organizations across six key pillars, each capturing a different dimension of AI leadership. Scores are grounded in real benchmark data — we track 24 public benchmarks across 7 categories, automatically ingested daily from authoritative sources. This is supplemented by product launches, partnerships, research papers, and market adoption metrics. Each pillar is scored from 0 to 100.
Six Pillars of Assessment
- Capability: raw model performance across 24 public benchmarks spanning text, coding, reasoning, image, video, multimodal, and agent tasks. Computed from normalized, weighted benchmark scores ingested daily.
- Velocity: speed of innovation, including release cadence, time-to-ship improvements, research output, and rate of capability gains over time.
- Adoption: market traction, including user counts, API usage, enterprise customers, developer ecosystem size, and revenue where available.
- Compute: training and inference efficiency, access to compute resources, hardware partnerships, and cost-effectiveness of serving models.
- Ecosystem: breadth of partnerships, integrations, platform ecosystem, developer tools, and strategic alliances that amplify reach.
- Trust: safety track record, responsible AI practices, transparency, regulatory compliance, access policies, and open-source contributions.
Benchmark Data
We track 24 public benchmarks across 7 categories. Each benchmark is normalized to a 0–100 scale and weighted within its category. ELO-based scores use a linear mapping from 800–1400 ELO to 0–100. Percentage-based scores map directly.
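As a rough sketch of that normalization (the clamping of out-of-range values to 0–100 is an assumption here, not a documented rule):

```python
def normalize_benchmark(value: float, kind: str) -> float:
    """Map a raw benchmark score onto the 0-100 scale.

    ELO-style ratings are mapped linearly from the 800-1400 range;
    percentage-based scores pass through unchanged. Clamping out-of-range
    inputs is an illustrative assumption.
    """
    if kind == "elo":
        scaled = (value - 800) / (1400 - 800) * 100
    elif kind == "percentage":
        scaled = value
    else:
        raise ValueError(f"unknown benchmark kind: {kind}")
    return max(0.0, min(100.0, scaled))


# Example: a 1250 ELO arena rating maps to 75.0; an 82% pass rate stays 82.0.
print(normalize_benchmark(1250, "elo"))         # 75.0
print(normalize_benchmark(82.0, "percentage"))  # 82.0
```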
Modality Scoring
In addition to overall scores, each organization is assessed across four modalities: Text, Image, Video, and Multimodal. These are computed from the benchmark categories above and contribute to the category-specific leaderboards.
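The precise category-to-modality mapping is not spelled out above; the sketch below shows one plausible grouping, where both the mapping and the equal averaging are illustrative assumptions rather than the site's actual configuration:

```python
# Illustrative only: the real category-to-modality mapping and weights
# are not published in this methodology.
MODALITY_CATEGORIES = {
    "Text": ["text", "coding", "reasoning"],
    "Image": ["image"],
    "Video": ["video"],
    "Multimodal": ["multimodal", "agent"],
}

def modality_scores(category_scores: dict[str, float]) -> dict[str, float]:
    """Average the available category scores feeding each modality."""
    out = {}
    for modality, categories in MODALITY_CATEGORIES.items():
        available = [category_scores[c] for c in categories if c in category_scores]
        if available:
            out[modality] = sum(available) / len(available)
    return out
```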
Confidence Levels
Each score is assigned a confidence level based on data availability.
Score Computation
The overall score is a weighted average of the six pillar scores.
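The pillar weights themselves are an editorial choice and are not published here; a minimal sketch assuming equal weights:

```python
# Pillar names come from the six pillars above; equal weights are an assumption.
PILLAR_WEIGHTS = {
    "capability": 1.0,
    "velocity": 1.0,
    "adoption": 1.0,
    "compute": 1.0,
    "ecosystem": 1.0,
    "trust": 1.0,
}

def overall_score(pillar_scores: dict[str, float]) -> float:
    """Weighted average of the 0-100 pillar scores."""
    total_weight = sum(PILLAR_WEIGHTS[p] for p in pillar_scores)
    weighted_sum = sum(PILLAR_WEIGHTS[p] * s for p, s in pillar_scores.items())
    return weighted_sum / total_weight


# Illustrative pillar scores, not real data.
print(overall_score({"capability": 92, "velocity": 88, "adoption": 95,
                     "compute": 90, "ecosystem": 85, "trust": 70}))  # ~86.7
```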
The Capability pillar is informed by benchmark data when available. Category scores are computed as weighted averages of normalized benchmark scores within each category, re-normalized to account for missing data points.
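Re-normalization simply means that benchmarks without data drop out of both the numerator and the denominator, so the remaining weights still sum to one. A sketch, using hypothetical benchmark names and weights:

```python
def category_score(benchmark_scores: dict[str, float],
                   benchmark_weights: dict[str, float]) -> float | None:
    """Weighted average over the benchmarks that actually have data.

    Benchmarks with no score are omitted and the remaining weights are
    re-normalized, so a missing data point does not drag the category down.
    """
    available = {b: s for b, s in benchmark_scores.items() if s is not None}
    if not available:
        return None
    weight_sum = sum(benchmark_weights[b] for b in available)
    return sum(benchmark_weights[b] * s for b, s in available.items()) / weight_sum


# Hypothetical example: one of three coding benchmarks has no score yet.
scores = {"humaneval_plus": 88.0, "swe_bench": None, "livecodebench": 72.0}
weights = {"humaneval_plus": 0.4, "swe_bench": 0.3, "livecodebench": 0.3}
print(category_score(scores, weights))  # (0.4*88 + 0.3*72) / 0.7 ≈ 81.1
```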
Rank movement is computed by comparing each organization's rank in the latest snapshot against the previous snapshot. A positive delta means the org moved up.
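In other words, the delta is the previous rank minus the latest rank, so moving from 5th to 3rd shows as +2. A minimal sketch (field names are illustrative):

```python
def rank_deltas(previous: dict[str, int], latest: dict[str, int]) -> dict[str, int | None]:
    """Rank change per organization between two snapshots.

    Positive delta means the org moved up (e.g. rank 5 -> 3 gives +2);
    organizations absent from the previous snapshot get None.
    """
    return {org: (previous[org] - rank) if org in previous else None
            for org, rank in latest.items()}
```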
Data Sources
Benchmark data is automatically ingested from these primary sources:
- LM Arena / Chatbot Arena — ELO ratings for text, image, and video models via crowdsourced blind comparisons
- Artificial Analysis — standardized benchmark scores (MMLU-Pro, GPQA, SimpleQA, MATH-500, HumanEval+, IFEval)
- HuggingFace Datasets — open datasets for arena scores and evaluation results
Scores are also informed by:
- Official blog posts, technical reports, and research papers
- Press releases, funding announcements, and SEC filings
- App store rankings, user metrics, and adoption data
- Hugging Face model statistics, GitHub activity, and developer adoption
Update Cadence
Benchmark data is automatically refreshed daily at 6:00 AM UTC via our ingestion pipeline, which pulls the latest scores from LM Arena, Artificial Analysis, and HuggingFace. Editorial scores (velocity, adoption, compute, ecosystem, trust) are updated approximately every 2–4 weeks, or sooner when major events occur (significant model releases, large funding rounds, benchmark-shifting results). Each update is documented in the changelog.
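As an illustrative aside (not the actual scheduler), computing the next 06:00 UTC refresh time with the standard library looks like this:

```python
from datetime import datetime, timedelta, timezone

def next_refresh(now: datetime) -> datetime:
    """Return the next daily 06:00 UTC benchmark refresh after `now`.

    Illustrative helper only; `now` should be a timezone-aware datetime.
    """
    now_utc = now.astimezone(timezone.utc)
    run = now_utc.replace(hour=6, minute=0, second=0, microsecond=0)
    if run <= now_utc:
        run += timedelta(days=1)
    return run


print(next_refresh(datetime.now(timezone.utc)))
```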
Disclaimer
Scores reflect a combination of automated benchmark data and editorial assessment based on publicly available information. They do not constitute investment advice, endorsement, or definitive rankings. The AI landscape moves quickly and scores may not reflect the very latest developments. We welcome feedback and corrections at corrections@theairace.live.