Claude Opus 4.6 Thinking Retakes Text #1
The text crown has changed hands on a single Elo point: Claude Opus 4.6 Thinking is back at #1, Opus 4.7 Thinking slips to #2, and Meta Muse Spark enters the top five. Noisy, but revealing.
Anthropic just changed the code leaderboard again. Claude Opus 4.7 Thinking has overtaken base Opus 4.7 to become the new #1 coding model on LMArena. The top four is now all Anthropic.
Anthropic dropped Claude Opus 4.7 and it now holds #1 on both text and code. The code jump — from 1548 to 1583 Elo — is the biggest single-model gain we've tracked. Anthropic now holds six of the top seven code slots.
Meta's first model from Alexandr Wang's $14.3B superintelligence lab enters LMArena at rank #3 with 1493 Elo — ahead of every Google and OpenAI model on text. 174 rank positions above Meta's previous best. Here's what it means.
Google's Gemma 4 enters LMArena at rank #29 with 1451 Elo — a 31B open-weight model beating proprietary rivals many times its size. Apache 2.0 licensed, runnable on a workstation. The open-weight era just got serious.
OpenAI ships GPT-5.4 with a 1M token context window and improved efficiency. It debuts at #7 on LMArena text with 1480 Elo — strong, but 24 points behind leader Claude Opus 4.6. The enterprise play matters more than the ranking.
xAI's Grok 4.20 Beta debuts at #4 on LMArena with 1492 Elo — ahead of every Google and OpenAI model on text. Trained on 200K GPUs with native multi-agent architecture, xAI is no longer just participating in the race.
A new GPT-5.2 checkpoint vaults from #29 to #5 on LMArena's Text leaderboard — a 40-point Elo jump. But the company that held #1 for 651 days is still 27 points behind leader Claude Opus 4.6.
Claude Sonnet 4.6 scores 1524 on Code — beating Opus 4.5 (1496) and Opus 4.5 Thinking (1510). In just 4 months, the mid-tier model surpassed the previous flagship at one-fifth the price. Today's best is tomorrow's budget tier.
Anthropic's mid-tier model debuts at #3 on LMArena's Code leaderboard with 1524 Elo — beating every OpenAI and Google model. At $3/$15 per million tokens, it's flagship performance at Sonnet prices.
Google's Gemini 3.1 Pro scores 77.1% on ARC-AGI-2, more than doubling its predecessor. It leads most major benchmarks. But with Claude Opus 4.6 just 5 Elo points ahead, the AI race has never been tighter.
After 23 days of Google's Gemini Pro dominance, Anthropic has reclaimed the top spot. Here's what changed, why it matters, and what it signals about the future of AI development.