Claude Opus 4.7 Sweeps Both Leaderboards
Anthropic just dropped a new model family. Claude Opus 4.7 is now #1 on text and code — including a 35-point Elo jump on code that is the biggest single leap we've tracked on this leaderboard.
Claude Opus 4.7 didn't arrive with a blog-post tour. No countdown, no embargo. In keeping with Anthropic's recent pattern, the model appeared in the arena, battles accumulated, and the numbers spoke. As of the April 18 snapshot, it tops both leaderboards.
The code result is the striking one. Claude Opus 4.6 Thinking previously held the code throne at 1548 Elo. Opus 4.7 sits at 1583, a 35-point gap over its immediate predecessor after thousands of blind evaluations. That's not model drift. That's a signal.
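To translate that gap into something concrete: under the standard Elo model, a rating difference maps directly to an expected head-to-head win rate. Here's a quick sketch using the textbook formula (the arena's exact rating methodology may differ):

```python
def elo_win_probability(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# A 35-point gap (1583 vs 1548) implies roughly a 55/45 head-to-head split.
print(f"{elo_win_probability(1583, 1548):.3f}")  # ~0.550
```

A consistent 55/45 edge over thousands of blind battles sits far outside sampling noise, which is why a 35-point jump reads as capability rather than variance.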
The Anthropic trajectory
Consider where Anthropic's flagship models have landed on the code leaderboard over the past year: roughly 30 Elo points gained per release, with releases about three months apart. The pace hasn't slowed. If anything, the jumps are getting larger at higher baselines, which is the opposite of what "diminishing returns" would predict.
On text, the picture is more compressed at the top. Claude Opus 4.7 Thinking sits at 1505 Elo, two points above the previous leader (Opus 4.6 Thinking at 1503). That's within the variance. But the non-thinking variant of Opus 4.7 lands at 1498, immediately behind the 4.6 Thinking model, which means Anthropic now holds four of the top five text slots.
Why the code gap is larger
The 35-point code jump versus a 2-point text jump tells you something about where Anthropic invested its chips in this release.
Code evaluation in arenas is harder to game with stylistic improvements. Users either run the output or they don't. The WebDev benchmark that drives the code leaderboard involves multi-step agentic tasks with real execution feedback. A model can't wriggle past a failing test with better prose.
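To make that concrete, here's a minimal sketch of an execution-gated pairwise judge. Everything here is illustrative: the harness, file layout, and use of pytest are assumptions on my part, not the arena's actual WebDev pipeline.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def passes_tests(candidate_code: str, test_code: str) -> bool:
    """Run a candidate solution against a test file: pass/fail, no style points."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "solution.py").write_text(candidate_code)
        Path(tmp, "test_solution.py").write_text(test_code)
        result = subprocess.run(
            [sys.executable, "-m", "pytest", "test_solution.py", "-q"],
            cwd=tmp, capture_output=True, timeout=60,
        )
        return result.returncode == 0

def judge_pair(output_a: str, output_b: str, tests: str) -> str:
    """Binary execution gate: the prettier failing answer still loses."""
    a_ok = passes_tests(output_a, tests)
    b_ok = passes_tests(output_b, tests)
    if a_ok != b_ok:
        return "A" if a_ok else "B"
    return "tie"  # both pass or both fail; style alone can't break the tie
```

That last line is the whole point: when both outputs clear the gate, or both fail it, there's no room left for a model to win on prose alone.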
Opus 4.7's 1583 puts it 35 points above the second-place Opus 4.6 Thinking (1548) and 58 points above Anthropic's own Sonnet model (1525). In a race where the spread between positions 2 through 30 is only about 200 Elo points, a 35-point lead at the top is notable.
Anthropic's stranglehold on code
Look at the code top 10. Anthropic holds six of the top seven code slots with current- and recent-generation models. GLM-5.1 from Z.ai at #4 is the only interruption. OpenAI's best code entry, GPT-5.4 High on the codex harness, sits 126 points behind Opus 4.7.
That gap matters commercially. Claude Code launched with Opus 4.6, and developer adoption has been rapid. With Opus 4.7 extending the lead further, Anthropic is consolidating its position as the default choice for AI-assisted development workflows.
The pricing picture adds to this: Opus 4.7 keeps its predecessor's $5/$25 per-million-token pricing, with a 1M-token context window. Better model, same price. That's the move.
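For cost context, assuming the usual split of $5 per million input tokens and $25 per million output tokens (the split is my reading of the headline price, and the session sizes below are hypothetical), a back-of-envelope estimate:

```python
INPUT_PRICE = 5.00 / 1_000_000    # dollars per input token (assumed split)
OUTPUT_PRICE = 25.00 / 1_000_000  # dollars per output token (assumed split)

def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Rough cost of one API session at Opus 4.7's quoted rates."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Hypothetical heavy agentic-coding session: 400K tokens of context reads
# and tool results in, 60K tokens of generated code and commentary out.
print(f"${session_cost(400_000, 60_000):.2f}")  # $3.50
```

The 1M-token context window is what makes sessions of that size plausible in the first place.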
Text: a different story
The text leaderboard is more contested. The top four slots are all Anthropic. Meta's Muse Spark, the model from Alexandr Wang's $14.3B superintelligence lab, holds fifth at 1496, just nine points behind the leader. Google's best text entry, Gemini 3.1 Pro Preview, is at 1492 in sixth. The top six are separated by only 13 Elo points.
In text, Anthropic technically leads, but the race is real. In code, it's a runaway.
The velocity question
We've tracked six major leader changes on this leaderboard since we started logging snapshots. This is the sixth time Anthropic has held #1, but it's the first time they've dominated both text and code simultaneously.
The bigger pattern: the release cadence keeps tightening. Claude Opus 4.6 dropped in February; Opus 4.7 dropped in April. Anthropic is now shipping major Opus revisions roughly every 10 weeks, faster than the quarterly pace of the past year. At this cadence, they can iterate on weaknesses faster than competitors can counter, especially on code, where the feedback loops from real-world use are tighter.
What this means for everyone else
For Google: Gemini 3.1 Pro Preview sits at 1492 on text — competitive. But on code it's at 1455, 128 points behind Opus 4.7. If Google wants to compete in the developer market, this is the gap they need to close.
For OpenAI: GPT-5.4 High lands at 1482 on text (#9) and 1457 on code (#8). The codex-harness variant tests execution capabilities directly. OpenAI's best code result is still 126 points off the pace. Their ongoing bets on o3, o4-mini, and the codex infrastructure are steps in the right direction, but the raw model quality gap is widening.
For Meta: Muse Spark at #5 on text is a legitimate achievement. But Meta has no competitive code entry in the top 30. The superintelligence lab is optimizing for conversation first. Whether that's a strategic choice or a gap to fill depends on how you read their roadmap.
For xAI: Grok 4.20 Beta holds multiple positions in the 1470-1485 range on text. But the code leaderboard tells a different story: Grok 4.20 Beta 0309 Reasoning sits at #22 (1402). xAI's multi-agent architecture hasn't translated to code task performance.
The bottom line
Opus 4.7 is the most capable model we've tracked on this leaderboard, by a wide margin on code and by a slim margin on text. The code jump is the real story — 35 Elo points over the previous leader suggests genuine capability advances, not benchmark gaming. Anthropic now holds six of the top seven code slots.
What's likely to change this picture? Google has Gemini 4 architectures in the pipeline. OpenAI is pushing on both reasoning models and the codex infrastructure. But based on current trajectories, Anthropic goes into Q2 2026 with a clear lead in both dimensions of the AI evaluation race.
The next few weeks will tell us whether this is a sustained breakout or a temporary gap while the competition responds. Arena battles are accumulating. We'll be watching the scores.