Claude Opus 4.7 Sweeps Both Leaderboards
Anthropic just dropped a new model family. Claude Opus 4.7 is now #1 on text and code — including a 35-point Elo jump on code that is the biggest single leap we've tracked on this leaderboard.
Claude Opus 4.7 didn't arrive with a blog-post tour. No countdown, no embargo. In keeping with Anthropic's recent pattern, the model appeared in the arena, battles accumulated, and the numbers spoke. As of the April 18 snapshot, it tops both leaderboards.
The code result is the striking one. Claude Opus 4.6 Thinking previously held the code throne at 1548 Elo. Opus 4.7 sits at 1583, a 35-point gap over its immediate predecessor after thousands of blind evaluations. That's not model drift. That's a signal.
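To translate that gap into something concrete: under the standard Elo model, a rating difference maps directly to an expected head-to-head win rate. Here's a quick sketch using the textbook formula (the arena's exact rating methodology may differ):

```python
def elo_win_probability(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# A 35-point gap (1583 vs 1548) implies roughly a 55/45 head-to-head split.
print(f"{elo_win_probability(1583, 1548):.3f}")  # ~0.550
```

A consistent 55/45 edge over thousands of blind battles sits far outside sampling noise, which is why a 35-point jump reads as capability rather than variance.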
The Anthropic trajectory
Consider where Anthropic's flagship models have landed on the code leaderboard over the past year: roughly 30 Elo points gained per release, with releases about three months apart. The pace hasn't slowed. If anything, the jumps are getting larger at higher baselines, which is the opposite of what "diminishing returns" would predict.
On text, the picture is more compressed at the top. Claude Opus 4.7 Thinking sits at 1505 Elo, two points above the previous leader (Opus 4.6 Thinking at 1503). That's within the variance. But the non-thinking variant of Opus 4.7 lands at 1498, immediately behind the 4.6 Thinking model, which means Anthropic now holds four of the top five text slots.
Why the code gap is larger
The 35-point code jump versus a 2-point text jump tells you something about where Anthropic invested its chips in this release.
Code evaluation in arenas is harder to game with stylistic improvements. Users either run the output or they don't. The WebDev benchmark that drives the code leaderboard involves multi-step agentic tasks with real execution feedback. A model can't wriggle past a failing test with better prose.
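To make that concrete, here's a minimal sketch of an execution-gated pairwise judge. Everything here is illustrative: the harness, file layout, and use of pytest are assumptions on my part, not the arena's actual WebDev pipeline.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def passes_tests(candidate_code: str, test_code: str) -> bool:
    """Run a candidate solution against a test file: pass/fail, no style points."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "solution.py").write_text(candidate_code)
        Path(tmp, "test_solution.py").write_text(test_code)
        result = subprocess.run(
            [sys.executable, "-m", "pytest", "test_solution.py", "-q"],
            cwd=tmp, capture_output=True, timeout=60,
        )
        return result.returncode == 0

def judge_pair(output_a: str, output_b: str, tests: str) -> str:
    """Binary execution gate: the prettier failing answer still loses."""
    a_ok = passes_tests(output_a, tests)
    b_ok = passes_tests(output_b, tests)
    if a_ok != b_ok:
        return "A" if a_ok else "B"
    return "tie"  # both pass or both fail; style alone can't break the tie
```

That last line is the whole point: when both outputs clear the gate, or both fail it, there's no room left for a model to win on prose alone.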
Opus 4.7's 1583 puts it 35 points above the second-place Opus 4.6 Thinking (1548) and 58 points above Anthropic's own Sonnet model (1525). In a race where the spread between positions 2 through 30 is only about 200 Elo points, a 35-point lead at the top is notable.
Anthropic's stranglehold on code
Look at the code top 10. Anthropic holds six of the top seven code slots with current- and recent-generation models. GLM-5.1 from Z.ai at #4 is the only interruption. OpenAI's best code entry, GPT-5.4 High on the codex harness, sits 126 points behind Opus 4.7.
That gap matters commercially. Claude Code launched with Opus 4.6, and developer adoption has been rapid. With Opus 4.7 extending the lead further, Anthropic is consolidating its position as the default choice for AI-assisted development workflows.
The pricing picture adds to this: Opus 4.7 keeps its predecessor's $5/$25 per-million-token pricing, with a 1M-token context window. Better model, same price. That's the move.
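For cost context, assuming the usual split of $5 per million input tokens and $25 per million output tokens (the split is my reading of the headline price, and the session sizes below are hypothetical), a back-of-envelope estimate:

```python
INPUT_PRICE = 5.00 / 1_000_000    # dollars per input token (assumed split)
OUTPUT_PRICE = 25.00 / 1_000_000  # dollars per output token (assumed split)

def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Rough cost of one API session at Opus 4.7's quoted rates."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Hypothetical heavy agentic-coding session: 400K tokens of context reads
# and tool results in, 60K tokens of generated code and commentary out.
print(f"${session_cost(400_000, 60_000):.2f}")  # $3.50
```

The 1M-token context window is what makes sessions of that size plausible in the first place.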
Text: a different story
The text leaderboard is more contested. The top four slots are all Anthropic. Meta's Muse Spark, the model from Alexandr Wang's $14.3B superintelligence lab, holds fifth at 1496, just nine points behind the leader. Google's best text entry, Gemini 3.1 Pro Preview, is at 1492 in sixth. The top six are separated by only 13 Elo points.
In text, Anthropic technically leads, but the race is real. In code, it's a runaway.
The velocity question
We've tracked six major leader changes on this leaderboard since we started logging snapshots. This is the sixth time Anthropic has held #1, but it's the first time they've dominated both text and code simultaneously.
The bigger pattern: the release cadence keeps tightening. Claude Opus 4.6 dropped in February; Opus 4.7 dropped in April. Anthropic is now shipping major Opus revisions roughly every 10 weeks, faster than the quarterly pace of the past year. At this cadence, they can iterate on weaknesses faster than competitors can counter, especially on code, where the feedback loops from real-world use are tighter.
What this means for everyone else
For Google: Gemini 3.1 Pro Preview sits at 1492 on text — competitive. But on code it's at 1455, 128 points behind Opus 4.7. If Google wants to compete in the developer market, this is the gap they need to close.
For OpenAI: GPT-5.4 High lands at 1482 on text (#9) and 1457 on code (#8). The codex-harness variant tests execution capabilities directly. OpenAI's best code result is still 126 points off the pace. Their ongoing bets on o3, o4-mini, and the codex infrastructure are steps in the right direction, but the raw model quality gap is widening.
For Meta: Muse Spark at #5 on text is a legitimate achievement. But Meta has no competitive code entry in the top 30. The superintelligence lab is optimizing for conversation first. Whether that's a strategic choice or a gap to fill depends on how you read their roadmap.
For xAI: Grok 4.20 Beta holds multiple positions in the 1470-1485 range on text. But the code leaderboard tells a different story: Grok 4.20 Beta 0309 Reasoning sits at #22 (1402). xAI's multi-agent architecture hasn't translated to code task performance.
The bottom line
Opus 4.7 is the most capable model we've tracked on this leaderboard, by a wide margin on code and by a slim margin on text. The code jump is the real story — 35 Elo points over the previous leader suggests genuine capability advances, not benchmark gaming. Anthropic now holds six of the top seven code slots.
What's likely to change this picture? Google has Gemini 4 architectures in the pipeline. OpenAI is pushing on both reasoning models and the codex infrastructure. But based on current trajectories, Anthropic goes into Q2 2026 with a clear lead in both dimensions of the AI evaluation race.
The next few weeks will tell us whether this is a sustained breakout or a temporary gap while the competition responds. Arena battles are accumulating. We'll be watching the scores.