
Claude Opus 4.7 Thinking Quietly Takes Code #1

Anthropic already had the strongest coding model on LMArena Code. Now it has a stronger version of that model sitting above it. Claude Opus 4.7 Thinking just displaced base Opus 4.7 at #1, and the broader message is hard to miss: Anthropic is still widening the gap where developers actually care.

New Code #1: claude-opus-4-7-thinking, 1576
Previous Code #1: claude-opus-4-7, 1569 (+7 for the thinking variant)

This is not the kind of leaderboard move that comes with a keynote. It is subtler than that. Anthropic shipped Opus 4.7, took both leaderboards, then watched the thinking variant climb past the base model on code a few days later.

That matters because code leaderboards are less forgiving than text leaderboards. You can charm your way through a copywriting prompt. You cannot charm your way through a broken build. The code arena is where execution quality shows up fast.

Anthropic now owns the top of the code stack

Code Leaderboard, Apr 22 2026

1. claude-opus-4-7-thinking (Anthropic): 1576
2. claude-opus-4-7 (Anthropic): 1569
3. claude-opus-4-6-thinking (Anthropic): 1549
4. claude-opus-4-6 (Anthropic): 1544
5. glm-5.1 (Z.ai): 1536
6. claude-sonnet-4-6 (Anthropic): 1525
7. claude-opus-4-5-20251101-thinking-32k (Anthropic): 1491
8. claude-opus-4-5-20251101 (Anthropic): 1467
9. gpt-5.4-high (codex-harness) (OpenAI): 1456
10. qwen3.6-plus-preview (Alibaba): 1455

The top four are all Anthropic. Six of the top eight are Anthropic models. OpenAI does not appear until ninth. Google misses the top ten entirely. That is not healthy benchmark diversity. It is one lab setting the pace while everyone else drafts behind it.

What changed in Opus 4.7

Anthropic's official launch notes point to better advanced software engineering, stronger long-running task execution, higher-resolution vision support, and tighter autonomous behavior on multi-step work. The company also kept pricing flat at $5 per million input tokens and $25 per million output tokens, with the full 1M token context window included.

Opus 4.7 Quick Facts
Input price: $5 / 1M tokens
Output price: $25 / 1M tokens
Context window: 1M tokens
Max output: 128K tokens
New launch features: xhigh effort, hi-res vision
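At flat per-token prices, per-request cost is simple arithmetic. A quick sketch using the published rates; the token counts in the example are invented for illustration, not measured usage:

```python
INPUT_PRICE = 5.00 / 1_000_000    # dollars per input token ($5 / 1M)
OUTPUT_PRICE = 25.00 / 1_000_000  # dollars per output token ($25 / 1M)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars for a single request at Opus 4.7 list pricing."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Hypothetical large-context coding task: 200K tokens in, 8K tokens out
print(f"${request_cost(200_000, 8_000):.2f}")  # → $1.20
```

The asymmetry matters for agentic coding work: long tool transcripts are cheap to feed back in, while verbose generations dominate the bill.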

The interesting part is not that Anthropic launched another strong model. The interesting part is that the thinking version immediately found extra headroom on code. That suggests the gains are not just in polished output style. They show up where planning, tool use, and error correction compound.

Why a seven-point swing matters more than it sounds

Seven Elo points is not a regime change by itself. But when the model gaining those seven points is already the incumbent leader, the implication is different. Anthropic did not just defend the lead. It found a second gear above the leader it had just shipped.
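For scale, Elo-style ratings map rating gaps to expected head-to-head win rates via a standard logistic formula. A sketch under that assumption (the arena's exact scoring model may differ):

```python
def elo_win_prob(delta: float) -> float:
    """Expected win probability for a player rated `delta` Elo points higher,
    using the standard logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** (-delta / 400.0))

# A 7-point edge (1576 vs 1569) is a razor-thin head-to-head advantage
print(round(elo_win_prob(7), 3))   # → 0.51
# versus the ~120-point gap to ninth place (1576 vs 1456)
print(round(elo_win_prob(120), 3))
```

In other words, the thinking variant wins roughly 51% of paired matchups against base Opus 4.7, while the gap to the nearest non-Anthropic entrants implies a far more lopsided expected win rate.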

That is exactly the pattern developers should watch for. Frontier progress on coding is no longer only about one-off launch spikes. Labs are learning how to stack capability layers: base model, thinking mode, longer-horizon agents, stronger tool orchestration. If you are buying AI for software work, those layers matter more than headline chatbot demos.

The market read

OpenAI still has distribution. Google still has infrastructure. But on coding quality, Anthropic keeps making the strongest empirical case. Not because of vibes, not because of Twitter sentiment, and not because of benchmark cherry-picking. Because the arena keeps voting that way.

Right now the cleanest summary is this: Claude Opus 4.7 won code, then Claude Opus 4.7 Thinking beat Claude Opus 4.7.