Claude Opus 4.7 Thinking Quietly Takes Code #1
Anthropic already had the strongest coding model on LMArena Code. Now it has a stronger version of that model sitting above it. Claude Opus 4.7 Thinking just displaced base Opus 4.7 at #1, and the broader message is hard to miss: Anthropic is still widening the gap where developers actually care.
This is not the kind of leaderboard move that comes with a keynote. It is subtler than that. Anthropic shipped Opus 4.7, took both leaderboards, then watched the thinking variant climb past the base model on code a few days later.
That matters because code leaderboards are less forgiving than text leaderboards. You can charm your way through a copywriting prompt. You cannot charm your way through a broken build. The code arena is where execution quality shows up fast.
Anthropic now owns the top of the code stack
The top four slots are all Anthropic. Six of the top eight are Anthropic models. OpenAI does not appear until ninth. Google misses the top ten entirely. That is not healthy benchmark diversity. It is one lab setting the pace while everyone else drafts behind it.
What changed in Opus 4.7
Anthropic's official launch notes point to better advanced software engineering, stronger long-running task execution, higher-resolution vision support, and tighter autonomous behavior on multi-step work. The company also kept pricing flat at $5 per million input tokens and $25 per million output tokens, with the full 1M token context window included.
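To put those rates in concrete terms, here is a minimal cost sketch. The per-million-token prices come from the paragraph above; the example token counts are purely hypothetical.

```python
# Stated Opus 4.7 pricing: $5 per million input tokens,
# $25 per million output tokens.
INPUT_PER_MILLION = 5.00
OUTPUT_PER_MILLION = 25.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at the stated rates."""
    return (input_tokens / 1_000_000 * INPUT_PER_MILLION
            + output_tokens / 1_000_000 * OUTPUT_PER_MILLION)

# Hypothetical coding session: 200k tokens in, 20k tokens out.
print(round(request_cost(200_000, 20_000), 2))  # → 1.5
```

Even a context-heavy session stays in low single-digit dollars, which is part of why flat pricing on a stronger model is notable.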
The interesting part is not that Anthropic launched another strong model. The interesting part is that the thinking version immediately found extra headroom on code. That suggests the gains are not just in polished output style. They show up where planning, tool use, and error correction compound.
Why a seven-point swing matters more than it sounds
Seven Elo points is not a regime change by itself. But when the model gaining those seven points is already the incumbent leader, the implication is different. Anthropic did not just defend the lead. It found a second gear above the leader it had just shipped.
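For scale, the standard logistic Elo formula translates a rating gap into an expected head-to-head win rate. This sketch assumes LMArena uses conventional Elo-style scaling (base 10, divisor 400):

```python
# Expected win probability implied by an Elo rating gap:
# P(win) = 1 / (1 + 10^(-diff / 400))
def elo_win_probability(rating_diff: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

# A 7-point gap is only a slight edge in pairwise votes.
print(round(elo_win_probability(7), 3))  # → 0.51
```

A 51 percent expected win rate is a thin margin on its own, which is exactly why the incumbent-on-incumbent context matters more than the raw number.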
That is exactly the pattern developers should watch for. Frontier progress on coding is no longer only about one-off launch spikes. Labs are learning how to stack capability layers: base model, thinking mode, longer-horizon agents, stronger tool orchestration. If you are buying AI for software work, those layers matter more than headline chatbot demos.
The market read
OpenAI still has distribution. Google still has infrastructure. But on coding quality, Anthropic keeps making the strongest empirical case. Not because of vibes, not because of Twitter sentiment, and not because of benchmark cherry-picking. Because the arena keeps voting that way.
Right now the cleanest summary is this: Claude Opus 4.7 won code, then Claude Opus 4.7 Thinking beat Claude Opus 4.7.