Sonnet Beats Opus: Anthropic's $3 Model Just Killed Its Own Flagship
Four months ago, Claude Opus 4.5 was the best coding model Anthropic had ever built. Today, their mid-tier Sonnet 4.6 — at one-fifth the price — beats it by 28 Elo points on LMArena's Code leaderboard. This is what progress looks like when it's measured in months, not years.
Read that again. A Sonnet-class model — the mid-tier, the one Anthropic prices at $3/$15 per million tokens — is now scoring higher on code than the Opus 4.5 that cost $15/$75. It doesn't just edge it out. It clears both Opus 4.5 (1496) and Opus 4.5 Thinking (1510) by comfortable margins.
In October 2025, Opus 4.5 was the crown jewel. The most capable model Anthropic offered. The one you reached for when Sonnet wasn't good enough. Four months later, the new Sonnet makes it obsolete — at least for code.
This is happening faster than anyone expected
The usual cadence in AI goes like this: a company releases a flagship model, then months later distills some of that capability into a smaller, cheaper model. The distilled version is good — maybe 85-90% of the flagship — but never quite as good.
Sonnet 4.6 breaks that pattern. It's not a distillation of Opus 4.5. It's a generation ahead. Built on whatever Anthropic learned shipping Opus 4.6, but packaged in a model that's 5x cheaper and meaningfully faster. The result is a mid-tier model that doesn't just approach the old flagship — it surpasses it.
This is the deflation curve that matters. Not benchmarks improving 2% per quarter. The price at which you can access state-of-the-art code generation dropped 80% in four months. If you were paying for Opus 4.5 API calls in December, you can now get better code output for a fraction of the cost.
Code beast, text… not so much
Here's the asterisk. Sonnet 4.6 sits at #13 on the Text leaderboard — a slight improvement over Sonnet 4.5, but nothing dramatic. Meanwhile, Claude Opus 4.6 Thinking holds the #1 spot on Text. That's a massive gap between Anthropic's mid-tier and flagship on general conversation.
The contrast is striking. On Code, Sonnet 4.6 is within spitting distance of Opus 4.6 — just 6 Elo points separate them. On Text, it's a different universe. The flagship justifies its 5x price tag for writing, analysis, and reasoning tasks. It just doesn't justify it for code anymore.
This tells you something about where Anthropic focused their optimization. Sonnet 4.6 wasn't built to be a general-purpose flagship replacement. It was built to be a coding machine — and on that axis, it overdelivered so hard it leapfrogged the previous Opus.
What this means for model selection
If you're still using Opus 4.5 for coding tasks, stop. There is no longer any reason to pay 5x more for an objectively worse code model. Sonnet 4.6 is faster, cheaper, and scores higher. The only models that beat it are the current-generation Opus 4.6 variants.
The decision tree is now simple:
Need absolute best code output? Opus 4.6 (1530 Elo, $15/$75)
Need great code at reasonable cost? Sonnet 4.6 (1524 Elo, $3/$15) — just 6 points behind
Still on Opus 4.5? You're paying more for less. Upgrade or downgrade — either way, move.
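The decision tree above can be sketched as a small helper. This is purely illustrative: the model names are shorthand, not official API identifiers, and the prices and Elo scores are the figures quoted in this post, not values pulled from any live source.

```python
# Illustrative model picker based on the figures quoted in this post.
# Names, Elo scores, and prices are shorthand, not official API values.
MODELS = {
    "opus-4.6":   {"code_elo": 1530, "input_per_mtok": 15.0, "output_per_mtok": 75.0},
    "sonnet-4.6": {"code_elo": 1524, "input_per_mtok": 3.0,  "output_per_mtok": 15.0},
    "opus-4.5":   {"code_elo": 1496, "input_per_mtok": 15.0, "output_per_mtok": 75.0},
}

def pick_model(need_absolute_best: bool) -> str:
    """Apply the two-branch decision tree: flagship only if you need the top score."""
    return "opus-4.6" if need_absolute_best else "sonnet-4.6"

def call_cost(model: str, in_tok: int = 10_000, out_tok: int = 2_000) -> float:
    """Dollar cost of one call at the quoted per-million-token rates."""
    m = MODELS[model]
    return in_tok / 1e6 * m["input_per_mtok"] + out_tok / 1e6 * m["output_per_mtok"]
```

At a typical 10k-in / 2k-out call, the quoted rates put Sonnet 4.6 at $0.06 against $0.30 for either Opus tier, which is the 5x gap the post keeps returning to.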
The bigger picture: today's flagship is tomorrow's mid-tier
This is the pattern to internalize. Not that Sonnet 4.6 is good — that's obvious. The pattern is that Anthropic's best model from October is now inferior to their budget model in February. Four months. That's the half-life of a flagship.
It's happening across the industry. GPT-4 was the undisputed king in 2023. By mid-2024, GPT-4o matched it at a fraction of the cost. Gemini 1.5 Pro was Google's best in early 2025; by Q4, Flash models were in the same range. Every generation, the frontier capability gets absorbed into the efficient tier faster.
For anyone building on AI APIs, the implication is clear: don't lock in to a model tier. The Sonnet-class models today will be better than the Opus-class models you benchmarked last quarter. Build for model flexibility, re-evaluate monthly, and let the leaderboard tell you when it's time to switch.
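One minimal way to act on that advice is to treat the model name as configuration rather than code. A sketch, assuming nothing about any particular SDK; the env var name and default are made up for illustration:

```python
# Sketch: read the model name from the environment instead of hardcoding it,
# so a leaderboard shift becomes a config change, not a deploy.
# "CODE_MODEL" and the default value are illustrative names, not a real API's.
import os

DEFAULT_CODE_MODEL = "sonnet-4.6"  # revisit this default when rankings move

def current_code_model() -> str:
    """Return the configured model, falling back to the current default."""
    return os.environ.get("CODE_MODEL", DEFAULT_CODE_MODEL)
```

The point is structural: every call site asks `current_code_model()` instead of naming a model, so the monthly re-evaluation touches one line.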
Anthropic just proved that their own cheapest model can beat their own most expensive one — given four months of progress. That's not embarrassing. That's the point.