Meta's Muse Spark Enters the Top 3
The $14.3 billion bet on Alexandr Wang just shipped its first model. It debuted at rank #3 on LMArena — ahead of every Google and OpenAI model on the text leaderboard. For a company whose best model couldn't crack the top 100 a year ago, that's a statement.
Meta's Muse Spark launched on April 8th. No leaked benchmarks, no pre-release hype cycle. Just a blog post, a Meta AI app update, and then the arena battles started.
Seventy-two hours later, the Elo has stabilized at 1493. That puts Muse Spark one point above Gemini 3.1 Pro Preview, three points above Gemini 3 Pro, and seven points above OpenAI's GPT-5.4 High. Only Claude Opus 4.6 Thinking (1504) and Claude Opus 4.6 (1496) sit higher.
For context: Meta's best previous model on this leaderboard was Llama 4 Maverick, which sits at rank #177 with 1327 Elo. That's a 166-point jump and 174 rank positions in a single generation.
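For intuition on what those point gaps mean, the standard logistic Elo formula converts a rating difference into an expected head-to-head win rate. A quick sketch, plugging in the ratings quoted above:

```python
def win_probability(elo_gap):
    """Expected win rate for the higher-rated model, given its
    Elo advantage (standard logistic Elo formula, scale 400)."""
    return 1 / (1 + 10 ** (-elo_gap / 400))

# Muse Spark (1493) vs. Llama 4 Maverick (1327): a 166-point gap
print(round(win_probability(1493 - 1327), 3))  # ≈ 0.722: wins ~72% of battles

# Muse Spark (1493) vs. Claude Opus 4.6 Thinking (1504): an 11-point gap
print(round(win_probability(1504 - 1493), 3))  # ≈ 0.516: near coin flip
```

The takeaway: a 166-point jump is dramatic (roughly 72/28 in head-to-head preference), while the single-digit gaps at the top of the leaderboard are close to coin flips.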
What Alexandr Wang actually built
Muse Spark is the inaugural model from Meta Superintelligence Labs, the team Wang was recruited to lead last year under a $14.3B Scale AI deal that gave Meta a 49% stake in the data labeling giant. Wang's official title is head of superintelligence at Meta. His unofficial mandate: fix the AI team after Llama 4 underdelivered.
The model is internally codenamed “Avocado.” Meta hasn't disclosed its parameter count. They also walked back their tradition of open-weight releases — Muse Spark launched as a “private preview” with unnamed partners, with no public weights available.
What Meta did disclose: it's “small and fast by design.” The pitch is a capable reasoning model that can run efficiently at scale across Meta's properties — WhatsApp, Instagram, Facebook, and their smart glasses lineup. Over the next few weeks, it's expected to replace the Llama-based models powering those chatbots.
The multi-agent play: Contemplating Mode
The most technically interesting part of the Muse Spark announcement isn't the base model. It's what's coming next.
Meta is building “Contemplating Mode” — a reasoning layer that runs multiple Muse Spark agents simultaneously on the same problem. Rather than scaling a single chain-of-thought to match competitors' extended thinking modes, Meta is scaling horizontally. More agents working in parallel, coordinating on hard problems.
From Meta's blog post: “To spend more test-time reasoning without drastically increasing latency, we can scale the number of parallel agents that collaborate to solve hard problems.”
This is an architecture bet. OpenAI's o3 and Anthropic's thinking models scale depth — one model, more compute, longer thought chains. Meta is betting on breadth — multiple models, parallel reasoning, merged conclusions.
The parallel agent approach maps to Meta's infrastructure strengths. They have more GPUs than most. Running 10 instances of a fast, efficient model might be more practical for Meta than running one very large, slow reasoning model. If Contemplating Mode works as advertised, it could push Muse Spark further up the rankings without requiring a fundamentally bigger model.
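Meta hasn't published how Contemplating Mode coordinates its agents, but the basic shape of the bet — fan the same problem out to several fast model instances, then merge their answers — can be sketched in a few lines. Everything here is hypothetical: `ask_agent` stands in for a real model API call, and majority voting is just one plausible merge strategy.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def solve_with_parallel_agents(ask_agent, problem, n_agents=5):
    """Fan one problem out to n_agents model instances in parallel,
    then merge their answers. Latency ≈ one agent's latency, not n."""
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        answers = list(pool.map(lambda i: ask_agent(problem, seed=i),
                                range(n_agents)))
    # Simplest possible merge: most common answer wins.
    winner, _ = Counter(answers).most_common(1)[0]
    return winner

# Stand-in agent for illustration; a real system would call the model here.
def fake_agent(problem, seed):
    return "42" if seed != 2 else "41"  # one dissenting agent

print(solve_with_parallel_agents(fake_agent, "a hard problem"))
```

The design tradeoff this illustrates: depth-scaled reasoning (one long chain of thought) adds latency serially, while breadth-scaled reasoning adds mostly parallel compute — which is exactly where a GPU-rich operator has an edge.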
Where Muse Spark is strong — and where it isn't
Reuters called out something important in their early testing: Muse Spark matches top models in language understanding and visual STEM, but “lags in coding and abstract reasoning.”
That tracks with what we'd expect from a model optimized for consumer conversational use cases over developer and technical workloads. The LMArena text leaderboard measures broad conversational quality — which is probably where Meta put most of its training energy. Code Arena tells a different story (Muse Spark doesn't appear on that leaderboard yet).
Meta's open-source U-turn
This one deserves its own paragraph.
Meta built its AI credibility on open-weight releases. Llama, Llama 2, Llama 3, Llama 4 — all released publicly. The open-source AI community rallied behind Meta precisely because they kept shipping weights developers could actually use.
Muse Spark is closed. Private preview only. Zuckerberg says open-source models are “coming” but they're not here yet. Whether that promise holds — especially if Muse Spark gives Meta a competitive advantage — remains to be seen.
The cynic's read: the moment Meta has something frontier-quality, they're keeping it. The charitable read: this is the first release from a new team that needed polish before going public, and open weights are genuinely coming in a later iteration.
Either way, it's a shift worth watching. Meta's platforms reach 3.5 billion daily active users (WhatsApp alone has two billion), a deployment scale that makes “proprietary” mean something different than it does for OpenAI or Anthropic.
The broader picture: five labs at the frontier
Before Muse Spark, the top 5 on LMArena text was a three-lab affair: Anthropic, Google, and xAI. OpenAI's best models were hovering around rank 7–9. Meta wasn't in the conversation at all.
Now there are five labs within striking distance of #1: Anthropic, Google, Meta, xAI, and OpenAI — in that order, as of this writing.
That's genuinely new. A year ago, the AI race looked like a two-horse contest between OpenAI and Google, with Anthropic as the interesting challenger. Now the frontier has five serious contenders, each with different architectural bets and different distribution moats.
Muse Spark hasn't taken #1. It sits 11 points behind Claude Opus 4.6 Thinking, and Anthropic can retrain on a longer schedule while Meta ships. But the gap is small enough that a strong update, especially once Contemplating Mode lands, could change the chart.
The bigger point: Meta is no longer a data point at the bottom of the rankings. They're a serious competitor at the top.
Zuckerberg said in January: “I expect us to steadily push the frontier over the course of the year as we continue to release new models.” Muse Spark is the first proof that this wasn't just investor-call posturing.
Data: LMArena as of April 11, 2026 · 339 models tracked · 5.78M text votes