MiniMax teases M3 model with 15.6x faster decoding speed boost

Processing a million tokens just got a lot faster, at least on paper. MiniMax, the Shanghai-based AI company behind the Hailuo video model series, has teased its upcoming M3 language model featuring a new architecture called MiniMax Sparse Attention (MSA) that claims 15.6x faster decoding and 9.7x faster prefill speeds compared to its M2 predecessor when handling contexts up to 1 million tokens.

What MiniMax is actually building

The new MSA architecture is built on a GQA-based (Grouped Query Attention) sparse attention mechanism. Instead of having the model pay equal attention to every piece of input data, sparse attention lets it focus on the parts that matter most, cutting the computational overhead required for long documents, codebases, or agentic workflows.

MiniMax’s engineering lead, Skyler Miao, showcased the approach on social media, generating significant buzz across X, Reddit, and Threads. The teaser arrived alongside a comprehensive technical report detailing the innovations behind the M2 model family, which includes M2, M2.5, and M2.7.

MiniMax has also signaled that M3 will be open-source, consistent with the company’s strategy of releasing frontier models under permissive, enterprise-friendly licenses.

Why speed at scale matters

MiniMax serves over 200 million users through products like Talkie and Hailuo. That’s a massive user base that could benefit from faster, more efficient models without requiring proportional increases in infrastructure spending.

The company was listed on the Hong Kong Stock Exchange in January 2026, giving it public market visibility and access to capital markets as it scales these capabilities.

The competitive landscape and what investors should watch

That said, there’s a significant caveat. Benchmark performance metrics, model weights, and a release timeline for M3 remain undisclosed. Until independent evaluations confirm that M3 maintains or improves upon M2’s quality while delivering these speed gains, the claims remain exactly that: claims.

What MiniMax is actually building

MiniMax has also signaled that M3 will be open-source, consistent with the company’s strategy of releasing frontier models under permissive, enterprise-friendly licenses.

Why speed at scale matters

The company was listed on the Hong Kong Stock Exchange in January 2026, giving it public market visibility and access to capital markets as it scales these capabilities.

The competitive landscape and what investors should watch

Disclosure: This article was edited by Editorial Team. For more information on how we create and review content, see our Editorial Policy.

MiniMax teases M3 model with 15.6x faster decoding speed boost

What MiniMax is actually building

Why speed at scale matters

The competitive landscape and what investors should watch

MiniMax teases M3 model with 15.6x faster decoding speed boost

What MiniMax is actually building

Why speed at scale matters

The competitive landscape and what investors should watch

Get Crypto Briefing in your inbox