Nexo Earn with Nexo
MiniMax teases M3 model with 15.6x faster decoding speed boost

MiniMax teases M3 model with 15.6x faster decoding speed boost

The Shanghai-based AI lab claims its new sparse attention architecture dramatically accelerates long-context processing, setting up a showdown with global AI heavyweights.

Processing a million tokens just got a lot faster, at least on paper. MiniMax, the Shanghai-based AI company behind the Hailuo video model series, has teased its upcoming M3 language model featuring a new architecture called MiniMax Sparse Attention (MSA) that claims 15.6x faster decoding and 9.7x faster prefill speeds compared to its M2 predecessor when handling contexts up to 1 million tokens.

What MiniMax is actually building

The new MSA architecture is built on a GQA-based (Grouped Query Attention) sparse attention mechanism. Instead of having the model pay equal attention to every piece of input data, sparse attention lets it focus on the parts that matter most, cutting the computational overhead required for long documents, codebases, or agentic workflows.

Advertisement

MiniMax’s engineering lead, Skyler Miao, showcased the approach on social media, generating significant buzz across X, Reddit, and Threads. The teaser arrived alongside a comprehensive technical report detailing the innovations behind the M2 model family, which includes M2, M2.5, and M2.7.

MiniMax has also signaled that M3 will be open-source, consistent with the company’s strategy of releasing frontier models under permissive, enterprise-friendly licenses.

Why speed at scale matters

MiniMax serves over 200 million users through products like Talkie and Hailuo. That’s a massive user base that could benefit from faster, more efficient models without requiring proportional increases in infrastructure spending.

The company was listed on the Hong Kong Stock Exchange in January 2026, giving it public market visibility and access to capital markets as it scales these capabilities.

The competitive landscape and what investors should watch

That said, there’s a significant caveat. Benchmark performance metrics, model weights, and a release timeline for M3 remain undisclosed. Until independent evaluations confirm that M3 maintains or improves upon M2’s quality while delivering these speed gains, the claims remain exactly that: claims.

Disclosure: This article was edited by Editorial Team. For more information on how we create and review content, see our Editorial Policy.

MiniMax teases M3 model with 15.6x faster decoding speed boost

MiniMax teases M3 model with 15.6x faster decoding speed boost

The Shanghai-based AI lab claims its new sparse attention architecture dramatically accelerates long-context processing, setting up a showdown with global AI heavyweights.

Processing a million tokens just got a lot faster, at least on paper. MiniMax, the Shanghai-based AI company behind the Hailuo video model series, has teased its upcoming M3 language model featuring a new architecture called MiniMax Sparse Attention (MSA) that claims 15.6x faster decoding and 9.7x faster prefill speeds compared to its M2 predecessor when handling contexts up to 1 million tokens.

What MiniMax is actually building

The new MSA architecture is built on a GQA-based (Grouped Query Attention) sparse attention mechanism. Instead of having the model pay equal attention to every piece of input data, sparse attention lets it focus on the parts that matter most, cutting the computational overhead required for long documents, codebases, or agentic workflows.

Advertisement

MiniMax’s engineering lead, Skyler Miao, showcased the approach on social media, generating significant buzz across X, Reddit, and Threads. The teaser arrived alongside a comprehensive technical report detailing the innovations behind the M2 model family, which includes M2, M2.5, and M2.7.

MiniMax has also signaled that M3 will be open-source, consistent with the company’s strategy of releasing frontier models under permissive, enterprise-friendly licenses.

Why speed at scale matters

MiniMax serves over 200 million users through products like Talkie and Hailuo. That’s a massive user base that could benefit from faster, more efficient models without requiring proportional increases in infrastructure spending.

The company was listed on the Hong Kong Stock Exchange in January 2026, giving it public market visibility and access to capital markets as it scales these capabilities.

The competitive landscape and what investors should watch

That said, there’s a significant caveat. Benchmark performance metrics, model weights, and a release timeline for M3 remain undisclosed. Until independent evaluations confirm that M3 maintains or improves upon M2’s quality while delivering these speed gains, the claims remain exactly that: claims.

Disclosure: This article was edited by Editorial Team. For more information on how we create and review content, see our Editorial Policy.