Apple unveils AFM 3 Core Advanced with 20 billion parameters for on-device AI at WWDC26
Apple's new sparse architecture stores a massive AI model in flash memory and only loads what it needs, a clever workaround to the RAM problem that has bottlenecked on-device AI for years.
Apple just dropped what might be the most architecturally interesting AI announcement of the year, and it didn’t come from a startup or a cloud computing giant. At WWDC 2026, the company revealed its third-generation Apple Foundation Models, headlined by the AFM 3 Core Advanced: a 20-billion-parameter AI model designed to run entirely on your phone.
That number alone would be impressive. But the real trick is how Apple made it work on hardware that doesn’t have anywhere near enough RAM to hold a model that size.
The flash memory workaround
Here’s the core innovation. The full 20B-parameter model lives on the device’s NAND flash storage, the same type of memory that stores your photos, apps, and that podcast backlog you’ll never finish. When you actually ask the model to do something, it doesn’t load the entire 20 billion parameters into working memory. Instead, it selectively activates between 1 and 4 billion parameters per prompt, pulling only the relevant “experts” into DRAM.
Apple calls the technique behind this Instruction-Following Pruning, or IFP. The system dynamically identifies which task-specific parameter subsets are needed for a given request and loads just those into RAM.
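To make the idea concrete, here is a minimal sketch of selective loading. It is not Apple's implementation: the file stands in for flash storage, the returned dictionary stands in for DRAM, and the names (`write_experts`, `load_experts`), the toy expert size, and the hard-coded "router" choice are all assumptions made for illustration.

```python
import os
import struct
import tempfile

FLOATS_PER_EXPERT = 4                      # toy size, chosen for readability
BYTES_PER_EXPERT = FLOATS_PER_EXPERT * 4   # 4 bytes per float32


def write_experts(path, experts):
    """Store every expert back to back in one file (the stand-in for flash)."""
    with open(path, "wb") as f:
        for weights in experts:
            f.write(struct.pack(f"<{FLOATS_PER_EXPERT}f", *weights))


def load_experts(path, wanted):
    """Read only the requested experts into memory (the stand-in for DRAM)."""
    loaded = {}
    with open(path, "rb") as f:
        for idx in wanted:
            f.seek(idx * BYTES_PER_EXPERT)   # jump straight to that expert
            raw = f.read(BYTES_PER_EXPERT)
            loaded[idx] = list(struct.unpack(f"<{FLOATS_PER_EXPERT}f", raw))
    return loaded


# Eight toy experts; expert i holds the value float(i) repeated.
all_experts = [[float(i)] * FLOATS_PER_EXPERT for i in range(8)]

with tempfile.TemporaryDirectory() as tmp:
    flash_path = os.path.join(tmp, "model.bin")
    write_experts(flash_path, all_experts)
    total_bytes = os.path.getsize(flash_path)

    # Hypothetical routing decision: this prompt needs experts 2 and 5.
    resident = load_experts(flash_path, [2, 5])

resident_bytes = len(resident) * BYTES_PER_EXPERT
fraction_resident = resident_bytes / total_bytes
print(f"{resident_bytes} of {total_bytes} bytes resident ({fraction_resident:.0%})")
```

In this toy setup only two of the eight experts are ever read, so a quarter of the stored weights end up in memory. The real system would also need a component that decides which subsets a request requires, which the sketch replaces with a fixed list.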
To put the jump in perspective, Apple’s first on-device AI model, introduced alongside Apple Intelligence in 2024, had roughly 3 billion parameters. AFM 3 Core Advanced is nearly 7x larger in total model size, while the 1 to 4 billion parameters active per query stay in a similar ballpark to what the hardware could already handle.
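The figures in that comparison can be checked with a few lines of arithmetic. The parameter counts below come from the announcement as reported here; the 4 bits per weight used for the memory estimate is purely an assumption for illustration, since the precision Apple stores the weights at is not stated.

```python
TOTAL_PARAMS = 20_000_000_000        # AFM 3 Core Advanced, total
FIRST_GEN_PARAMS = 3_000_000_000     # 2024 on-device model, approximate
ACTIVE_MIN = 1_000_000_000           # fewest parameters active per prompt
ACTIVE_MAX = 4_000_000_000           # most parameters active per prompt

# How much larger the new model is in total.
size_ratio = TOTAL_PARAMS / FIRST_GEN_PARAMS

# Share of the full model that is active for a single prompt.
active_share_min = ACTIVE_MIN / TOTAL_PARAMS
active_share_max = ACTIVE_MAX / TOTAL_PARAMS

# ASSUMPTION: 4 bits per weight. Not a figure from Apple.
ASSUMED_BITS_PER_WEIGHT = 4


def footprint_gb(params, bits=ASSUMED_BITS_PER_WEIGHT):
    """Approximate storage for `params` weights, in decimal gigabytes."""
    return params * bits / 8 / 1e9


print(f"total size ratio: {size_ratio:.2f}x")
print(f"active share: {active_share_min:.0%} to {active_share_max:.0%}")
print(f"full model at {ASSUMED_BITS_PER_WEIGHT}-bit: {footprint_gb(TOTAL_PARAMS):.1f} GB")
print(f"active set at {ASSUMED_BITS_PER_WEIGHT}-bit: "
      f"{footprint_gb(ACTIVE_MIN):.1f} to {footprint_gb(ACTIVE_MAX):.1f} GB")
```

Under that assumed precision, the full model would take about 10 GB of storage while the active set would need between 0.5 and 2 GB of working memory, which is why keeping the bulk of the weights in flash matters on a phone.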
What it actually does
The practical capabilities unlocked by AFM 3 Core Advanced span several categories. Expressive text-to-speech is one headline feature, suggesting Siri’s voice is about to sound considerably less robotic. Improved dictation accuracy and enhanced image understanding round out the on-device toolkit.
There’s a catch, though. The AFM 3 Core Advanced won’t run on just any Apple device. It requires the A19 Pro chip found in the iPhone 17 Pro, or Macs and iPads equipped with M3 or M4 silicon. Devices with only 8GB of RAM are excluded entirely.
The broader AFM 3 family announced at WWDC 2026 also includes a 3B Core model, presumably for less demanding tasks on a wider range of hardware, as well as cloud-based models for heavier workloads. Apple’s Private Cloud Compute infrastructure handles those server-side requests, maintaining the company’s privacy-first positioning even when on-device processing isn’t sufficient.
Why the architecture matters beyond Apple
Apple’s approach here is worth watching for reasons that extend beyond the Apple ecosystem. The entire AI industry has been grappling with a fundamental tension: models keep getting bigger, but the devices people actually use don’t get proportionally more powerful. Cloud inference solves the compute problem but creates latency, cost, and privacy problems. Keeping the full model in flash and loading only the parameters a request needs is one way to fit a larger model on-device without waiting for phones to ship with far more RAM.
For Apple specifically, this reinforces a competitive moat that Google and Samsung have struggled to match: vertical integration. Apple designs its own chips, controls its own operating system, and now architects its own foundation models to exploit the specific memory hierarchy of its hardware. The flash-to-DRAM pipeline that makes AFM 3 Core Advanced possible is optimized for Apple silicon in ways that would be difficult to replicate on more heterogeneous Android hardware.
By limiting AFM 3 Core Advanced to the iPhone 17 Pro and high-end Mac and iPad models, Apple is effectively creating a two-tier AI experience within its own product lineup. That’s a powerful upgrade incentive, particularly if the capability gap between the 3B Core model and the 20B Core Advanced turns out to be as large as the parameter counts suggest.