Apple unveils AFM 3 Core Advanced with 20B parameters for on-device AI at WWDC26

At WWDC26, Apple introduced AFM 3 Core Advanced, an on-device model that keeps all 20 billion of its parameters in flash memory and activates only the ones needed for each prompt. This sidesteps the usual memory limit for on-device AI models, which typically need their entire weight set to fit in DRAM. Developed in collaboration with Google, the AFM 3 models give enterprises in regulated industries a viable way to deploy agentic AI locally without relying on cloud services. For those businesses, the planning focus shifts to device hardware constraints.
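
To make the DRAM constraint concrete, the following sketch estimates the raw weight footprint of a 20-billion-parameter model. Only the parameter count comes from the announcement. The bit widths and the 25% resident fraction are illustrative assumptions, and the function names are hypothetical. Apple's actual quantization and active-expert share are not stated here.

```python
PARAMS = 20_000_000_000  # parameter count reported for AFM 3 Core Advanced


def weight_gib(params: int, bits_per_weight: float) -> float:
    """Raw weight storage in GiB, ignoring activations, KV cache, and metadata."""
    return params * bits_per_weight / 8 / 2**30


def resident_gib(params: int, bits_per_weight: float, active_fraction: float) -> float:
    """DRAM needed if only a fraction of the weights is loaded for a prompt."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return weight_gib(params, bits_per_weight) * active_fraction


if __name__ == "__main__":
    # Hypothetical bit widths and resident fraction; not figures from Apple.
    for bits in (16, 8, 4):
        full = weight_gib(PARAMS, bits)
        part = resident_gib(PARAMS, bits, active_fraction=0.25)
        print(f"{bits:>2}-bit: full set {full:5.1f} GiB, 25% resident {part:5.1f} GiB")
```

Under these assumptions, the full weight set exceeds the DRAM of a typical phone or laptop at higher precisions, which is why keeping it in flash and loading only a subset matters.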

Apple: Apple is a technology company focused on consumer electronics, software platforms, and integrated AI systems. It announced its third-generation foundation models at WWDC26, introducing a new on-device architecture that stores the full weight set in flash memory to overcome DRAM limitations for larger models.
Google: Google is a technology company that provides cloud infrastructure, AI tools, and enterprise services. It collaborated with Apple on the AFM 3 family of models, with the server-side components running on Nvidia GPUs within Google Cloud under Apple’s Private Cloud Compute framework.
AFM 3 Cloud Pro: AFM 3 Cloud Pro is the server-based model in Apple’s AFM 3 family designed for agentic tool use and complex reasoning tasks. It operates within Apple’s Private Cloud Compute boundary on infrastructure hosted in Google Cloud.
AFM 3 Core Advanced: AFM 3 Core Advanced is Apple’s on-device foundation model that employs Instruction-Following Pruning and prompt-level expert routing. It keeps the full parameter set in NAND flash and loads only the required experts into DRAM for each prompt to enable efficient on-device inference.
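
The prompt-level routing described for AFM 3 Core Advanced can be sketched as a toy simulation: route once per prompt, copy the chosen experts from flash into DRAM, then reuse them for every token. This is an illustration under stated assumptions, not Apple's implementation. `FlashStore`, `route_prompt`, `run_prompt`, the keyword-overlap router, and the expert names are all hypothetical, and Instruction-Following Pruning is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FlashStore:
    """Stand-in for NAND flash holding every expert's weights (hypothetical)."""
    experts: dict[str, list[float]]
    reads: int = 0

    def load(self, name: str) -> list[float]:
        self.reads += 1  # count flash reads to make the I/O cost visible
        return self.experts[name]


def route_prompt(prompt: str, keywords: dict[str, set[str]], top_k: int) -> list[str]:
    """Toy router: score each expert by keyword overlap with the whole prompt."""
    words = set(prompt.lower().split())
    scores = {name: len(words & kws) for name, kws in keywords.items()}
    # Highest score first; ties broken by name so the result is deterministic.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:top_k]


def run_prompt(prompt: str, flash: FlashStore, keywords: dict[str, set[str]], top_k: int = 2):
    """Route once, load the chosen experts into DRAM, then reuse them per token."""
    chosen = route_prompt(prompt, keywords, top_k)
    dram = {name: flash.load(name) for name in chosen}  # one flash read per expert
    outputs = []
    for token in prompt.split():
        # Per-token work reads only from the DRAM working set, never from flash.
        outputs.append(sum(sum(weights) for weights in dram.values()) * len(token))
    return chosen, outputs


if __name__ == "__main__":
    flash = FlashStore({"chat": [0.4], "code": [0.1, 0.2], "legal": [0.3], "math": [0.5]})
    keywords = {
        "chat": {"hello"},
        "code": {"bug", "function"},
        "legal": {"clause", "contract"},
        "math": {"integral", "sum"},
    }
    chosen, outputs = run_prompt("fix the bug in this function", flash, keywords)
    print(chosen, flash.reads, len(outputs))
```

In this sketch the number of flash reads depends on `top_k`, not on prompt length. A per-token router could, in the worst case, request a different expert at every step, which is the cost that prompt-level routing avoids.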

```json
{
  "Architecture": "Apple's approach routes experts once per prompt rather than per token, allowing the full model to reside in flash memory while using DRAM only as a working buffer.",
  "Collaboration": "The AFM 3 models were developed in partnership between Apple and Google, with server inference handled through Google Cloud.",
  "Enterprise Impact": "Regulated industries now have a new architectural option for deploying capable AI agents locally without mandatory cloud round-trips, shifting focus to device hardware constraints."
}
```