A new April 2026 study finds that long-running GPU jobs spend substantial time execution-idle: 19% of their runtime and 10.67% of their energy go to no useful work. The inefficiency comes amid a broader trend in which training compute for frontier AI models has grown roughly a trillion-fold since 2010 and is projected to grow another 1,000-fold by 2028. Meanwhile, shortages of high-bandwidth memory are hindering the scaling of AI infrastructure, raising concerns that as compute continues to surge, memory capacity and data movement will become the principal bottleneck.
Mustafa Suleyman: CEO of Microsoft AI, where he directs advances in AI models and applications such as image generation and health insights. He previously co-founded DeepMind and Inflection AI, and authored The Coming Wave on AI's societal impacts. In this news item, he underscores the rapid escalation in training compute for frontier AI models and anticipates continued exponential progress.
```json
{
  "Execution-Idle": "GPU clusters experience significant execution-idle time in long-running jobs, resulting in energy waste with elevated clock rates but limited productive work.",
  "Scaling Trends": "Training compute for frontier AI models continues to grow exponentially with advancements in hardware capabilities.",
  "Memory Bottleneck": "Memory constraints are impacting AI infrastructure scaling as compute performance increases faster than memory size and bandwidth."
}
```
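
To make the "elevated clocks, limited productive work" failure mode concrete, here is a minimal monitoring sketch, not the study's methodology: it samples SM utilization, clock rate, and power draw through NVIDIA's NVML bindings and tallies intervals where the GPU burns energy while doing no useful work. The `pynvml` package, the thresholds, and the `profile` helper are illustrative assumptions.

```python
# Hypothetical sketch: estimating execution-idle time and wasted energy on one
# GPU via NVML. Assumes the pynvml bindings are installed; thresholds are
# illustrative, not taken from the study.
import time
import pynvml

SAMPLE_S = 1.0           # sampling interval, seconds
IDLE_UTIL_PCT = 5        # SM utilization below this counts as idle
ACTIVE_CLOCK_MHZ = 1000  # clocks above this while idle suggest wasted energy

def profile(duration_s: float, gpu_index: int = 0) -> None:
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
        samples = idle = hot_idle = 0
        wasted_j = 0.0
        end = time.time() + duration_s
        while time.time() < end:
            util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu   # percent
            sm_clock = pynvml.nvmlDeviceGetClockInfo(
                handle, pynvml.NVML_CLOCK_SM)                         # MHz
            power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0 # mW -> W
            samples += 1
            if util < IDLE_UTIL_PCT:
                idle += 1
                if sm_clock > ACTIVE_CLOCK_MHZ:
                    hot_idle += 1
                    wasted_j += power_w * SAMPLE_S  # energy drawn while idle
            time.sleep(SAMPLE_S)
        n = max(samples, 1)
        print(f"idle fraction: {idle / n:.1%}, hot-idle: {hot_idle / n:.1%}, "
              f"energy wasted ~{wasted_j:.0f} J")
    finally:
        pynvml.nvmlShutdown()

if __name__ == "__main__":
    profile(duration_s=60)
```

Running this alongside a training job for its full duration would yield a rough per-device analogue of the study's 19%-of-time and 10.67%-of-energy figures, though a production measurement would need cluster-wide sampling and job-boundary awareness.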
