Google Cloud models process 16B tokens per minute, up from 10B

Google Cloud has announced that its models now process more than 16 billion tokens per minute, up from 10 billion last quarter, according to Alphabet and Google CEO Sundar Pichai. The increase coincides with an accelerating shift toward agentic enterprise transformation, reflected in growing direct API usage of first-party models. Google Cloud also recently launched eighth-generation TPUs designed for demanding workloads, which support this growth in processing capacity.

Google Cloud: Google Cloud is Alphabet’s cloud computing division providing infrastructure, platform services, and advanced AI capabilities including custom TPUs and Gemini models. At Cloud Next ’26, it showcased momentum in AI model inference supporting enterprise agentic workflows through direct customer API access. The event featured launches of the Gemini Enterprise Agent Platform and eighth-generation TPUs optimized for scaling AI agents.
Sundar Pichai: Sundar Pichai serves as CEO of Alphabet and Google, overseeing advancements in AI, search, cloud computing, and hardware. He announced Google Cloud’s accelerated AI processing capabilities during the Cloud Next ’26 keynote. Pichai emphasized new innovations like the Gemini Enterprise Agent Platform and next-gen TPUs driving enterprise adoption.

Adoption Shift: Pichai highlighted an accelerating shift toward agentic enterprise transformation, with growing direct API usage of first-party models.
AI Infrastructure: Google Cloud launched eighth-generation TPUs designed for demanding agentic workloads at Cloud Next ’26.
Enterprise Platform: Google Cloud introduced the Gemini Enterprise Agent Platform as a unified stack for building, scaling, governing, and optimizing AI agents.