At Google Cloud Next 2026, Google introduced its first bifurcated TPU generation: the TPU 8t, optimized for training, and the TPU 8i, optimized for inference. Developed in partnership with Google DeepMind, the pair addresses the distinct demands of AI training and inference and is hosted on Axion ARM-based processors to accelerate the full AI lifecycle.
TPU 8i: Google's eighth-generation TPU specialized for inference: post-training, high-concurrency reasoning, sampling, and serving. It incorporates large on-chip SRAM, a Collectives Acceleration Engine, and the Boardfly network topology for low-latency AI agent workloads.
TPU 8t: Google's eighth-generation TPU, purpose-built as a pre-training powerhouse for massive-scale AI model training and embedding-heavy workloads. It employs an upgraded 3D torus network topology suited to large superpods and integrates with the AI Hypercomputer software stack.
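To make the 3D torus concrete: in such a topology, chips sit on a three-dimensional grid whose edges wrap around, so every chip has six direct neighbors regardless of its position. The sketch below is purely illustrative of how torus addressing works in general, assuming a hypothetical cubic pod of side n; it does not reflect Google's actual chip-addressing scheme.

```python
def torus_neighbors(coord, n):
    """Return the six nearest neighbors of a chip at (x, y, z) in a
    hypothetical n x n x n 3D torus. Wrap-around (modular) links close
    each axis into a ring, so edge chips have no fewer neighbors."""
    x, y, z = coord
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            c = [x, y, z]
            c[axis] = (c[axis] + step) % n  # modular wrap = torus link
            neighbors.append(tuple(c))
    return neighbors

# A corner chip (0, 0, 0) in a 4x4x4 pod still has 6 neighbors,
# because wrap-around links connect it to the opposite faces.
print(torus_neighbors((0, 0, 0), 4))
```

The wrap-around links are what distinguish a torus from a plain mesh: they halve the worst-case hop count between chips, which is why the topology scales well to large superpods.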
Deployment: Hosted on Axion ARM-based processors to accelerate the AI lifecycle.
Development: Designed in partnership with Google DeepMind to power agentic AI applications.
Announcement: Introduced at Google Cloud Next 2026 as the first bifurcated TPU generation addressing divergent AI training and inference demands.
