NVIDIA has announced the first benchmark results for agentic AI, reporting that its GB300 NVL72 configuration can run up to 20 times more coding agents per megawatt than the H200. At the lowest service tier, the GB300 NVL72 sustained 61.4K concurrent agents compared to the H200's 2.6K. The benchmark, developed by Artificial Analysis, evaluates systems not only on speed but also on how many ongoing agent interactions they can sustain at once, where each interaction is a complex, multi-step process rather than simple token generation. The shift in benchmarks reflects the growing infrastructure demands of agentic AI workloads, which require efficient coordination of memory, networking, and processing within large GPU clusters to stay responsive during extended tasks.
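As a quick sanity check on the figures above, the raw ratio of the two quoted agent counts can be computed directly. This is only a sketch: the variable and function names are illustrative, and the summary does not say how the counts are normalized per megawatt, so the raw ratio need not match the "up to 20 times" per-megawatt figure.

```python
# Concurrent agent counts quoted in the summary (lowest service tier).
gb300_agents = 61_400  # GB300 NVL72
h200_agents = 2_600    # H200


def raw_ratio(new: float, old: float) -> float:
    """Return how many times larger `new` is than `old`."""
    return new / old


# Raw ratio of the quoted counts; this is NOT a per-megawatt figure,
# because the summary gives no power numbers to normalize by.
ratio = raw_ratio(gb300_agents, h200_agents)
print(f"{ratio:.1f}x")  # prints "23.6x"
```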
NVIDIA: NVIDIA designs graphics processing units and AI accelerators that power data centers and computing systems worldwide. The company is using its GB300 NVL72 rack-scale platform to showcase performance advantages over prior hardware generations in demanding agentic AI scenarios. It is relevant here because of its focus on integrated hardware-software optimizations that support long-running, multi-step AI agent workflows.
Artificial Analysis: Artificial Analysis develops and publishes independent benchmarks for assessing AI models, systems, and infrastructure under realistic conditions. It created the AgentPerf benchmark featured in the results, which measures concurrent agent capacity across complex, multi-turn coding tasks rather than simple single-prompt inference. The organization provides standardized testing frameworks that help evaluate infrastructure suitability for production agentic workloads.
Workload Realism: New benchmarks replay actual coding agent trajectories from public repositories across multiple languages, with variable request lengths, to better reflect production use cases.
Benchmark Evolution: AI evaluation is shifting from single-prompt token generation metrics toward multi-step agent simulations that incorporate tool interactions and extended context handling.
Infrastructure Demands: Agentic AI workloads require coordinated memory, networking, and batching across large GPU clusters to maintain responsiveness over long task chains.
