Today, Microsoft announced a significant advancement in AI-driven cybersecurity with the launch of its new multi-model agentic security system, codenamed MDASH, which identified 16 vulnerabilities in the Windows networking and authentication stack ahead of the May Patch Tuesday. The system, developed by Microsoft’s Autonomous Code Security team, employs over 100 specialized AI agents that work collaboratively to discover and validate exploitable bugs, and it scored 88.45% on the CyberGym benchmark, the highest score recorded to date. The findings include critical remote code execution flaws in components such as the Windows kernel and the IKEv2 service, underscoring the effectiveness of the agentic system, particularly in detecting complex vulnerabilities that single-model approaches often miss.
CyberGym: CyberGym is a public benchmark evaluating AI agents on real-world vulnerability reproduction tasks sourced from OSS-Fuzz across numerous open-source projects. Developed by UC Berkeley researchers, it tests capabilities in identifying and exploiting bugs in production software, such as C/C++ codebases. Microsoft’s MDASH achieved the top leaderboard position using generally available models.
Microsoft: Microsoft is a multinational technology corporation developing operating systems like Windows, cloud platforms such as Azure, and comprehensive security solutions including Defender suites. The company has recently emphasized AI integration in cybersecurity, launching agentic tools for threat detection and vulnerability management. In this development, Microsoft announced codename MDASH, a multi-model agentic system used to discover vulnerabilities patched in the May Patch Tuesday release.
StorageDrive: StorageDrive is a private sample kernel-mode device driver that Microsoft uses for interviewing offensive security researchers and for evaluating tools. Designed with injected vulnerabilities such as use-after-frees and locking issues, it serves as unseen test data to avoid model training contamination. MDASH detected every planted issue in StorageDrive.
codename MDASH: Codename MDASH is Microsoft Security’s agentic vulnerability scanning harness that orchestrates specialized AI agents across diverse frontier and distilled models for end-to-end bug discovery, validation, and proof. Its model-agnostic pipeline incorporates plugins for domain-specific reasoning on proprietary codebases. Announced on May 12, MDASH enabled the discovery of Windows networking and authentication flaws and is now available in private preview for customers.
Microsoft Autonomous Code Security: Microsoft Autonomous Code Security (ACS) is an internal team dedicated to transforming AI-powered vulnerability research into production-scale security auditing for Microsoft’s proprietary codebases. Drawing members from Team Atlanta, winners of DARPA’s AI Cyber Challenge, ACS focuses on engineering robust harnesses around frontier AI models. ACS built and deployed codename MDASH to identify exploitable bugs in Windows components ahead of Patch Tuesday.
Microsoft Windows Attack Research and Protection: Microsoft Windows Attack Research and Protection (WARP) specializes in advanced offensive security research targeting the most critical Windows attack surfaces, including kernel and networking stacks. WARP collaborates closely with AI-driven discovery teams to validate and contextualize findings for real-world exploitation potential. In collaboration with ACS, WARP contributed to the vulnerabilities found via MDASH and fixed in the recent Patch Tuesday.
CyberGym Design: CyberGym focuses AI evaluation on historical production vulnerabilities from fuzzing efforts spanning multiple years across diverse software ecosystems.
DARPA Expertise: ACS incorporates talent from Team Atlanta, which won DARPA’s AI Cyber Challenge by developing autonomous systems for vulnerability detection and patching in complex open-source projects.
WARP Collaboration: WARP provides deep offensive research expertise, pairing with AI discovery pipelines to ensure findings translate to actionable Patch Tuesday remediations.
