In a recent evaluation by AISI, a newer Mythos Preview checkpoint demonstrated improved capabilities in cyber tasks, completing two cyber ranges designed to test AI models’ ability to execute sustained cyberattacks on undefended networks. Notably, this checkpoint solved the previously unsolved challenge “Cooling Tower” in 3 of 10 attempts, marking the first completion of both ranges by any model. The advance fits a broader trend: the length of software tasks that recent frontier AI models can complete has been doubling at a consistent rate of once every 4.2 months. Because the gain came from a later checkpoint of an existing model, it also indicates that progress in cyber capabilities can accelerate without new model releases.
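To make the 4.2-month doubling time concrete, here is a minimal sketch that assumes clean exponential growth at a constant rate. That is an idealization, and the function names are hypothetical rather than taken from AISI or METR.

```python
import math

# Assumption: a fixed doubling time, taken from the 4.2-month figure in the text.
DOUBLING_TIME_MONTHS = 4.2


def growth_factor(months: float, doubling_time: float = DOUBLING_TIME_MONTHS) -> float:
    """Multiplicative growth after `months`, given a constant doubling time."""
    return 2.0 ** (months / doubling_time)


def months_to_reach(multiple: float, doubling_time: float = DOUBLING_TIME_MONTHS) -> float:
    """Months needed for the measured quantity to grow by `multiple`."""
    return doubling_time * math.log2(multiple)


if __name__ == "__main__":
    for m in (4.2, 8.4, 12.6):
        print(f"after {m} months: x{growth_factor(m):.2f}")
    print(f"months to reach 8x: {months_to_reach(8):.1f}")
```

Under this assumption, the measured task length doubles after 4.2 months and grows eightfold after 12.6 months (three doublings).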

AISI: AISI, the AI Security Institute, is a UK government organization that conducts scientific research on frontier AI risks, focusing on cybersecurity evaluations and mitigations. It develops cyber ranges that test AI models’ ability to carry out realistic cyberattacks on simulated enterprise networks. In recent evaluations shared on its blog and in an X thread, AISI highlighted rapid advances in AI cyber autonomy using a newer Mythos Preview checkpoint.
METR: METR is an independent research non-profit dedicated to AI model evaluation and threat research, specializing in benchmarks for time horizons on software engineering tasks. It analyzes how long AI can autonomously handle complex tasks before requiring intervention. AISI’s latest cyber doubling time estimates align closely with METR’s findings on broader software capabilities.
AI Security Institute: The AI Security Institute (AISI) conducts scientific research to assess frontier AI’s most serious risks, particularly in cyber domains, and develops corresponding mitigations. It runs specialized cyber ranges that measure how well AI models sustain a cyberattack after initial access. Its recent blog post details breakthroughs by models such as Mythos Preview and GPT-5.5 on previously unsolved cyber challenges.

Progress Acceleration: The length of cyber tasks that recent frontier models can complete is doubling faster than prior trends predicted, at a rate that matches related software engineering benchmarks.
Model Iteration Impact: Later checkpoints of existing models drive significant capability jumps without needing full new releases.
Cyber Evaluation Advances: AISI’s cyber ranges test sustained AI planning and execution on enterprise networks, revealing first-time completions of advanced challenges like Cooling Tower.