Anthropic has donated Petri, its open-source alignment testing tool, to Meridian Labs so that the tool's development continues independently. Separating tool development from commercial AI labs is intended to keep alignment auditing neutral and credible. Alongside the donation, a major update (version 3.0) has been released: it separates the auditor and target models for greater adaptability, adds more realistic deployment simulations built on real system prompts, and deepens its tests through targeted behavioral assessments. Petri joins Meridian Labs’ ecosystem, working alongside tools such as Inspect AI for model testing.

Petri: Petri is an open-source toolkit of alignment tests that runs simulated multi-turn scenarios, using auditor and judge models, to evaluate large language models for misaligned behaviors such as deception, sycophancy, and harmful cooperation. Version 3.0 improves adaptability by separating the auditor and target models, realism through the ‘Dish’ add-on, which incorporates real system prompts, and depth through integration with Bloom for targeted assessments. In this announcement, Anthropic donates Petri to Meridian Labs so that its development continues independently.
Meridian Labs: Meridian Labs is a 501(c)(3) non-profit organization that develops an open-source platform for understanding, evaluating, and testing AI models and agents. Its projects include Inspect AI, for systematic LLM evaluations, and Inspect Scout, for analyzing agent transcripts. In this announcement, it receives Petri, Anthropic’s open-source alignment tool, so that the tool can be developed and maintained independently under a neutral entity. The organization collaborates with groups such as the UK AISI on AI evaluation tools.

Tool Ecosystem: Petri integrates into Meridian Labs’ open stack alongside Inspect AI for model testing.
Donation Purpose: Transfer to nonprofit Meridian Labs maintains neutrality and credibility in alignment auditing by separating tool development from commercial AI labs.
Update Improvements: Petri 3.0 features customizable model architectures and deeper behavioral evaluations.