In a recent evaluation, Claude, an AI model, was tested on its ability to analyze real biological data using BioMysteryBench, a benchmark of 99 complex bioinformatics questions. Claude solved approximately 30% of the problems that had stumped a panel of expert biologists, a sign of significant improvement in its scientific capabilities. BioMysteryBench uses verifiable properties of the data as ground truth, which allows it to include questions beyond what human experts can solve and to credit a creative solution regardless of the method used to reach it. The evaluation underlines how AI is becoming a valuable ally in bioinformatics research, a point echoed by the concurrent release of Genentech and Roche’s CompBioBench, which also highlights Claude’s effectiveness in computational biology tasks.
Claude: Claude is Anthropic’s family of large language models, designed to be reliable, interpretable, and steerable for complex tasks including scientific analysis. In the BioMysteryBench evaluation published on April 29, 2026, Claude models analyzed real-world biological datasets and solved open-ended bioinformatics problems, often performing on par with or beyond human experts. Their strategies included intuitive pattern recognition and layering multiple analytical methods.
Brianna: Brianna Chrisman is a research scientist at Anthropic working at the interface of machine learning and biology, and a member of the discovery team. She authored the April 29, 2026, Science Blog post detailing the development and results of BioMysteryBench, where Claude was evaluated on real biological data analysis tasks.
Anthropic: Anthropic is an AI safety and research company that develops the Claude family of large language models with a focus on supporting professional-level scientific work. It created and released BioMysteryBench on April 29, 2026, to benchmark Claude’s bioinformatics capabilities against human experts, highlighting rapid improvements in AI’s scientific reasoning.
BioMysteryBench: BioMysteryBench is a bioinformatics benchmark developed by Anthropic consisting of expert-written questions derived from real-world datasets such as DNA/RNA sequencing, proteomics, and metabolomics data. It tests AI models’ ability to devise creative solutions to messy, open-ended research problems in a controlled environment with access to bioinformatics tools and databases, emphasizing method-agnostic evaluation based on objective ground-truth answers.
AI Strategies: Claude draws on vast internalized knowledge from the scientific literature and combines multiple lines of evidence to solve problems under uncertainty, an approach that diverges from how human experts typically work.
Benchmark Innovation: BioMysteryBench addresses evaluation challenges in biology by using verifiable data properties as ground truth, which makes it possible to pose questions beyond what human experts can solve and to assess creative problem-solving.
Convergent Benchmarks: Genentech and Roche’s CompBioBench, released around the same time, independently validates Claude’s effectiveness in computational biology tasks involving multi-step reasoning and real-world resources.
