Anthropic’s model, Mythos, has achieved notable accuracy on the BrowseComp benchmark, scoring 85% with a budget of 1 million tokens and 87% at 3 million tokens. BrowseComp measures how accurately AI models can browse the internet to find and synthesize hard-to-locate information for research tasks. By comparison, Opus 4.6 scored 76% at 1 million tokens and 84% at 10 million tokens, underscoring Mythos’s superior token efficiency at a time when frontier models increasingly aim to accomplish more with fewer resources.

Mythos: Claude Mythos Preview is Anthropic’s most advanced general-purpose language model to date, emphasizing strong agentic capabilities for tasks such as web research and planning. It leads BrowseComp, a benchmark testing accurate internet browsing and synthesis of hard-to-find information. The preview release highlights its efficiency advantages over Anthropic’s previous models.
Opus 4.6: Claude Opus 4.6 is Anthropic’s previous flagship model, known for advanced reasoning and strong performance on long-context web research tasks. On BrowseComp it demonstrated the ability to locate difficult-to-find online information, but in recent comparisons it is surpassed by the newer Mythos Preview at equivalent or lower token budgets.
Anthropic: Anthropic is an AI research company focused on developing safe and reliable frontier models in the Claude family. The company recently introduced Claude Mythos Preview, which leads web browsing and research benchmarks such as BrowseComp, outperforming prior models like Opus 4.6 while using lower token budgets.

BrowseComp Benchmark: BrowseComp evaluates AI models’ accuracy in browsing the internet to find and synthesize hard-to-locate information for research tasks.
Model Efficiency Trend: Recent frontier models prioritize token efficiency to handle complex agentic workflows with fewer resources.
Mythos Preview Release: Anthropic released Claude Mythos Preview as a limited-access preview for select customers, citing its frontier capabilities and efficiency improvements.