RecursiveMAS boosts multi-agent AI efficiency, cuts token use by 75%

Researchers at the University of Illinois Urbana-Champaign and Stanford University have developed RecursiveMAS, a framework that makes multi-agent AI systems more efficient by letting agents communicate through embedding space rather than generated text. Compared with text-based multi-agent methods, the approach cuts token usage by 75% and speeds up inference by 2.4 times. As recent AI work has concentrated on coordinating multiple agents to tackle complex tasks, RecursiveMAS targets the latency and training costs that such workflows incur in enterprise settings. The system has been evaluated on nine benchmarks in fields such as code generation and medical reasoning, where it improves accuracy while keeping compute overhead low, making it a cost-effective option for scaling multi-agent systems.
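
To put the headline figures in concrete terms, the short Python sketch below applies the reported 75% token reduction and 2.4 times speedup to a text-based baseline. The baseline of 8,000 tokens and 12 seconds per task is purely hypothetical and does not come from the researchers' results; only the two ratios are taken from the article, and the function name is illustrative.

```python
def apply_reported_gains(baseline_tokens: int, baseline_seconds: float,
                         token_reduction: float = 0.75, speedup: float = 2.4):
    """Scale a hypothetical text-based baseline by the article's headline ratios."""
    # A 75% reduction leaves 25% of the original tokens.
    remaining_tokens = baseline_tokens * (1.0 - token_reduction)
    # A 2.4 times speedup divides the original latency by 2.4.
    remaining_seconds = baseline_seconds / speedup
    return remaining_tokens, remaining_seconds


# Hypothetical baseline: 8,000 tokens and 12 seconds per task (assumed values).
tokens, seconds = apply_reported_gains(baseline_tokens=8000, baseline_seconds=12.0)
print(f"tokens per task: {tokens:.0f}, latency per task: {seconds:.1f}s")
```

Under these assumed inputs, the per-task budget drops to about 2,000 tokens and about 5 seconds. Real savings would depend on the workload and on how much of the baseline cost comes from inter-agent text.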

Qwen: Qwen is a family of open-weight large language models developed by Alibaba Cloud, designed for general-purpose reasoning and generation across multiple languages and domains. In the RecursiveMAS experiments, Qwen instances serve as backbone models for some of the agents, showing that the framework can coordinate diverse open-source models through latent-space collaboration.
Gemma3: Gemma3 is a line of open-weight language models from Google, designed for efficient deployment and fine-tuning across a range of AI tasks. In this context, Gemma3 models are used as agents within the RecursiveMAS framework, illustrating how heterogeneous open-source models can be chained together via RecursiveLink modules for more powerful collaborative reasoning.
LoopLM: LoopLM is a recursive language model baseline that deepens reasoning by looping data through shared layers, allowing more computation without increasing parameters. The article compares RecursiveMAS against LoopLM to show how extending recursive ideas from single models to multi-agent systems, combined with embedding-based communication, yields stronger accuracy and efficiency gains.
Llama-3: Llama-3 is a generation of open-weight large language models released by Meta, widely used as a flexible foundation for customized applications and research. The article describes Llama-3 as one of the backbone models integrated into the RecursiveMAS multi-agent system, demonstrating that the framework can enhance reasoning performance without fine-tuning the core model weights.
Mistral: Mistral refers to a series of open-weight large language models developed by Mistral AI, known for their efficiency and strong performance on reasoning and coding benchmarks. Within the RecursiveMAS setup, Mistral models are assigned specific agent roles, helping validate that the recursive, embedding-based collaboration strategy works across different architectures and model sizes.
AIME2025: AIME2025 is the 2025 edition of the American Invitational Mathematics Examination, used as a challenging benchmark for mathematical reasoning in AI systems. The article notes that RecursiveMAS outperforms text-based optimization methods on AIME2025, underscoring its strength in handling multi-step, high-difficulty math problems through collaborative agents.
AIME2026: AIME2026 is a follow-on American Invitational Mathematics Examination benchmark that tests advanced problem-solving skills and is increasingly used to evaluate cutting-edge reasoning models. In the reported results, RecursiveMAS achieves stronger performance than text-centric multi-agent approaches on AIME2026, reinforcing its claim to superior reasoning capabilities in rigorous evaluation settings.
TextGrad: TextGrad is a text-based optimization framework that treats natural-language feedback from language models as "textual gradients" and uses it to refine a system's outputs and prompts. In the reported experiments, TextGrad serves as a baseline multi-agent optimization method that RecursiveMAS surpasses on demanding reasoning benchmarks, highlighting the advantages of latent-space communication over purely text-based interaction.
Apache 2.0: Apache 2.0 is a permissive open-source software license that allows broad use, modification, and distribution of code, including for commercial purposes, with relatively few restrictions. According to the article, the RecursiveMAS framework and its trained model weights are released under Apache 2.0, making it easy for companies and developers to adopt and customize the system in production environments.
RecursiveMAS: RecursiveMAS is a multi-agent AI framework that lets agents communicate through continuous embeddings instead of text, significantly improving efficiency and performance for complex reasoning tasks. The article presents it as a new architecture that accelerates multi-agent inference, reduces token usage, and offers a more scalable, low-cost way to deploy production-grade agent workflows under an open-source license.
RecursiveLink: RecursiveLink is a specialized module introduced with RecursiveMAS that transmits and refines latent representations between and within agents instead of forcing them to generate text. In the reported work, it acts as the connective tissue of the system, enabling continuous latent-space reasoning while keeping the underlying language models frozen and cheap to train.
Recursive-TextMAS: Recursive-TextMAS is an alternative multi-agent framework that uses the same recursive loop structure as RecursiveMAS but relies on explicit text-based communication between agents. In this article it functions as a key comparison point, with results showing that RecursiveMAS’s latent-space messaging significantly reduces token usage and speeds up inference while improving overall task accuracy.
Stanford University: Stanford University is a leading private research institution in California with a prominent role in advancing AI and machine learning research. Researchers from Stanford partnered in creating RecursiveMAS, helping design and validate the recursive multi-agent architecture and demonstrating its advantages on tasks like code generation, medical reasoning, and search.
University of Illinois Urbana-Champaign: The University of Illinois Urbana-Champaign is a major U.S. public research university known for its strong programs in computer science, engineering, and artificial intelligence. Its researchers co-developed RecursiveMAS, contributing academic expertise and experimentation that demonstrate how embedding-based multi-agent collaboration can outperform traditional text-based systems.
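
The contrast drawn in the RecursiveLink and Recursive-TextMAS entries, latent handoffs versus text handoffs inside the same recursive loop, can be illustrated with a toy Python sketch. It is not the researchers' implementation: the names `make_frozen_agent`, `ToyLink`, and `run_latent_loop` are hypothetical, the "agents" are fixed mathematical functions standing in for frozen language models, and the link is assumed here to be a small linear map with a residual connection, which the article does not specify.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Agent = Callable[[Vector], Vector]


def make_frozen_agent(scale: float) -> Agent:
    """Toy stand-in for a frozen model: a fixed map from hidden state to hidden state."""
    def agent(hidden: Vector) -> Vector:
        return [math.tanh(scale * x) for x in hidden]
    return agent


class ToyLink:
    """Hypothetical link module: a linear projection plus a residual connection.

    In this sketch, these weights are the only parameters that would be trained.
    """

    def __init__(self, weights: List[Vector]) -> None:
        self.weights = weights

    def __call__(self, hidden: Vector) -> Vector:
        projected = [sum(w * x for w, x in zip(row, hidden)) for row in self.weights]
        return [h + p for h, p in zip(hidden, projected)]


def run_latent_loop(agents: Sequence[Agent], links: Sequence[ToyLink],
                    hidden: Vector, rounds: int) -> Tuple[Vector, int]:
    """Pass a hidden state around the agent chain `rounds` times without decoding text."""
    handoffs = 0
    for _ in range(rounds):
        for agent, link in zip(agents, links):
            # The agent transforms the state; the link refines it for the next agent.
            hidden = link(agent(hidden))
            handoffs += 1
    return hidden, handoffs


agents = [make_frozen_agent(0.5), make_frozen_agent(1.5)]
links = [ToyLink([[0.1, 0.0], [0.0, 0.1]]), ToyLink([[0.0, 0.2], [0.2, 0.0]])]
final_state, handoffs = run_latent_loop(agents, links, [1.0, -0.5], rounds=3)
print(f"{handoffs} handoffs completed without generating intermediate text")
```

With two agents and three rounds, the loop performs six handoffs, none of which generates tokens. A text-based loop with the same structure would have to generate and re-read a message at each of those handoffs, which is where the reported token and latency savings would plausibly come from.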

Agentic_AI_Trend: Recent AI research has increasingly focused on multi-agent and agentic workflows, with labs and startups exploring coordinated teams of specialized models to tackle complex, multi-step tasks that single models struggle to handle reliably.
Open_Weights_Adoption: Open-weight language models like Llama-3, Mistral, Qwen, and Gemma3 have seen growing adoption in both academia and industry as organizations seek more controllable, customizable alternatives to proprietary APIs for building advanced reasoning systems.
Enterprise_Efficiency_Focus: Enterprises experimenting with agent-based AI have become more sensitive to inference latency and token expenses, pushing interest toward architectures that minimize text generation overhead and enable cheaper, scalable training and deployment.