Sakana AI has developed the RL Conductor, a small language model that orchestrates communication and task delegation among advanced AI models like GPT-5, Claude Sonnet 4, and Gemini 2.5 Pro, utilizing reinforcement learning instead of rigid, hardcoded pipelines. This innovation not only enhances efficiency—achieving benchmark scores with six times fewer tokens and fewer processing steps than conventional frameworks—but also allows for more dynamic responses to varied user demands in complex environments. The RL Conductor now serves as the underpinning for Sakana’s commercial product, Fugu, which offers multi-agent orchestration solutions to industries such as finance, effectively addressing challenges associated with traditional AI integration.
Fugu: Fugu is Sakana AI’s commercial multi-agent orchestration service based on RL Conductor technology, offered in beta through an OpenAI-compatible API with Mini and Ultra variants for different workloads. It targets enterprises in finance and defense needing robust generalization across heterogeneous tasks like software development and deep research. The system automates complex agent collaboration while maintaining interpretability and guardrails against hallucinations.
GPT-5: GPT-5 is a frontier large language model from OpenAI, serving as one of the closed-source worker LLMs in Sakana AI’s RL Conductor pool. The Conductor often deploys it for final code optimization in complex coding workflows after planning by other models. It contributes complementary strengths in tasks like code generation within multi-agent setups.
Sakana AI: Sakana AI is a Japanese AI research lab focused on developing efficient and innovative language model architectures, including multi-agent orchestration systems. They recently introduced RL Conductor, a 7B parameter model trained via reinforcement learning to dynamically coordinate frontier LLMs like GPT-5, Claude Sonnet 4, and Gemini 2.5 Pro, outperforming hardcoded frameworks on reasoning and coding benchmarks. This technology powers their commercial product Fugu, now in beta as an OpenAI-compatible API for enterprise applications in software development, research, and strategy.
Yujin Tang: Yujin Tang is a researcher and co-author at Sakana AI, contributing to the RL Conductor paper and providing insights on the limitations of hardcoded agentic frameworks. She highlighted how production environments with diverse user demands require dynamic orchestration beyond tools like LangChain. Tang also discussed Fugu’s enterprise applications and future potential in cross-modal AI systems.
RL Conductor: RL Conductor is a 7-billion parameter language model fine-tuned from Qwen2.5-7B using reinforcement learning to automatically design and manage workflows for pools of worker LLMs. It analyzes inputs, delegates subtasks, and optimizes communication without hardcoded pipelines, achieving state-of-the-art results on benchmarks like AIME25 math and LiveCodeBench. In tests, it used significantly fewer tokens than competing multi-agent systems while surpassing individual frontier models.
Gemini 2.5 Pro: Gemini 2.5 Pro is Google’s closed-source frontier LLM integrated into Sakana AI’s worker pool for RL Conductor. It excels in high-level planning and sometimes takes over full workflow design, enabling efficient multi-step strategies on challenging benchmarks. The Conductor leverages its strengths for tasks requiring scientific reasoning and task delegation.
Claude Sonnet 4: Claude Sonnet 4 is Anthropic’s advanced Claude model series iteration, used as a high-level planner in Sakana AI’s RL Conductor experiments. The system frequently assigns it to initial planning phases alongside Gemini 2.5 Pro for reasoning and coding benchmarks. Its specialization enhances the overall performance of dynamic agent orchestration.
`json
{
“Efficiency”: “The system achieves top benchmark scores with significantly fewer tokens and fewer steps compared to other frameworks like Mixture-of-Agents.”,
“Innovation”: “RL Conductor eliminates hardcoded pipelines by utilizing reinforcement learning to dynamically route tasks across specialized language models.”,
“Enterprise Use”: “Fugu employs Conductor technology for practical tasks in software development, research, and strategy, particularly in sectors like finance.”
}
`
