Sakana AI’s TreeQuest: Deploy multi-model teams that outperform individual LLMs by 30%

Sakana AI, a Japanese AI research lab, has unveiled a technique that lets multiple large language models (LLMs) collaborate on complex tasks. The technique, called Multi-LLM AB-MCTS, turns different LLMs into a team that can outperform individual models by 30%. It draws on the distinct strengths of each model and lets them work through trial and error in ways that no single model can.

For businesses, this means dynamically selecting the most proficient model for each part of a task, which leads to more effective and versatile AI systems. Instead of being tied to a single AI provider, companies can draw on several frontier models, which are evolving rapidly but each have their own strengths and weaknesses.

The researchers at Sakana AI argue that diversity among models is an asset rather than a limitation. Just as human achievements are often the work of diverse teams, they say, AI systems can achieve more by working together. "Pooling their intelligence allows AI systems to tackle challenges that no single model could overcome," they noted in their official blog.

The new algorithm is a form of inference-time scaling, an approach that has gained traction in recent months. While much of the AI field has focused on expanding model size and training datasets, inference-time scaling improves performance after training by allocating additional computational resources at inference. Sakana AI's strategy combines reinforcement learning and repeated sampling methods, refining existing solutions while also exploring new ones.

At the core of Multi-LLM AB-MCTS is the Adaptive Branching Monte Carlo Tree Search (AB-MCTS) algorithm, which balances two search strategies: deepening existing solutions and generating new ones.
Using probability models, AB-MCTS decides at each step whether to deepen an existing solution or generate a new one.

The team tested the system on the ARC-AGI-2 benchmark, which assesses human-like visual reasoning. By combining several frontier models, they found correct solutions for over 30% of the test problems, significantly outperforming any individual model. The system showed a capacity to assign the most suitable model to each problem, and it often found solutions that had previously seemed out of reach. In one instance, after an initial incorrect solution was produced, the system used other models to analyze and correct the error.

The researchers noted that this ensemble approach could mitigate common issues such as models' tendency to hallucinate, which is particularly critical in business applications.

To encourage adoption, Sakana AI has open-sourced the underlying algorithm under the Apache 2.0 license. The framework, named TreeQuest, offers a flexible API that lets developers implement Multi-LLM AB-MCTS for their own needs. The team sees promising applications in several domains, including algorithmic coding and improving the accuracy of machine learning models. The open-source release could be a significant step toward more robust and reliable enterprise AI systems.
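
The core idea described above, using probability models to choose at each step between generating a new solution and deepening an existing one, and between different models, can be illustrated with a small Python sketch. This is a simplified illustration and not Sakana AI's implementation or the TreeQuest API: it assumes Thompson sampling over Beta posteriors as the probability model, it always refines the current best answer instead of maintaining a full search tree, and the two "models" (`broad_model`, `precise_model`) and the number-guessing task are hypothetical stand-ins for real LLM calls.

```python
import random
from dataclasses import dataclass


@dataclass
class Arm:
    """Beta posterior over how often one (model, move) pair improves the best score."""
    wins: float = 1.0    # prior pseudo-count of improvements
    losses: float = 1.0  # prior pseudo-count of non-improvements

    def sample(self, rng: random.Random) -> float:
        return rng.betavariate(self.wins, self.losses)

    def update(self, improved: bool) -> None:
        if improved:
            self.wins += 1
        else:
            self.losses += 1


# Hypothetical stand-ins for LLM calls: each "model" proposes a number.
def broad_model(parent, rng):
    base = 0.0 if parent is None else parent
    return base + rng.uniform(-10, 10)  # large, exploratory jumps


def precise_model(parent, rng):
    base = 0.0 if parent is None else parent
    return base + rng.uniform(-1, 1)  # small, careful adjustments


MODELS = {"broad": broad_model, "precise": precise_model}
TARGET = 7.3  # toy task: get as close to TARGET as possible


def score(answer: float) -> float:
    return 1.0 / (1.0 + abs(answer - TARGET))  # in (0, 1]; 1.0 is a perfect answer


def search(budget: int, seed: int = 0):
    rng = random.Random(seed)
    # One posterior per (model, move): "wider" starts fresh, "deeper" refines the best answer.
    arms = {(m, move): Arm() for m in MODELS for move in ("wider", "deeper")}
    best_answer, best_score = None, 0.0
    for _ in range(budget):
        # Thompson sampling: draw once from each posterior and act on the largest draw.
        model, move = max(arms, key=lambda k: arms[k].sample(rng))
        parent = best_answer if move == "deeper" else None
        candidate = MODELS[model](parent, rng)
        improved = score(candidate) > best_score
        arms[(model, move)].update(improved)
        if improved:
            best_answer, best_score = candidate, score(candidate)
    return best_answer, best_score, arms
```

Calling `search(200)` returns the best answer found, its score, and the learned posteriors. Inspecting the `wins` and `losses` of each arm afterwards shows which model and which move the search came to favor, which mirrors, in miniature, the way the article describes the system assigning the most suitable model to each problem.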

Source: VentureBeat

Published on: Jul 08, 2025, 05:38
