New 1.5B router model achieves 93% accuracy without costly retraining

Researchers at Katanemo Labs have unveiled Arch-Router, a routing model designed to match each user query to the most suitable large language model (LLM). As enterprises increasingly rely on multiple LLMs, Arch-Router targets a practical problem: directing queries efficiently without hard-coded routing logic or expensive retraining every time the set of models changes.

With the rapid proliferation of LLMs, developers are moving from single-model setups to multi-model systems that use each model's strengths for specific tasks, such as code generation, text summarization, or image editing. LLM routing has become a key part of building and deploying these systems, acting like a traffic controller that sends each user query to the best-suited model.

Current routing techniques fall into two main categories: task-based routing, which assigns queries to models according to predefined tasks, and performance-based routing, which seeks the best balance between cost and performance. Task-based methods often struggle with ambiguous or shifting user intent, particularly in multi-turn conversations. Performance-based routing, for its part, tends to prioritize benchmark scores over real-world user preferences and adapts poorly to new models unless they undergo costly fine-tuning.

The Katanemo Labs team argues that existing methods focus on benchmark performance while overlooking subjective human preferences. Instead, they call for routing systems that align with user-defined preferences and remain transparent and adaptable as both models and use cases evolve. To that end, the researchers propose a “preference-aligned routing” framework that matches user queries to routing policies defined according to the user's own preferences.
Users write their routing policies in natural language using a two-level hierarchy called the Domain-Action Taxonomy. Each policy starts with a broad domain (such as “legal” or “finance”) and narrows to a specific action (such as “summarization” or “code generation”). Each policy is then associated with a preferred model, so developers can base routing decisions on practical needs rather than on benchmark scores alone.

Routing happens in two phases. First, the preference-aligned router model reads the user query together with the full set of policies and selects the most fitting policy. Second, a mapping function links the selected policy to its designated LLM. Because model selection is separated from the policies, models can be added, removed, or replaced simply by editing the routing policies, with no retraining required.

Arch-Router, a compact 1.5B-parameter model fine-tuned for preference-aligned routing, performs the first phase. It takes the user query and all policy descriptions as input and outputs the identifier of the best-matching policy. Since the policies are part of the input, the system can adapt to new or modified routes at inference time through in-context learning, again without retraining.

A natural concern is that long lists of policies could add latency. The researchers designed Arch-Router to stay efficient: co-author Salman Paracha, founder and CEO of Katanemo Labs, noted that although routing policies can grow long, the model's context window can be expanded with minimal impact on latency, because the output is only a short policy name.

To build Arch-Router, the researchers fine-tuned a 1.5B-parameter version of the Qwen 2.5 model on a curated dataset of 43,000 examples.
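As an illustration only, the two-phase routing flow described above can be sketched in Python. Everything here is hypothetical: the policy names, the model labels (`docs-model`, `code-model`, `image-model`, `general-model`) and the helper functions are not part of Arch-Router or Arch. In particular, `select_policy` is a naive word-overlap stand-in for phase one; the actual system asks the fine-tuned Arch-Router model to pick the policy. What the sketch does mirror is the separation of concerns: the router only returns a policy name, and a plain mapping turns that name into a model.

```python
import re
from dataclasses import dataclass

# Hypothetical policy format loosely following the Domain-Action Taxonomy.
@dataclass(frozen=True)
class RoutingPolicy:
    name: str         # identifier the router returns, e.g. "legal/summarization"
    domain: str       # level 1: broad domain
    action: str       # level 2: specific action
    description: str  # natural-language description the router reads

POLICIES = [
    RoutingPolicy("legal/summarization", "legal", "summarization",
                  "summarize contracts, rulings or other legal documents"),
    RoutingPolicy("coding/code_generation", "coding", "code generation",
                  "write new code or functions from a specification"),
    RoutingPolicy("media/image_editing", "media", "image editing",
                  "crop, retouch or otherwise modify an image"),
]

# Phase 2: plain mapping from policy name to a (hypothetical) model label.
# Adding, removing or swapping a model only means editing this dict.
POLICY_TO_MODEL = {
    "legal/summarization": "docs-model",
    "coding/code_generation": "code-model",
    "media/image_editing": "image-model",
}
DEFAULT_MODEL = "general-model"  # used when no policy matches

def _tokens(text):
    # Lowercase words longer than two letters, punctuation stripped.
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 2}

def select_policy(query, policies):
    """Phase 1 stand-in: return the name of the best-matching policy, or None.

    The real system feeds the query plus all policy descriptions to the
    Arch-Router model; this word-overlap score is only a placeholder.
    """
    query_words = _tokens(query)
    best_name, best_score = None, 0
    for policy in policies:
        policy_words = _tokens(f"{policy.domain} {policy.action} {policy.description}")
        score = len(query_words & policy_words)
        if score > best_score:  # ties keep the earlier policy
            best_name, best_score = policy.name, score
    return best_name

def route(query):
    policy_name = select_policy(query, POLICIES)            # phase 1
    return POLICY_TO_MODEL.get(policy_name, DEFAULT_MODEL)  # phase 2

print(route("Please summarize this legal contract"))  # docs-model
```

Pointing the coding route at a different model is a one-line change to the mapping, e.g. `POLICY_TO_MODEL["coding/code_generation"] = "new-code-model"`, and adding a route means appending a `RoutingPolicy` plus a mapping entry. In the real system, the router picks up new policies from its input rather than through retraining.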
In evaluations against leading proprietary models from OpenAI, Anthropic, and Google, Arch-Router achieved a routing score of 93.17%, outperforming the competing models by an average of 7.71%. It performed particularly well in longer conversations, where it effectively maintained context across turns.

Arch-Router is already being used in practice. Developers use it in open-source coding tools to manage different stages of the workflow, directing tasks such as “code design,” “code understanding,” and “code generation” to the most suitable LLMs. Enterprises can likewise route document-creation requests to Claude 3.7 Sonnet while sending image-editing tasks to Gemini 2.5 Pro. Paracha added that Arch-Router also suits personal assistants that handle diverse tasks, from text summarization to fact-checking.

The framework is integrated with Arch, Katanemo Labs’ AI-native proxy server for agents, which lets developers implement sophisticated traffic-shaping rules. For example, when introducing a new LLM, a team can first send a small portion of traffic to the new model, evaluate its performance, and then switch over fully with confidence.

Ultimately, Katanemo Labs aims to move beyond siloed AI implementations. "Arch-Router—and the broader Arch framework—facilitates the shift from fragmented LLM systems to a cohesive, policy-driven architecture," Paracha said. "By addressing diverse user tasks, our framework transforms task and LLM fragmentation into a seamless experience for the end user."
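The gradual rollout described above, where a small share of traffic goes to a newly introduced model before a full switch, is commonly implemented as a weighted (canary) split. The Python sketch below is a generic version of that idea, not Arch's configuration format or API; `pick_model`, the model labels and the 5% starting share are hypothetical. It hashes a request identifier into a bucket between 0 and 1 and sends the request to the new model only if the bucket falls below the chosen share.

```python
import hashlib

def pick_model(request_id: str, stable_model: str, candidate_model: str,
               candidate_share: float) -> str:
    """Send about `candidate_share` of requests to the candidate model.

    Hashing the request id, instead of drawing a random number, keeps the
    split deterministic: the same id always lands on the same model.
    """
    if not 0.0 <= candidate_share <= 1.0:
        raise ValueError("candidate_share must be between 0 and 1")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # 48 bits fit exactly in a float, so bucket is always in [0, 1).
    bucket = int.from_bytes(digest[:6], "big") / 2**48
    return candidate_model if bucket < candidate_share else stable_model

# Ramp-up: start with 5% of traffic on the new model, raise the share once
# it performs well, and finish at 1.0 for a full cutover.
sent_to_new = sum(
    pick_model(f"req-{i}", "current-model", "new-model", 0.05) == "new-model"
    for i in range(10_000)
)
print(sent_to_new)
```

Hashing rather than calling a random-number generator is a design choice: it keeps a given request or user identifier on one model for the whole ramp-up, which makes side-by-side evaluation more consistent.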

Source: VentureBeat

Published on: Jul 08, 2025, 21:50
