New 1.5B router model achieves 93% accuracy without costly retraining

Researchers at Katanemo Labs have introduced Arch-Router, a routing model that maps user queries to the most appropriate large language model (LLM). For enterprises that run multiple LLMs, it addresses the problem of directing queries efficiently without rigid routing logic or expensive retraining every time something changes.

As the number of LLMs grows, developers are moving from single-model setups to multi-model systems that use each model’s strengths for specific tasks, such as code generation, text summarization, or image editing. LLM routing has become a key technique for building and deploying these systems: it acts as a traffic controller that directs each user query to the most suitable model.

Existing routing methods fall into two main categories: task-based routing, which maps queries to predefined tasks, and performance-based routing, which seeks the best balance between cost and performance. Task-based methods struggle with unclear or shifting user intent, particularly in multi-turn conversations. Performance-based routing tends to prioritize benchmark scores over real-world user preferences and adapts poorly to new models without costly fine-tuning.

The Katanemo Labs team argues that existing routing methods typically focus on benchmark performance while overlooking subjective human preferences. They call for routing systems that align with user-defined preferences and stay transparent and adaptable as models and use cases evolve. To that end, the researchers propose a “preference-aligned routing” framework that matches user queries to routing policies based on user-defined preferences.
Users describe their routing policies in natural language using a two-level hierarchy called the Domain-Action Taxonomy. A policy starts with a broad domain (like “legal” or “finance”) and narrows to a specific action (such as “summarization” or “code generation”). Each policy is then linked to a preferred model, so developers can base routing decisions on practical needs rather than benchmark scores alone.

Routing happens in two phases. First, the preference-aligned router model takes the user query and the complete set of policies and selects the most appropriate policy. Second, a mapping function connects the selected policy to its designated LLM. Because model selection is decoupled from policy selection, models can be added, removed, or swapped by editing the routing policies, with no retraining of the router.

Arch-Router, a compact 1.5B-parameter model fine-tuned for preference-aligned routing, is the policy selector in this framework. It takes the user query and all policy descriptions as input and generates the identifier of the best-matching policy. Because the policies are part of the input, the system can adapt to new or modified routes at inference time through in-context learning, without retraining.

The researchers also addressed concerns that long policy lists could add latency. Co-author Salman Paracha, founder and CEO of Katanemo Labs, noted that although routing policies can get long, the context window can be increased with minimal impact on latency, because the output is only a short policy name.

To build Arch-Router, the researchers fine-tuned a 1.5B-parameter version of the Qwen 2.5 model on a curated dataset of 43,000 examples.
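The two-phase routing process described above can be illustrated with a minimal Python sketch. This is not Arch-Router’s actual API or prompt format: the policy names, the model names, the prompt layout, and the `keyword_stub` function (a stand-in for a call to the router model) are all hypothetical, chosen only to show how policy selection and model mapping stay separate.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class RoutePolicy:
    """One routing policy: an identifier plus a natural-language description."""
    name: str         # hypothetical identifier in domain/action form
    description: str  # plain-language text the router model would read


# Hypothetical policies following the domain -> action hierarchy.
POLICIES = [
    RoutePolicy("legal/summarization", "Summarize contracts or other legal documents."),
    RoutePolicy("finance/code_generation", "Write code for financial calculations."),
]

# Phase 2 mapping: policy name -> preferred model (placeholder model names).
POLICY_TO_MODEL = {
    "legal/summarization": "model-a",
    "finance/code_generation": "model-b",
}
DEFAULT_MODEL = "model-default"


def build_router_prompt(query: str, policies: List[RoutePolicy]) -> str:
    """Place every policy description and the query into one router input."""
    lines = ["Routing policies:"]
    lines += [f"- {p.name}: {p.description}" for p in policies]
    lines += ["", f"User query: {query}", "Answer with the best-matching policy name."]
    return "\n".join(lines)


def route(
    query: str,
    policies: List[RoutePolicy],
    policy_to_model: Dict[str, str],
    select_policy: Callable[[str], str],
) -> str:
    """Return the model that should handle `query`."""
    # Phase 1: the router sees the query plus all policies and names one policy.
    policy_name = select_policy(build_router_prompt(query, policies)).strip()
    # Phase 2: a plain lookup maps that policy to its designated model.
    return policy_to_model.get(policy_name, DEFAULT_MODEL)


def keyword_stub(prompt: str) -> str:
    """Stand-in for the router model (assumption): a trivial keyword match."""
    query = prompt.split("User query: ", 1)[1].splitlines()[0].lower()
    return "legal/summarization" if "contract" in query else "finance/code_generation"


print(route("Summarize this contract for me", POLICIES, POLICY_TO_MODEL, keyword_stub))
# prints "model-a"
```

In this sketch, pointing `"legal/summarization"` at a different model only requires editing `POLICY_TO_MODEL`, and adding a route only requires appending a `RoutePolicy`; the selector function is untouched, which mirrors the no-retraining property the researchers describe.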
In evaluations against leading proprietary models from OpenAI, Anthropic, and Google, Arch-Router achieved a routing score of 93.17%, outperforming the other models by an average of 7.71%. Its advantage was most pronounced in longer conversations, where it maintained context more effectively.

Arch-Router is already being applied in several scenarios. In open-source coding tools, developers use it to direct different workflow stages, such as “code design,” “code understanding,” and “code generation,” to the LLMs best suited to each. Enterprises can similarly route document creation requests to Claude 3.7 Sonnet while sending image editing tasks to Gemini 2.5 Pro. Paracha said Arch-Router is also useful for personal assistants that handle tasks ranging from text summarization to fact-checking.

The framework is integrated with Arch, Katanemo Labs’ AI-native proxy server for agents, which lets developers implement complex traffic-shaping rules. For instance, when introducing a new LLM, a team can first route a small share of traffic to the new model for evaluation and then switch over fully once it is confident in the model’s performance.

Ultimately, Katanemo Labs aims to move beyond siloed AI implementations. "Arch-Router, and the broader Arch framework, facilitates the shift from fragmented LLM systems to a cohesive, policy-driven architecture," Paracha asserts. "By addressing diverse user tasks, our framework transforms task and LLM fragmentation into a seamless experience for the end user."
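The gradual rollout mentioned above, where a small share of traffic goes to a new model before a full switch, can be sketched as a weighted random split. The article does not show how Arch expresses such rules, so the function below is an assumption for illustration only; `pick_model`, the model names, and the 10% share are all hypothetical.

```python
import random


def pick_model(current_model: str, candidate_model: str,
               candidate_share: float, rng: random.Random) -> str:
    """Send roughly `candidate_share` of requests to the candidate model."""
    if not 0.0 <= candidate_share <= 1.0:
        raise ValueError("candidate_share must be between 0 and 1")
    # rng.random() is uniform on [0, 1), so this branch fires with
    # probability equal to candidate_share.
    return candidate_model if rng.random() < candidate_share else current_model


# Seeded generator so the simulation is repeatable.
rng = random.Random(0)
choices = [pick_model("model-a", "model-new", 0.1, rng) for _ in range(10_000)]
print(choices.count("model-new") / len(choices))  # roughly 0.1
```

Raising `candidate_share` step by step toward 1.0 completes the transition; setting it back to 0.0 returns all traffic to the current model.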

Source: VentureBeat

Published on: Jul 08, 2025, 21:50
