New 1.5B router model achieves 93% accuracy without costly retraining

Researchers at Katanemo Labs have introduced Arch-Router, a routing model designed to match user queries with the most appropriate large language model (LLM). As enterprises increasingly rely on multiple LLMs for different applications, Arch-Router addresses the challenge of directing queries efficiently without rigid logic or expensive retraining whenever updates are required.

With the rapid proliferation of LLMs, developers are moving from single-model setups to multi-model systems that use each model's strengths for specific tasks, such as code generation, text summarization, or image editing. LLM routing has become essential for building and deploying these systems: the router acts as a traffic controller that sends each user query to the most suitable model.

Current routing techniques fall into two main categories: task-based routing, which sends queries according to predefined tasks, and performance-based routing, which seeks a balance between cost and performance. Task-based methods often struggle with ambiguous user intent, particularly in multi-turn conversations. Performance-based routing, by contrast, tends to prioritize benchmark scores over real-world user preferences and adapts poorly to new models without costly fine-tuning.

The Katanemo Labs team argues that existing routing methods optimize for benchmark performance while overlooking subjective human preferences. They advocate for routing systems that align with user-defined preferences and that stay transparent and adaptable as both models and use cases evolve. To that end, the researchers propose a “preference-aligned routing” framework that matches user queries to routing policies based on user-defined preferences.

Users describe their routing policies in natural language using a two-level hierarchy called the Domain-Action Taxonomy, which begins with a broad domain (like “legal” or “finance”) and narrows to a specific action (such as “summarization” or “code generation”). Each policy is then linked to a preferred model, so developers can base routing decisions on practical needs rather than benchmark performance alone.

Routing happens in two phases. First, the preference-aligned router model takes the user query together with the complete set of policies and selects the best-fitting policy. Second, a mapping function links the chosen policy to its designated LLM. Because the model selection logic is separate from the policies, models can be added, removed, or replaced simply by modifying the routing policies, with no retraining.

Arch-Router, a compact 1.5B-parameter model fine-tuned for preference-aligned routing, is the router in this framework. It takes the user query and all policy descriptions as input and generates the identifier of the most suitable policy. Since the policies are part of the input, the system can adapt to new or modified routes at inference time through in-context learning, without retraining.

The researchers also addressed the concern that long policy lists could add latency. Co-author Salman Paracha, Founder and CEO of Katanemo Labs, noted that although routing policies can grow long, the context window can be expanded with minimal latency impact, because the output is only a short policy name.

To develop Arch-Router, the researchers fine-tuned a 1.5B-parameter version of the Qwen 2.5 model on a curated dataset of 43,000 examples.

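The two-phase process described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Katanemo's implementation: the `Policy` class, the `select_policy` and `route` functions, and the policy and model names are all hypothetical, and the word-overlap scoring is only a stand-in for the call to the router model, which would receive the query and the policy descriptions and return a policy identifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    name: str         # identifier the router returns
    description: str  # natural-language description of the route


# Hypothetical policies and model names, for illustration only.
POLICIES = [
    Policy("legal_summarization", "summarize contracts and other legal documents"),
    Policy("finance_code_generation", "write code for financial calculations"),
]
POLICY_TO_MODEL = {
    "legal_summarization": "model-a",
    "finance_code_generation": "model-b",
}


def select_policy(query: str, policies: list[Policy]) -> str:
    """Phase 1 (stand-in): choose the policy that best fits the query.

    Assumption: a real system would send the query and all policy
    descriptions to the router model here. Word overlap is used only
    to keep this sketch self-contained and runnable.
    """
    query_words = set(query.lower().split())

    def overlap(policy: Policy) -> int:
        return len(query_words & set(policy.description.lower().split()))

    return max(policies, key=overlap).name


def route(query: str, policies: list[Policy], policy_to_model: dict[str, str]) -> str:
    """Phase 2: map the chosen policy to its designated model."""
    return policy_to_model[select_policy(query, policies)]


if __name__ == "__main__":
    print(route("Please summarize this legal contract", POLICIES, POLICY_TO_MODEL))
    # Swapping the model behind a policy only touches the mapping.
    updated = {**POLICY_TO_MODEL, "legal_summarization": "model-c"}
    print(route("Please summarize this legal contract", POLICIES, updated))
```

In this sketch, replacing the model behind a policy changes only the mapping passed to `route`; the policy selection step is untouched, which mirrors the separation the article describes.
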
In evaluations against leading proprietary models from OpenAI, Anthropic, and Google, Arch-Router achieved a routing score of 93.17%, outperforming the other models by an average of 7.71%. It did particularly well in longer conversations, where it maintained context effectively.

Arch-Router is already being applied in several scenarios. In open-source coding tools, developers use it to manage different workflow stages, directing tasks like “code design,” “code understanding,” and “code generation” to the most suitable LLMs. Enterprises can similarly route document creation requests to Claude 3.7 Sonnet while assigning image editing tasks to Gemini 2.5 Pro. Paracha mentioned that Arch-Router is also useful for personal assistants that handle diverse tasks, from text summarization to fact-checking.

The framework is integrated with Arch, Katanemo Labs’ AI-native proxy server for agents, which lets developers implement complex traffic-shaping rules. For instance, when introducing a new LLM, a team can first route a small portion of traffic to the new model for evaluation and fully transition only once it is confident in the model's performance.

Ultimately, Katanemo Labs aims to move beyond isolated AI implementations. “Arch-Router, and the broader Arch framework, facilitates the shift from fragmented LLM systems to a cohesive, policy-driven architecture,” Paracha asserts. “By addressing diverse user tasks, our framework transforms task and LLM fragmentation into a seamless experience for the end user.”
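
The gradual rollout mentioned above, where a small share of traffic goes to a new model before a full switch, can be illustrated with a short sketch. It does not show how Arch configures traffic shaping; the `pick_model` function, its parameters, and the model names are hypothetical. The idea shown is a deterministic percentage split: each request ID is hashed into one of 100 buckets, so the same request always lands on the same model.

```python
import hashlib


def pick_model(request_id: str, current_model: str,
               candidate_model: str, candidate_percent: int) -> str:
    """Send roughly `candidate_percent`% of requests to the candidate model.

    Hypothetical helper for illustration; not part of Arch or Arch-Router.
    """
    if not 0 <= candidate_percent <= 100:
        raise ValueError("candidate_percent must be between 0 and 100")
    # Hash the request ID so the assignment is stable across calls.
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # value in 0..99
    return candidate_model if bucket < candidate_percent else current_model


if __name__ == "__main__":
    # Count how many of 1,000 hypothetical requests reach the candidate at 10%.
    hits = sum(
        pick_model(f"req-{i}", "current-model", "candidate-model", 10) == "candidate-model"
        for i in range(1000)
    )
    print(f"{hits} of 1000 requests routed to the candidate model")
```

Raising `candidate_percent` step by step and finally setting it to 100 completes the transition; setting it to 0 rolls the change back.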

Source: VentureBeat

Published on: Jul 08, 2025, 21:50
