Model Router - Automatic LLM routing · innFactory - Software Development, Cloud & AI

What is Model Router?

Model Router is a trained language model in Microsoft Foundry that routes each prompt in real time to the most suitable large language model (LLM). You deploy Model Router like any other Foundry model and get a single deployment that bundles several LLMs behind one unified chat interface. The routing decision is based on attributes such as complexity, required reasoning and task type. Your application code does not need to change.

Model Router solves a concrete problem in running AI applications: if you send every request to the same powerful model, you also pay the highest price for trivial tasks. Model Router uses smaller and cheaper models when they are sufficient and falls back to larger or reasoning models when the task requires it. This lowers cost and latency while keeping quality comparable. Dozens of underlying models from multiple providers are currently available, including the GPT-5 series as well as Claude, Grok, DeepSeek and Llama models.

Core features

Real-time routing from one deployment: Model Router analyzes each prompt at runtime and selects the right model without storing your prompts. You manage one deployment instead of many individual model deployments.
Three routing modes plus model selection: Balanced (default) picks the most cost-effective model within a narrow quality band of roughly 1 to 2 percent. Cost widens the band to roughly 5 to 6 percent for maximum savings. Quality selects the highest-rated model and ignores cost. Model subset lets you define which models are eligible for routing at all.
Automatic failover and prompt caching: If a model has a transient issue, Model Router transparently redirects the request to the next most appropriate model. Failover is enabled by default. Prompt caching is used automatically when the selected model supports it.
Vision, tools and governance: Model Router accepts image input for vision-enabled chats but makes the routing decision based on text only. Audio input is not processed. Agentic scenarios with tools in the Foundry Agent Service are supported, and Azure Policy centrally controls which models may be included in a deployment.

Typical use cases

Cost optimization at high volume: Applications with many simple requests and occasional complex tasks benefit from Cost or Balanced mode. Trivial requests are routed to cheap models, so the budget stays reserved for the genuinely demanding tasks.

Unified interface for mixed workloads: Teams that want to cover different task types through one API, from short classifications to multi-step reasoning, get a single chat interface that selects the appropriate model for each request.

Higher availability through failover: Applications that need stable response times use the built-in automatic failover. If a model is temporarily unavailable, the next most suitable model takes over without the application having to implement that logic.

Benefits

Lower cost and reduced latency at comparable quality, because smaller models are used when they are sufficient.
Less operational overhead through a single deployment instead of many individual model deployments.
More control over cost, compliance and performance through routing modes and model subset, combined with Azure Policy governance.

Integration with innFactory

As a Microsoft Solutions Partner, innFactory supports you with the adoption and operation of this service.

Frequently Asked Questions

What is Model Router?

Model Router is a trained language model in Microsoft Foundry that analyzes each prompt in real time and routes it to the most suitable large language model. You deploy it like any other Foundry model and get a single deployment that bundles several LLMs behind one interface. Your application code stays unchanged.

When should I use Model Router?

Model Router is a good fit when your application handles tasks of varying complexity and you do not want to pay for an expensive model on every request. Simple requests go to smaller, cheaper models, while complex reasoning tasks go to more capable ones. It also helps with higher availability through automatic failover and supports agentic scenarios in the Foundry Agent Service.

How much does Model Router cost?

Usage is billed on a pay-per-use basis: you pay for input prompts at the rate of the underlying model that was selected, as listed on the pricing page. There is no separate routing fee. You can monitor your deployment costs in the Azure portal. Prompt caching reduces costs further when the selected model supports it.

Is Model Router available in the EU and how is data handled?

Model Router is available in the East US 2 and Sweden Central regions and supports the Global Standard and Data Zone Standard deployment types. With Data Zone Standard, requests stay within data zone boundaries, which enables EU data residency. Model Router does not store your prompts and only routes to models that are compatible with your access and data zone boundaries.

Model Router - Automatic LLM routing

What is Model Router?

Core features

Typical use cases

Benefits

Integration with innFactory

Typical Use Cases

Frequently Asked Questions

What is Model Router?

When should I use Model Router?

How much does Model Router cost?

Is Model Router available in the EU and how is data handled?

Quick Links

Microsoft Solutions Partner

Similar Products from Other Clouds

Agent Development Kit (ADK) - Multi-Agent Framework

Agent Search (formerly Vertex AI) - AI Enterprise Search

Agent Studio - Enterprise AI Agents (ex Agent Builder)

Agent Studio (ex Vertex AI) - Generative AI Development

Amazon Augmented AI (A2I) - Human Review for ML

Amazon Bedrock AgentCore - AI Agent Runtime

Ready to start with Model Router - Automatic LLM routing?