Microsoft Foundry on Microsoft Azure
What is Microsoft Foundry?
Microsoft Foundry (formerly Azure AI Foundry, before that Azure AI Studio) is Microsoft’s unified platform for developing, evaluating, and operating AI applications and agents. The platform brings models, agents, and tools together under a single Azure resource and covers the full lifecycle: from model selection through prompt engineering and agent orchestration to deployment, monitoring, and governance. This solves a central problem in enterprise AI projects: the fragmentation between development, testing, and production.
The core of Microsoft Foundry is the Model Catalog with 1,900+ models from Microsoft, OpenAI, Anthropic, Mistral, xAI, Meta, DeepSeek, and Hugging Face. This includes GPT-5 for complex reasoning and multimodal scenarios, GPT-4.1 for a balance of capability and cost, Anthropic Claude, xAI Grok, Mistral, DeepSeek-R1, Microsoft Phi-4 for resource-constrained environments, and Meta Llama for custom fine-tuning. This diversity lets companies choose the right model per use case without locking into a single provider. Through the Model Router, you address models via one endpoint, and the system routes each request automatically to a suitable model.
Beyond pure model usage, Foundry offers the Agent Service for production-ready AI agents, Foundry IQ as a citation-backed knowledge layer for RAG, and built-in observability with tracing, evaluations, and Content Safety. The Foundry Control Plane adds governance, role-based access control (RBAC), and Azure Policy integration for enterprise-wide operations.
Core Features
- Model Catalog with 1,900+ models (GPT-5, GPT-4.1, Claude, Grok, Mistral, DeepSeek-R1, Phi-4, Llama) and a Model Router over a single endpoint
- Foundry Agent Service with multi-agent orchestration, a tool catalog (over 1,400 tools), memory, and publishing to Microsoft 365 or Teams
- Foundry IQ: citation-backed knowledge layer for RAG over OneLake, Snowflake, S3, and web content
- Built-in observability: tracing, continuous evaluations, monitoring, and Azure AI Content Safety
- Flexible deployments: Standard (token-based), Provisioned Throughput (PTU), and Batch, each as Global, Data Zone, or Regional
- Foundry Control Plane for governance, RBAC, networking, and Azure Policy integration
Typical Use Cases
Production-ready AI agents
Companies use the Foundry Agent Service to build agents that call tools, retain context through memory, and orchestrate multiple specialized agents. The service handles conversations, identity, safety, and observability, so teams do not have to write a custom orchestrator. Typical scenario: a self-service agent that researches enterprise data through Foundry IQ and executes actions in downstream systems.
Document Intelligence with RAG
Financial services and insurance companies automate processing of contracts, policies, and regulatory documents. Foundry IQ combines citation-backed retrieval with GPT-5 or GPT-4.1 for natural language summaries. Built-in evaluations measure groundedness, that is, whether an answer is supported by the retrieved documents. This speeds up review processes while keeping every statement traceable to its source.
Customer service bots
Companies develop customer service assistants based on GPT-5 or Claude. The Agent Service orchestrates conversation flows, while Foundry IQ integrates enterprise knowledge in real time. Content Safety filters inappropriate responses. Example: a support assistant that searches Confluence and Jira data and opens tickets.
Code assistants following the Copilot pattern
Software companies implement internal code assistants. With fine-tuning on their own codebase, the model learns company-specific patterns and libraries. Provisioned Throughput (PTU) guarantees consistent latency for developer workflows, and the Agent Service orchestrates generation, testing, and review.
Content moderation for user-generated content
Community and social platforms use Azure AI Content Safety from Foundry Tools for automated moderation. The models detect hate speech, sexual content, violence, and self-harm in text and images. Custom categories enable industry-specific filters, and integration happens via API into existing content pipelines.
Multi-modal AI applications
Marketing teams combine text, images, and audio. A GPT-5 model analyzes product images, generates descriptions, and creates social media posts; image models generate variations; and speech models convert text to audio. The Agent Service orchestrates the workflow, and evaluations measure quality.
Enterprise search with semantic search
Large organizations build intelligent search across internal knowledge sources such as SharePoint, OneDrive, and OneLake. Foundry IQ returns citation-backed answers with source references for compliance. Users receive not only answers but also the evidence behind them.
Benefits
- One platform instead of fragmented tools: models, agents, and tools under a single Azure resource with unified RBAC, networking, and policies
- Provider independence: 1,900+ models directly comparable, with the Model Router automatically optimizing quality, cost, and latency
- Faster path to production: the Agent Service handles orchestration, identity, safety, and observability
- Trustworthy AI: Content Safety, evaluations, and tracing are built in, not bolted on
- Predictable costs: token, PTU, and batch deployments can be combined per workload
- EU compliance: EU Data Zone and Data Boundary with regions like Sweden Central, West Europe, and Germany West Central
Best Practices for Microsoft Foundry
Model selection from the catalog
Start with clear criteria: latency, cost, quality. Compare GPT-5, GPT-4.1, Claude, and Mistral in the playground. For production decisions, create an evaluation dataset with 100 to 200 representative examples and measure groundedness, relevance, and coherence. For mixed workloads, the Model Router pays off by automatically routing each request to a suitable model.
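A model comparison of this kind can be sketched in a few lines of Python. This is a minimal illustration, not Foundry's evaluation API: the model names, the 1 to 5 scores, and the metric weights are placeholder assumptions, and in practice the per-example scores would come from your evaluation run.

```python
from statistics import mean

# Hypothetical per-example scores (1-5 scale) from an evaluation run.
# Model names and numbers are placeholders, not measured results.
scores = {
    "model-a": [
        {"groundedness": 5, "relevance": 4, "coherence": 4},
        {"groundedness": 4, "relevance": 4, "coherence": 5},
    ],
    "model-b": [
        {"groundedness": 3, "relevance": 5, "coherence": 4},
        {"groundedness": 4, "relevance": 3, "coherence": 4},
    ],
}

METRICS = ("groundedness", "relevance", "coherence")


def summarize(rows):
    """Average each metric over all examples for one model."""
    return {m: mean(row[m] for row in rows) for m in METRICS}


def rank_models(scores, weights):
    """Return (model, weighted score) pairs, best first."""
    ranked = []
    for model, rows in scores.items():
        summary = summarize(rows)
        total = sum(weights[m] * summary[m] for m in METRICS)
        ranked.append((model, total))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Assumed weighting: groundedness matters most for this use case.
weights = {"groundedness": 0.5, "relevance": 0.3, "coherence": 0.2}
ranking = rank_models(scores, weights)
for model, score in ranking:
    print(f"{model}: {score:.2f}")
```

Making the weights explicit forces the team to agree on what "quality" means before any model is chosen; latency and cost can be added as further columns in the same way.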
Build agents with the Agent Service
Use the Foundry Agent Service instead of custom orchestration code. Define tools from the catalog, enable memory for multi-step tasks, and version your agents. Use multi-agent orchestration for complex workflows and gate critical steps with human-in-the-loop.
RAG implementation with Foundry IQ
Connect data sources through Foundry IQ and use citation-backed answers to make sources traceable. Measure retrieval precision and answer groundedness separately. For deep integration with Microsoft Fabric and Microsoft 365, Fabric IQ and Work IQ are available.
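Why the two measurements should stay separate can be shown with a small sketch. It assumes you have labeled relevant documents per question and a groundedness score from an evaluator; the function names, the 1 to 5 scale, and the thresholds are illustrative assumptions, not part of Foundry IQ.

```python
def precision_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are labeled relevant."""
    top_k = retrieved_ids[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / len(top_k)


def diagnose(precision, groundedness, min_precision=0.5, min_groundedness=4):
    """Attribute a weak answer to the retrieval step or the generation step.

    Thresholds are assumed values; groundedness is a 1-5 evaluator score.
    """
    if precision < min_precision:
        return "retrieval"
    if groundedness < min_groundedness:
        return "generation"
    return "ok"


# Hypothetical record for one question.
retrieved = ["doc-3", "doc-7", "doc-1", "doc-9"]
relevant = {"doc-3", "doc-1", "doc-4"}

p = precision_at_k(retrieved, relevant, k=4)
print(p, diagnose(p, groundedness=2))
```

If only a combined score were tracked, a poor result would not reveal whether to fix the index and connectors or the prompt and model.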
Responsible AI from the start
Integrate Azure AI Content Safety from Foundry Tools from the beginning, not retroactively. Define custom categories for your use case and use observability for continuous monitoring. Document model cards for transparency and compliance.
Evaluation and monitoring
Establish an evaluation framework from day one: define metrics, create golden datasets, and automate evaluations with every deployment. In production: trace all agent steps, sample for human review, and run continuous evaluation through the observability dashboard.
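Automating evaluations with every deployment usually comes down to a gate in the release pipeline. The following sketch assumes the evaluation run yields one aggregate score per metric on a 1 to 5 scale; the threshold values and function names are hypothetical.

```python
# Hypothetical minimum scores on a 1-5 scale; tune them to your golden dataset.
THRESHOLDS = {"groundedness": 4.0, "relevance": 3.5, "coherence": 3.5}


def failed_metrics(results, thresholds):
    """Return the names of metrics that are missing or below their minimum."""
    return sorted(
        name
        for name, minimum in thresholds.items()
        if results.get(name, float("-inf")) < minimum
    )


def deployment_allowed(results, thresholds=THRESHOLDS):
    """A deployment passes the gate only if no metric fails."""
    return not failed_metrics(results, thresholds)


# Example aggregate scores for a release candidate (placeholder numbers).
candidate = {"groundedness": 4.3, "relevance": 3.2, "coherence": 4.1}
print(failed_metrics(candidate, THRESHOLDS))
print(deployment_allowed(candidate))
```

Treating a missing metric as a failure keeps a broken evaluation job from silently letting a release through.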
Cost optimization with deployment types
For steady, predictable traffic, Provisioned Throughput Units (PTU) offer fixed costs and low latency. For asynchronous bulk processing, use Global Batch at around 50% lower cost. For variable traffic, token-based Standard pricing fits. Combine the deployment types: PTU for baseline, Batch for bulk jobs, Standard for peaks.
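The trade-off between deployment types is simple arithmetic once you know your volumes. In the sketch below, all prices, token volumes, and PTU counts are placeholder assumptions and not Azure list prices; only the roughly 50% batch discount is taken from the text above.

```python
def standard_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Token-based cost: input and output are priced separately per million tokens."""
    return (input_tokens / 1e6) * in_price_per_m + (output_tokens / 1e6) * out_price_per_m


def batch_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m, discount=0.5):
    """Batch cost, modeled as the Standard cost reduced by an approximate discount."""
    base = standard_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m)
    return base * (1 - discount)


def ptu_cost(units, hourly_price_per_unit, hours):
    """Provisioned cost: fixed capacity billed per hour, independent of token volume."""
    return units * hourly_price_per_unit * hours


# Placeholder monthly workload and prices (not real list prices).
monthly_in, monthly_out = 200_000_000, 50_000_000
in_price, out_price = 2.0, 8.0

standard = standard_cost(monthly_in, monthly_out, in_price, out_price)
batch = batch_cost(monthly_in, monthly_out, in_price, out_price)
provisioned = ptu_cost(units=10, hourly_price_per_unit=1.0, hours=720)

print(f"Standard: {standard:.2f}  Batch: {batch:.2f}  PTU: {provisioned:.2f}")
```

Running the same calculation with your own volumes shows where the break-even lies: provisioned capacity only pays off once the steady token-based spend exceeds the fixed hourly charge.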
Security with managed identity and private endpoints
Avoid API keys in code and use Azure Managed Identity. Implement private endpoints to keep traffic within the Azure network, and enable customer-managed keys for encryption at rest. For regulated industries, use Data Zone deployments in the EU with data residency guarantees.
Frequently Asked Questions about Microsoft Foundry
What is the difference between Microsoft Foundry and Azure OpenAI?
Azure OpenAI provides access specifically to OpenAI models like GPT-5 and GPT-4.1. Microsoft Foundry is the overarching platform: it includes Azure OpenAI as one area and adds 1,900+ models from other providers, the Foundry Agent Service, Foundry IQ, Model Router, plus tracing, evaluations, and governance. An existing Azure OpenAI resource can be upgraded to a Foundry resource while preserving its endpoint and API keys.
Which models are available in the Foundry Model Catalog?
The catalog includes 1,900+ models from Microsoft, OpenAI, Anthropic, Mistral, xAI, Meta, DeepSeek, and Hugging Face. Key families include GPT-5 for complex reasoning, GPT-4.1 for a balance of capability and cost, Anthropic Claude, xAI Grok, Mistral, DeepSeek-R1, Microsoft Phi-4, and Meta Llama. Models can be compared and evaluated directly in the playground.
What is the Foundry Agent Service?
The Foundry Agent Service moves agents from prototype to production without you writing a custom orchestrator. It handles conversations, tool calls, identity, safety, and observability. You build multi-agent orchestration with SDKs for C# and Python, connect over 1,400 tools from the catalog, and use memory to retain context across interactions. Agents can be published to Microsoft 365, Teams, or containerized deployments.
How does RAG work with Foundry IQ?
Foundry IQ is the knowledge layer of Microsoft Foundry. It connects agents to enterprise or web content and returns citation-backed answers rather than a single vector lookup. Foundry IQ taps data sources such as OneLake, Snowflake, and S3 and turns retrieval into a dynamic reasoning process. For deeper integration with Microsoft Fabric and Microsoft 365, Fabric IQ and Work IQ are available.
Can I fine-tune models with my own data?
Yes, Microsoft Foundry supports fine-tuning for many models, including OpenAI models, Llama, Mistral, and Phi. You upload training data; Foundry trains the model and provides it as a private deployment. Use cases include domain-specific language for legal or medical domains, a consistent brand tone of voice, and optimization for specific tasks. You are billed for the training run plus hosting of the custom model.
What does Microsoft Foundry cost and which deployment types exist?
The platform itself is free; billing happens per deployment. Standard deployments are token-based (pay-per-token, input and output priced separately). Provisioned Throughput (PTU) reserves fixed capacity for predictable cost and latency. Global Batch processes asynchronous jobs with a roughly 24-hour target turnaround at around 50% lower cost than Global Standard. Each option comes as Global, Data Zone, or Regional depending on your compliance needs.
How do content filtering and Responsible AI work?
Foundry integrates Azure AI Content Safety: filters for hate speech, sexual content, violence, and self-harm across multiple severity levels. You configure thresholds per use case, and custom categories enable industry-specific filters. Observability surfaces evaluations for groundedness and quality plus tracing across all agent steps. Filters act in real time on input and output and log violations for compliance.
Where is my data processed and is Foundry GDPR compliant?
Microsoft Foundry supports EU data residency through the EU Data Zone and the EU Data Boundary. EU regions such as Sweden Central, West Europe, and Germany West Central keep data in the EU. With Data Zone deployments, processing stays within the EU. Microsoft offers Data Processing Agreements, and for Azure OpenAI in Foundry, prompts are not used to train Microsoft models. Customer-managed keys provide additional control over encryption.
Can I use Foundry on-premises or in hybrid scenarios?
Microsoft Foundry is a cloud service and runs on Azure. For hybrid scenarios, you connect on-premises data via Azure Arc, VPN, or ExpressRoute. RAG implementations can include local data sources, for example on-prem SharePoint via a connector. For local inference, Microsoft offers Foundry Local with selected models on your own devices, but not the complete Foundry platform.
Integration with innFactory
As a Microsoft Solutions Partner, innFactory supports you in integrating Microsoft Foundry into your existing infrastructure. We help with model selection, develop production-ready agents with the Foundry Agent Service, build citation-backed RAG solutions with Foundry IQ, implement Responsible AI practices, and optimize costs through the right combination of token, PTU, and batch deployments.
Our expertise includes integrating Foundry with Azure AI Search, Microsoft Fabric, and your enterprise data, as well as building governance through the Foundry Control Plane for regulated industries.
Contact us for a non-binding consultation on Microsoft Foundry and enterprise AI projects.
