Azure AI Foundry on Microsoft Azure
What is Azure AI Foundry?
Azure AI Foundry is Microsoft’s comprehensive platform for developing, evaluating, and deploying AI applications. The platform unifies the complete lifecycle of AI projects in a single environment, from model selection and prompt engineering through deployment and monitoring. This addresses a central problem in enterprise AI projects: the fragmentation between development, testing, and production.
The core of Azure AI Foundry is the Model Catalog, which provides access to over 50 AI models. This includes OpenAI models like GPT-4, GPT-4o, and GPT-4 Turbo, open-source models like Meta Llama 3.1, Mistral Large, and Phi-3, as well as specialized models for vision, audio, and code. This diversity enables companies to choose the optimal model for their specific use case without being locked into a single provider. Models can be directly compared, evaluated, and fine-tuned with custom data.
The platform offers Prompt Flow, a low-code tool for orchestrating complex AI workflows. Prompt Flow visualizes data flow between prompts, models, and external tools, significantly simplifying the development of RAG (Retrieval-Augmented Generation) applications. The Responsible AI Dashboard integrates Content Safety filters, fairness analyses, and hallucination detection directly into the development process. This helps companies build AI applications that not only work but also meet ethical and regulatory standards.
Typical Use Cases
Chatbot Development with GPT-4
Companies develop intelligent customer service bots with Azure AI Foundry based on GPT-4 or GPT-4o. Conversation flows are orchestrated via Prompt Flow, enterprise data is integrated in real-time through Azure AI Search (RAG). The Responsible AI filter prevents inappropriate responses. Typical scenario: Support chatbot for SaaS companies that searches Confluence and Jira data.
Document Intelligence with RAG
Financial services and insurance companies use AI Foundry for automated processing of contracts, policies, and regulatory documents. The RAG implementation combines Azure AI Search for semantic search with GPT-4 for natural language summaries. The Evaluation Framework measures Groundedness and Hallucination Rate to ensure factually correct answers. Result: 70% time savings in document review.
Code Generation with Copilot Patterns
Software companies implement internal code assistants following the GitHub Copilot pattern. With fine-tuning on their own codebase, the model learns company-specific patterns and libraries. Prompt Flow orchestrates code generation, testing, and review processes. PTUs (Provisioned Throughput Units) guarantee consistent latency for developer workflows.
Content Moderation for User-Generated Content
Social media platforms and community applications use Azure AI Content Safety APIs from AI Foundry for automated moderation. The models detect hate speech, sexual content, violence, and self-harm in text and images. Custom categories enable industry-specific filters (e.g., unlicensed financial advice). Integration via API into existing content pipelines.
Multi-modal AI Applications
Marketing teams develop tools that combine text, images, and audio. A typical workflow: GPT-4 Vision analyzes product images, generates descriptions, and creates social media posts. DALL-E 3 generates image variations. Azure Speech Services convert text to audio for podcasts. Everything orchestrated via Prompt Flow, evaluated with custom metrics.
Fine-tuning Open-Source Models
Companies with highly specialized use cases use Llama 3.1 or Mistral as a base and train them with their own data. AI Foundry offers managed fine-tuning: upload training data, automatic training, evaluation against baseline models. Result: Domain-specific models that can outperform GPT-4 on the target domain at lower cost.
Enterprise Search with Semantic Search
Large organizations implement intelligent search systems across internal knowledge bases (SharePoint, OneDrive, internal wikis). Azure AI Search indexes documents with embeddings, GPT-4 generates answers based on the most relevant documents. Evaluation metrics measure relevance and citation accuracy. Users receive not only answers but also source references for compliance.
Best Practices for Azure AI Foundry
Model Selection from the Catalog
Start with clear evaluation criteria: latency, cost, quality. Use the AI Foundry Playground to directly compare GPT-4, GPT-4o, Llama 3.1, and Mistral Large. For production-ready decisions: Create an evaluation dataset with 100-200 representative examples and measure groundedness, relevance, and coherence. GPT-4o often offers the best compromise between quality and price.
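A systematic comparison can be sketched as a small evaluation harness. The model names below come from the catalog as listed above; the scorer is a placeholder for AI Foundry's built-in evaluators, which in a real setup would be invoked via the SDK or REST API.

```python
from statistics import mean

# Candidate deployments to compare (names as listed in the Model Catalog).
CANDIDATES = ["gpt-4o", "gpt-4", "llama-3.1-70b", "mistral-large"]

def score_answer(model: str, question: str, answer: str) -> dict:
    """Placeholder scorer: a real setup would call AI Foundry's built-in
    evaluators (groundedness, relevance, coherence) on the model's output."""
    return {"groundedness": 1.0, "relevance": 1.0, "coherence": 1.0}

def evaluate(model: str, dataset: list[dict]) -> dict:
    """Average each metric over the evaluation dataset for one model."""
    scores = [score_answer(model, ex["question"], ex["answer"]) for ex in dataset]
    return {metric: mean(s[metric] for s in scores) for metric in scores[0]}

# In practice the dataset holds 100-200 representative examples.
dataset = [{"question": "What is our refund policy?", "answer": "30 days."}]
results = {model: evaluate(model, dataset) for model in CANDIDATES}
```

The aggregated per-model metrics can then be weighed against latency and price to make the selection traceable.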
Prompt Engineering with Prompt Flow
Develop prompts not in isolation but as workflows in Prompt Flow. Use variant testing to systematically compare different prompt formulations. Implement chain-of-thought reasoning for complex tasks. Version your flows in Git. Best practice: Separate system prompts (behavior) from user prompts (input) and use few-shot examples for consistent outputs.
RAG Implementation with Azure AI Search
For production-ready RAG applications: Use Hybrid Search (keyword + semantic) in Azure AI Search for better retrieval quality. Implement reranking with cross-encoder models. Measure retrieval precision and answer groundedness separately. Systematically test chunk size and overlap (recommendation: 500-1000 tokens with 10-20% overlap). Use citation metadata to make sources traceable.
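The recommended chunking parameters (500-1000 tokens, 10-20% overlap) can be sketched as follows. Tokens are approximated here as whitespace-separated words; a real pipeline would use the target model's tokenizer (e.g. tiktoken).

```python
def chunk_text(text: str, chunk_size: int = 750, overlap_ratio: float = 0.15) -> list[str]:
    """Split text into chunks of roughly chunk_size tokens with overlap.
    Tokens are approximated as whitespace-separated words in this sketch."""
    words = text.split()
    overlap = int(chunk_size * overlap_ratio)
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

# Example: 2000 "words" with 1000-word chunks and 20% overlap.
chunks = chunk_text(" ".join(str(i) for i in range(2000)),
                    chunk_size=1000, overlap_ratio=0.2)
```

Chunk size and overlap should then be tuned against retrieval precision on your own evaluation dataset rather than fixed once.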
Responsible AI Practices
Integrate Content Safety filters from the start, not retroactively. Define custom categories for your use case (e.g., “financial advice” or “medical diagnoses”). Use the Responsible AI Dashboard for continuous monitoring. Implement human-in-the-loop for critical decisions. Document model cards for transparency and compliance.
Evaluation and Monitoring
Establish an evaluation framework from day one: Define metrics (groundedness, relevance, coherence, fluency), create golden datasets, automate evaluations with every deployment. In production: Log all inputs/outputs, sample random requests for human review, track latency and error rates. AI Foundry offers built-in evaluation tools for GPT-based metrics.
Cost Optimization with PTUs vs. Token-based Pricing
For workloads with predictable traffic: PTUs (Provisioned Throughput Units) offer up to 50% cost savings compared to pay-per-token. Calculate break-even: At more than 1 million tokens per day, PTUs are worthwhile. For variable workloads: Use token-based pricing with rate limiting. Combine both models: PTUs for baseline traffic, token-based for peaks.
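The break-even reasoning above can be written down as simple arithmetic. The prices in the example call are illustrative placeholders, not current Azure list prices.

```python
def monthly_cost_tokens(tokens_per_day: float, price_per_million: float) -> float:
    """Pay-per-token cost over 30 days, using a blended input/output price."""
    return tokens_per_day * 30 / 1_000_000 * price_per_million

def cheaper_option(tokens_per_day: float, price_per_million: float,
                   ptu_monthly_cost: float) -> tuple[str, float]:
    """Compare token-based billing against a fixed monthly PTU reservation."""
    token_cost = monthly_cost_tokens(tokens_per_day, price_per_million)
    if ptu_monthly_cost < token_cost:
        return ("PTU", ptu_monthly_cost)
    return ("token-based", token_cost)

# Illustrative numbers: 2M tokens/day at $5 per 1M blended tokens
# would cost $300/month pay-per-token; a $200/month PTU reservation wins.
choice, cost = cheaper_option(2_000_000, 5.0, 200.0)
```

Run this with your actual traffic profile and current pricing before committing to a reservation, and remember the hybrid option: PTUs sized for baseline traffic, token-based billing for peaks.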
Security with Managed Identity and Private Endpoints
Avoid API keys in code. Use Azure Managed Identity for authentication between services. Implement Private Endpoints for AI Foundry to keep traffic within the Azure network. Activate Customer-Managed Keys for encryption at rest. For regulated industries: Use EU regions with data residency guarantees and GDPR compliance.
Frequently Asked Questions about Azure AI Foundry
What is the difference between Azure AI Foundry and Azure OpenAI Service?
Azure OpenAI Service provides access specifically to OpenAI models (GPT-4, DALL-E, Embeddings). Azure AI Foundry is the overarching platform that includes Azure OpenAI Service but also offers open-source models (Llama, Mistral), Prompt Flow, evaluation tools, and the Responsible AI Dashboard. AI Foundry is the modern development environment for all AI models; Azure OpenAI Service is one component of it.
Which models are available in the Model Catalog?
The catalog currently includes over 50 models: OpenAI (GPT-4, GPT-4o, GPT-4 Turbo, GPT-3.5, DALL-E 3, Whisper), Meta Llama (Llama 3.1 8B, 70B, 405B), Mistral (Mistral Large, Mistral Small), Microsoft Phi-3, Cohere Command and Embed, as well as vision models like Florence. Models are continuously added. All models can be tested directly in the playground.
How does Prompt Flow work and what do I need it for?
Prompt Flow is a visual tool for orchestrating AI workflows. Instead of writing code, you connect nodes (LLM calls, Python code, databases, APIs) in a flowchart. Use cases: RAG pipelines (Retrieval → Reranking → LLM → Post-Processing), multi-step reasoning, agent workflows with tool calls. The advantage: Workflows are versionable, testable with different inputs, and directly deployable as REST API.
How do I implement RAG (Retrieval-Augmented Generation) with AI Foundry?
The typical RAG flow: 1) Index documents in Azure AI Search with embeddings. 2) In Prompt Flow, create a flow: User Query → Azure AI Search (Retrieval) → Reranking → Top-K Documents → LLM with Context → Answer. 3) Evaluate with groundedness metrics (measuring whether the answer is grounded in the retrieved documents). AI Foundry offers templates for RAG patterns. Integration with SharePoint, OneDrive, and Blob Storage is available out-of-the-box.
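The three-step flow above can be sketched end-to-end with stub functions. The keyword-overlap retrieval, pass-through reranker, and string-building "LLM" below are placeholders for Azure AI Search, a cross-encoder reranker, and a GPT-4o call respectively.

```python
def retrieve(query: str, index: list[str], top_k: int = 10) -> list[str]:
    """Stub for Azure AI Search: rank documents by naive keyword overlap."""
    scored = [(sum(word in doc.lower() for word in query.lower().split()), doc)
              for doc in index]
    return [doc for score, doc in sorted(scored, reverse=True)[:top_k] if score > 0]

def rerank(query: str, docs: list[str], top_k: int = 3) -> list[str]:
    """Stub for a cross-encoder reranker: keep the first top_k candidates."""
    return docs[:top_k]

def answer(query: str, index: list[str]) -> str:
    """User Query -> Retrieval -> Reranking -> Top-K -> LLM with context."""
    docs = rerank(query, retrieve(query, index))
    context = "\n".join(docs)
    # Stub for the LLM call: a real flow sends context + query to the model.
    return f"Based on {len(docs)} source(s): {context[:80]}"

index = ["refunds are possible within 30 days", "shipping takes 5 days"]
result = answer("refund policy", index)
```

In Prompt Flow each function becomes a node, which makes the same pipeline versionable, testable with variant inputs, and deployable as a REST API.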
Can I fine-tune models with my own data?
Yes, AI Foundry supports fine-tuning for GPT-3.5, GPT-4, Llama 3.1, Mistral, and Phi-3. You upload training data (JSONL format with prompt/completion pairs), AI Foundry trains the model and provides it as a private deployment. Use cases: Domain-specific language (legal, medical), consistent brand tone-of-voice, optimization for specific tasks. Fine-tuning is billed per training hour, plus hosting costs for the deployed custom model.
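Preparing the JSONL upload described above is straightforward. The example pair is invented for illustration; note that chat-tuned models may instead require a messages-based schema, so check the documented format for your target model.

```python
import json

def write_training_file(pairs: list[tuple[str, str]], path: str) -> None:
    """Serialize prompt/completion pairs as JSONL for the fine-tuning upload.
    Chat models may instead expect a {"messages": [...]} schema per line."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt, completion in pairs:
            f.write(json.dumps({"prompt": prompt, "completion": completion}) + "\n")

# Hypothetical training pair for a legal-domain assistant.
pairs = [("Summarize clause 4.", "Clause 4 limits the supplier's liability.")]
write_training_file(pairs, "training.jsonl")
with open("training.jsonl", encoding="utf-8") as f:
    records = [json.loads(line) for line in f]
```

After upload, AI Foundry handles training and evaluation against baseline models before the private deployment is created.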
What does Azure AI Foundry cost: PTUs vs. Token-based Pricing?
There are two pricing models: 1) Token-based: Pay-per-token (input/output), e.g., GPT-4o approximately $2.50 per 1M input tokens. Flexible for variable workloads. 2) PTUs (Provisioned Throughput Units): Monthly fixed costs for guaranteed throughput. 1 PTU ≈ 100-150K tokens/hour. Break-even at approximately 1M tokens/day. PTUs offer predictable costs and lower latency. Combinable: PTUs for baseline, token-based for peaks.
How does content filtering and Responsible AI work?
AI Foundry integrates Azure AI Content Safety: Filters for hate speech, sexual content, violence, self-harm in 4 severity levels. You configure thresholds per use case. Custom categories enable industry-specific filters. The Responsible AI Dashboard shows fairness metrics (bias detection), groundedness (hallucination detection), and error analysis. Filters are active in real-time (input + output filtering) and log violations for compliance.
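Threshold configuration per category can be sketched as below. The analysis dict mimics the shape of a Content Safety text-analysis result (one severity value per category); the threshold values themselves are illustrative, not recommendations.

```python
# Illustrative per-category severity thresholds; severity values here follow
# the 0/2/4/6 scale used for text analysis. Tune these per use case.
THRESHOLDS = {"Hate": 2, "Sexual": 2, "Violence": 4, "SelfHarm": 0}

def is_blocked(analysis: dict) -> bool:
    """Block content if any category's severity exceeds its threshold.
    `analysis` is a stub shaped like a Content Safety response,
    e.g. {"Hate": 4, "Violence": 0}."""
    return any(analysis.get(category, 0) > limit
               for category, limit in THRESHOLDS.items())
```

The same check applies symmetrically to input filtering (user prompts) and output filtering (model responses), with blocked requests logged for compliance.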
Where is my data stored and is AI Foundry GDPR compliant?
AI Foundry is available in all Azure regions, including EU regions (West Europe, North Europe, Germany West Central). When choosing an EU region, data remains in the EU (data residency). Microsoft offers Data Processing Agreements for GDPR compliance. Important: Azure OpenAI Service in AI Foundry does not store prompts/completions for Microsoft training (unlike OpenAI API). For maximum control: Customer-Managed Keys for encryption.
How do I measure the quality of my AI application?
AI Foundry offers built-in evaluation metrics: Groundedness (is the answer based on the provided facts?), Relevance (does the answer fit the question?), Coherence (is the answer logical?), Fluency (is the language natural?), Similarity (to reference answers). For custom metrics: Use GPT-4 as judge (it evaluates outputs according to your criteria). Best practice: Create a golden dataset with 100-200 examples, evaluate with every deployment, track metrics over time.
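The GPT-as-judge loop over a golden dataset can be sketched like this. The `judge` function is a stand-in for a real GPT-4 judging prompt that returns a 1-5 score; the golden example and the `generate` callable are hypothetical.

```python
from typing import Callable

def judge(question: str, answer: str, reference: str) -> int:
    """Placeholder for a GPT-4 judge prompt scoring 1-5.
    This stub scores 5 if the reference answer appears in the output."""
    return 5 if reference.lower() in answer.lower() else 1

def run_eval(golden: list[dict], generate: Callable[[str], str]) -> float:
    """Score every golden example and return the mean judge score."""
    scores = [judge(ex["q"], generate(ex["q"]), ex["ref"]) for ex in golden]
    return sum(scores) / len(scores)

# Hypothetical golden dataset entry and application under test.
golden = [{"q": "What is the capital of France?", "ref": "Paris"}]
mean_score = run_eval(golden, lambda q: "Paris is the capital of France.")
```

Running this on every deployment and tracking the mean score over time turns regressions into a visible metric rather than anecdotal reports.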
Can I use AI Foundry on-premises or in hybrid scenarios?
AI Foundry is a cloud service and runs on Azure. For hybrid scenarios: You can connect on-premises data via Azure Arc, VPN, or ExpressRoute. RAG implementations can integrate local data sources (e.g., on-prem SharePoint via connector). Model inference occurs in Azure. For fully on-premises: Microsoft offers selected models via Azure Stack, but not the complete AI Foundry platform.
Integration with innFactory
As a Microsoft Solutions Partner, innFactory supports you in implementing Azure AI Foundry into your existing infrastructure. We help with model selection, develop custom RAG solutions, implement Responsible AI practices, and optimize costs through the right combination of PTUs and token-based pricing.
Our expertise includes integrating AI Foundry with Azure AI Search, Azure Machine Learning, and your enterprise data, as well as developing Prompt Flow workflows for complex AI applications.
Contact us for a non-binding consultation on Azure AI Foundry and enterprise AI projects.
