Skip to main content
Cloud / Azure / Products / Foundry Local - On-Device Local AI Runtime

Foundry Local - On-Device Local AI Runtime

Foundry Local: cross-platform AI runtime that runs models on-device via ONNX Runtime. OpenAI-compatible API, no cloud, no latency, no per-token cost.

ai-machine-learning
Pricing Model Free, no per-token cost
Availability Runs on-device, worldwide incl. EU, no Azure region required
Data Sovereignty Data never leaves the device
Reliability N/A (local execution, no service SLA) SLA

What is Foundry Local?

Foundry Local is an end-to-end solution from Microsoft for building AI applications that run entirely on the user’s device. The local AI runtime handles model acquisition, hardware acceleration, model management, and inference via ONNX Runtime. The runtime adds only about 20 MB to the application and runs inference in-process. This lets you embed AI where package size, data privacy, and offline capability matter.

Foundry Local solves a concrete problem: it lets you add AI features to client applications without sending data to the cloud, without network latency, and without per-token costs. No Azure subscription is required. Responses start immediately, and the application works offline. For inference on your own infrastructure at enterprise scale with Kubernetes-native operations, Microsoft offers a separate option, Foundry Local on Azure Local.

Key Features

  • Lightweight on-device runtime: The runtime, built on ONNX Runtime, handles model acquisition, hardware acceleration, and inference within the application process and adds only about 20 MB to the app package.
  • OpenAI-compatible API: Foundry Local supports OpenAI request and response formats, including the OpenAI Responses API format, so existing OpenAI SDK applications can be reused with minimal code changes.
  • Automatic hardware acceleration: Foundry Local detects the available hardware and selects the best execution provider across CPU, GPU, and NPU, with seamless fallback to CPU. Execution provider and driver updates are managed automatically.
  • Curated model catalog: A versioned catalog of quantized models optimized for on-device use covers chat completions (such as GPT OSS, Qwen, DeepSeek, Mistral, Phi) and audio transcription (such as Whisper). Models download on first use and are cached locally.

Typical Use Cases

Embed AI in client applications: Developers integrate AI features directly into desktop applications through the SDK for C#, JavaScript, Python, or Rust. Inference runs within the application process, with no separate backend and no cloud dependency.

Process sensitive data on the device: Applications process audio, text, or images locally so the data never leaves the device. This suits scenarios with strict data-privacy and compliance requirements.

Offline and edge scenarios: In environments with limited or no connectivity, Foundry Local delivers AI features without network access. After the initial model download, no connection is required for inference itself.

Benefits

  • No per-token costs and no Azure subscription required.
  • Data stays on the device, with responses starting immediately and no network latency.
  • Cross-platform support for Windows, macOS (Apple silicon), and Linux, with automatic selection of the best hardware.

Integration with innFactory

As a Microsoft Solutions Partner, innFactory supports you with the adoption and operation of this service.

Typical Use Cases

Embed AI features in desktop apps without a cloud backend
Process sensitive data (audio, text, images) directly on the device
Offline and edge scenarios with limited connectivity
Reduce per-token costs of cloud-based inference

Frequently Asked Questions

What is Foundry Local?

Foundry Local is an end-to-end solution from Microsoft for running AI models entirely on the user's device. It combines a lightweight runtime built on ONNX Runtime, a curated model catalog, and an SDK for C#, JavaScript, Python, and Rust. Data never leaves the device, and there are no per-token costs.

When should I use Foundry Local?

Foundry Local fits when sensitive data must stay on the device, when applications need to work offline or in limited-connectivity environments, when you need low latency for real-time interactions, or when you want to reduce the per-token costs of cloud-based inference. It is designed for single-user scenarios on client devices.

How much does Foundry Local cost?

Foundry Local has no per-token cost and requires no Azure subscription. The models run entirely on local hardware. Use is governed by the product terms and licenses that apply to the software and the models in use.

Which platforms does Foundry Local support, and is it OpenAI-compatible?

Foundry Local supports Windows, macOS (Apple silicon), and Linux, and automatically uses the best available hardware (CPU, GPU, or NPU). The runtime exposes an OpenAI-compatible API, including the OpenAI Responses API format. Existing OpenAI SDK applications can be repointed to a Foundry Local endpoint with minimal changes.

Microsoft Solutions Partner

innFactory is a Microsoft Solutions Partner. We provide expert consulting, implementation, and managed services for Azure.

Microsoft Solutions Partner Microsoft Data & AI

Similar Products from Other Clouds

Other cloud providers offer comparable services in this category. As a multi-cloud partner, we help you choose the right solution.

74 comparable products found across other clouds.

Ready to start with Foundry Local - On-Device Local AI Runtime?

Our certified Azure experts help you with architecture, integration, and optimization.

Schedule Consultation