Skip to main content
Cloud / STACKIT / Products / STACKIT AI Model Serving: Sovereign LLMs

STACKIT AI Model Serving: Sovereign LLMs

STACKIT AI Model Serving: Run open-weight LLMs like Llama, Qwen, and GPT-OSS GDPR-compliant from German data centers, OpenAI-compatible.

Data & AI
Pricing Model Pay-as-you-go per token (input/output)
Availability Region eu01 (Germany)
Data Sovereignty No data storage, no training on customer data
Reliability Runs on the data-sovereign STACKIT Cloud SLA

What is STACKIT AI Model Serving?

STACKIT AI Model Serving is a fully managed platform for GDPR-compliant access to leading open-weight Large Language Models. The service provides models such as Llama 3.3, Gemma 3, GPT-OSS, and Qwen3 through a unified, OpenAI-compatible API. All prompts and responses are processed in German data centers (region eu01). STACKIT does not store customer data and does not train the models on your requests. You get generative AI with full data sovereignty and no vendor lock-in.

The service has been available since May 2025 and is continuously extended with current models. Billing is pay-as-you-go based on consumed tokens, so you can start without fixed instance costs.

Core Features

  • Access to open-weight models: Llama 3.3 70B, Gemma 3 27B, GPT-OSS 120B and 20B, Qwen3-VL 235B, and the Qwen3.6 27B coding model
  • OpenAI-compatible REST API with chat completions and embeddings endpoints for fast integration
  • Context windows up to 262K tokens and large response generation depending on the model
  • Tool calling and reasoning for agentic workflows, plus vision support (up to 3 images per request)
  • Text and multimodal embedding models for semantic search and cross-modal retrieval
  • No storage of customer data and no training on your data, operated in the eu01 region

Typical Use Cases

Chatbots and virtual assistants: Customer service bots with natural conversation for support, consulting, and FAQ answering, fully operated in the EU.

Retrieval Augmented Generation (RAG): Connect your own knowledge bases via embedding models and cross-modal retrieval, for example with the open-source STACKIT RAG template.

Document analysis: Process contracts, reports, and legal documents with automatic summarization and extraction.

Code generation and review: Support software development with the Qwen3.6 27B coding model for generation, debugging, and technical reasoning.

Benefits

  • Full data sovereignty: processing in German data centers, no data storage, and no training on your data
  • Vendor independence through open models and open interfaces instead of proprietary lock-in
  • Easy migration: existing OpenAI integrations work via the compatible API with minimal changes
  • Predictable costs thanks to pay-as-you-go token billing without fixed instance fees
  • GDPR and EU AI Act compliance as a foundation for regulated industries

Integration with innFactory

As an official STACKIT partner, innFactory supports you with AI Model Serving across the full lifecycle: architecture and model selection, migration of existing OpenAI applications, building RAG pipelines, secure operations, and cost optimization. This brings sovereign AI into production quickly and compliantly.

Available Tiers & Options

Embedding and vision models

Strengths
  • Text and multimodal embeddings
  • Image understanding (up to 3 images per request)
  • Cross-modal retrieval for RAG
Considerations
  • No fine-tuning of shared models

Typical Use Cases

Chatbots and virtual assistants
Retrieval Augmented Generation (RAG)
Document analysis and summarization
Code generation and review

Technical Specifications

API OpenAI-compatible REST API (eu01)
Capabilities Tool calling, reasoning, vision, embeddings
Compliance GDPR, EU AI Act
Context window Up to 262K tokens (model-dependent)
Data residency Processed in Germany (region eu01)
Models Llama 3.3 70B, Gemma 3 27B, GPT-OSS 120B/20B, Qwen3-VL 235B, Qwen3.6 27B, embeddings

Frequently Asked Questions

Which AI models are available?

STACKIT provides open-weight models through a unified API, including Llama 3.3 70B, Gemma 3 27B, GPT-OSS 120B and 20B, plus Qwen3-VL 235B and the Qwen3.6 27B coding model. For embeddings it offers E5 Mistral 7B and a multimodal Qwen3-VL embedding model.

Are proprietary models like GPT-4, Claude, or Gemini available?

No. STACKIT AI Model Serving deliberately relies on open-weight models hosted in German data centers. This keeps you vendor-independent and preserves full data sovereignty.

Is my data stored or used for training?

No. STACKIT does not store customer data from requests and does not train the models on your data. Prompts and responses never leave German jurisdiction.

Is the API OpenAI-compatible?

Yes. The service offers an OpenAI-compatible REST API with /v1/chat/completions and /v1/embeddings endpoints. Existing OpenAI code works with minimal changes: you only adjust the base URL and API token. The API is stateless, so you send the conversation history with each request.

How is billing handled?

Usage is pay-as-you-go based on consumed input and output tokens, depending on the selected model. There are no fixed instance costs for the shared models.

Where are the models hosted?

The models run in the eu01 region on the data-sovereign STACKIT Cloud in Germany. The base URL of the OpenAI-compatible API is https://api.openai-compat.model-serving.eu01.onstackit.cloud/v1.

STACKIT Partner

innFactory is an official STACKIT Partner. We provide consulting, implementation, and managed services for the sovereign cloud.

STACKIT Official Partner

Ready to start with STACKIT AI Model Serving: Sovereign LLMs?

Our certified STACKIT experts help you with architecture, integration, and optimization.

Schedule Consultation