STACKIT AI Model Serving: Sovereign LLMs

What is STACKIT AI Model Serving?

STACKIT AI Model Serving is a fully managed platform for GDPR-compliant access to leading open-weight Large Language Models. The service provides models such as Llama 3.3, Gemma 3, GPT-OSS, and Qwen3 through a unified, OpenAI-compatible API. All prompts and responses are processed in German data centers (region eu01). STACKIT does not store customer data and does not train the models on your requests. You get generative AI with full data sovereignty and no vendor lock-in.

The service has been available since May 2025 and is continuously extended with current models. Billing is pay-as-you-go based on consumed tokens, so you can start without fixed instance costs.

Core Features

Access to open-weight models: Llama 3.3 70B, Gemma 3 27B, GPT-OSS 120B and 20B, Qwen3-VL 235B, and the Qwen3.6 27B coding model
OpenAI-compatible REST API with chat completions and embeddings endpoints for fast integration
Context windows up to 262K tokens and large response generation depending on the model
Tool calling and reasoning for agentic workflows, plus vision support (up to 3 images per request)
Text and multimodal embedding models for semantic search and cross-modal retrieval
No storage of customer data and no training on your data, operated in the eu01 region

Typical Use Cases

Chatbots and virtual assistants: Customer service bots with natural conversation for support, consulting, and FAQ answering, fully operated in the EU.

Retrieval Augmented Generation (RAG): Connect your own knowledge bases via embedding models and cross-modal retrieval, for example with the open-source STACKIT RAG template.

Document analysis: Process contracts, reports, and legal documents with automatic summarization and extraction.

Code generation and review: Support software development with the Qwen3.6 27B coding model for generation, debugging, and technical reasoning.

Benefits

Full data sovereignty: processing in German data centers, no data storage, and no training on your data
Vendor independence through open models and open interfaces instead of proprietary lock-in
Easy migration: existing OpenAI integrations work via the compatible API with minimal changes
Predictable costs thanks to pay-as-you-go token billing without fixed instance fees
GDPR and EU AI Act compliance as a foundation for regulated industries

Integration with innFactory

As an official STACKIT partner, innFactory supports you with AI Model Serving across the full lifecycle: architecture and model selection, migration of existing OpenAI applications, building RAG pipelines, secure operations, and cost optimization. This brings sovereign AI into production quickly and compliantly.

Available Tiers & Options

Recommended

Chat and reasoning models

Strengths

Llama 3.3, Gemma 3, GPT-OSS, Qwen3
Tool calling and reasoning
Context windows up to 262K tokens

Considerations

Open-weight models, no proprietary models

Embedding and vision models

Strengths

Text and multimodal embeddings
Image understanding (up to 3 images per request)
Cross-modal retrieval for RAG

Considerations

No fine-tuning of shared models

Technical Specifications

API OpenAI-compatible REST API (eu01)

Capabilities Tool calling, reasoning, vision, embeddings

Compliance GDPR, EU AI Act

Context window Up to 262K tokens (model-dependent)

Data residency Processed in Germany (region eu01)

Models Llama 3.3 70B, Gemma 3 27B, GPT-OSS 120B/20B, Qwen3-VL 235B, Qwen3.6 27B, embeddings

Frequently Asked Questions

Which AI models are available?

STACKIT provides open-weight models through a unified API, including Llama 3.3 70B, Gemma 3 27B, GPT-OSS 120B and 20B, plus Qwen3-VL 235B and the Qwen3.6 27B coding model. For embeddings it offers E5 Mistral 7B and a multimodal Qwen3-VL embedding model.

Are proprietary models like GPT-4, Claude, or Gemini available?

No. STACKIT AI Model Serving deliberately relies on open-weight models hosted in German data centers. This keeps you vendor-independent and preserves full data sovereignty.

Is my data stored or used for training?

No. STACKIT does not store customer data from requests and does not train the models on your data. Prompts and responses never leave German jurisdiction.

Is the API OpenAI-compatible?

Yes. The service offers an OpenAI-compatible REST API with /v1/chat/completions and /v1/embeddings endpoints. Existing OpenAI code works with minimal changes: you only adjust the base URL and API token. The API is stateless, so you send the conversation history with each request.

How is billing handled?

Usage is pay-as-you-go based on consumed input and output tokens, depending on the selected model. There are no fixed instance costs for the shared models.

Where are the models hosted?

The models run in the eu01 region on the data-sovereign STACKIT Cloud in Germany. The base URL of the OpenAI-compatible API is https://api.openai-compat.model-serving.eu01.onstackit.cloud/v1.

STACKIT AI Model Serving: Sovereign LLMs

What is STACKIT AI Model Serving?

Core Features

Typical Use Cases

Benefits

Integration with innFactory

Available Tiers & Options

Chat and reasoning models

Embedding and vision models

Typical Use Cases

Technical Specifications

Frequently Asked Questions

Which AI models are available?

Are proprietary models like GPT-4, Claude, or Gemini available?

Is my data stored or used for training?

Is the API OpenAI-compatible?

How is billing handled?

Where are the models hosted?

Quick Links

STACKIT Partner

Comparable Products from Other Clouds

Amazon SageMaker AI: Managed ML Platform on AWS

Gemini Enterprise Agent Platform (formerly Vertex AI)

Azure Machine Learning - ML Platform

Ready to start with STACKIT AI Model Serving: Sovereign LLMs?