What is STACKIT AI Model Serving?
STACKIT AI Model Serving is a fully managed platform for GDPR-compliant access to leading open-weight Large Language Models. The service provides models such as Llama 3.3, Gemma 3, GPT-OSS, and Qwen3 through a unified, OpenAI-compatible API. All prompts and responses are processed in German data centers (region eu01). STACKIT does not store customer data and does not train the models on your requests. You get generative AI with full data sovereignty and no vendor lock-in.
The service has been available since May 2025 and is continuously extended with current models. Billing is pay-as-you-go based on consumed tokens, so you can start without fixed instance costs.
Core Features
- Access to open-weight models: Llama 3.3 70B, Gemma 3 27B, GPT-OSS 120B and 20B, Qwen3-VL 235B, and the Qwen3.6 27B coding model
- OpenAI-compatible REST API with chat completions and embeddings endpoints for fast integration
- Context windows up to 262K tokens and large response generation depending on the model
- Tool calling and reasoning for agentic workflows, plus vision support (up to 3 images per request)
- Text and multimodal embedding models for semantic search and cross-modal retrieval
- No storage of customer data and no training on your data, operated in the eu01 region
Typical Use Cases
Chatbots and virtual assistants: Customer service bots with natural conversation for support, consulting, and FAQ answering, fully operated in the EU.
Retrieval Augmented Generation (RAG): Connect your own knowledge bases via embedding models and cross-modal retrieval, for example with the open-source STACKIT RAG template.
Document analysis: Process contracts, reports, and legal documents with automatic summarization and extraction.
Code generation and review: Support software development with the Qwen3.6 27B coding model for generation, debugging, and technical reasoning.
Benefits
- Full data sovereignty: processing in German data centers, no data storage, and no training on your data
- Vendor independence through open models and open interfaces instead of proprietary lock-in
- Easy migration: existing OpenAI integrations work via the compatible API with minimal changes
- Predictable costs thanks to pay-as-you-go token billing without fixed instance fees
- GDPR and EU AI Act compliance as a foundation for regulated industries
Integration with innFactory
As an official STACKIT partner, innFactory supports you with AI Model Serving across the full lifecycle: architecture and model selection, migration of existing OpenAI applications, building RAG pipelines, secure operations, and cost optimization. This brings sovereign AI into production quickly and compliantly.
Available Tiers & Options
Chat and reasoning models
- Llama 3.3, Gemma 3, GPT-OSS, Qwen3
- Tool calling and reasoning
- Context windows up to 262K tokens
- Open-weight models, no proprietary models
Embedding and vision models
- Text and multimodal embeddings
- Image understanding (up to 3 images per request)
- Cross-modal retrieval for RAG
- No fine-tuning of shared models
Typical Use Cases
Technical Specifications
Frequently Asked Questions
Which AI models are available?
STACKIT provides open-weight models through a unified API, including Llama 3.3 70B, Gemma 3 27B, GPT-OSS 120B and 20B, plus Qwen3-VL 235B and the Qwen3.6 27B coding model. For embeddings it offers E5 Mistral 7B and a multimodal Qwen3-VL embedding model.
Are proprietary models like GPT-4, Claude, or Gemini available?
No. STACKIT AI Model Serving deliberately relies on open-weight models hosted in German data centers. This keeps you vendor-independent and preserves full data sovereignty.
Is my data stored or used for training?
No. STACKIT does not store customer data from requests and does not train the models on your data. Prompts and responses never leave German jurisdiction.
Is the API OpenAI-compatible?
Yes. The service offers an OpenAI-compatible REST API with /v1/chat/completions and /v1/embeddings endpoints. Existing OpenAI code works with minimal changes: you only adjust the base URL and API token. The API is stateless, so you send the conversation history with each request.
How is billing handled?
Usage is pay-as-you-go based on consumed input and output tokens, depending on the selected model. There are no fixed instance costs for the shared models.
Where are the models hosted?
The models run in the eu01 region on the data-sovereign STACKIT Cloud in Germany. The base URL of the OpenAI-compatible API is https://api.openai-compat.model-serving.eu01.onstackit.cloud/v1.
