STACKIT AI Model Experiments is a fully managed MLflow tracking server covering the entire lifecycle of machine learning models and GenAI applications. The service tracks experiments, versions models and prompts, and makes the reasoning chains of LLM agents transparent, all on sovereign EU infrastructure and without complex setup. Combined with STACKIT AI Model Serving and Notebooks, it forms an end-to-end MLOps workflow.
Core Features
- Experiment Tracking: Automatic logging of parameters, metrics, data versions, and artifacts for reproducible training runs
- Model and Prompt Registry: Centralized versioning and control over which model or prompt version is active in which environment
- LLM Tracing: Traceable reasoning chains for agent-based workflows and RAG pipelines, including tool calls
- LLM-as-a-judge: Automated evaluation of non-deterministic outputs by relevance, soundness, and safety
- Audit Trail: Complete, audit-proof history of all experiments to support EU AI Act compliance
- Managed Service: Highly available and scalable, ready to use immediately with no maintenance overhead
Typical Use Cases
Classic ML training: Data science teams track experiments for forecasts, fraud detection, or recommendation engines and register the best models for deployment via STACKIT AI Model Serving.
GenAI and agent systems: Teams trace and debug complex reasoning chains of agent-based systems, including prompt templates and tool calls.
RAG optimization: Developers evaluate and debug retrieval-augmented generation pipelines to systematically improve the response quality of their language models.
Compliance documentation: Regulated industries use the automatically recorded history to make model decisions traceable and to provide audit evidence.
Benefits
- Sovereignty: All experiments and artifacts stay on GDPR-compliant EU infrastructure
- Open-source base: MLflow is an established standard, and artifacts remain portable
- GenAI-ready: ML tracking and LLM observability in a single service
- No vendor lock-in: Experiments and models can be migrated at any time
- Complete MLOps stack: An end-to-end workflow together with Notebooks, Workflows, and AI Model Serving
Integration with innFactory
As an official STACKIT partner, innFactory helps you build MLOps and LLMOps pipelines, from experiment tracking through LLM tracing and evaluation to production model deployment on sovereign STACKIT infrastructure.
Frequently Asked Questions
What is STACKIT AI Model Experiments based on?
The service is a fully managed MLflow tracking server. Existing MLflow code and the wider MLflow ecosystem work without modification.
Does the service support GenAI and LLM applications?
Yes. Beyond classic ML tracking, it offers LLM tracing that makes the reasoning chains of agents and RAG pipelines transparent, plus prompt versioning.
How does evaluation of LLM outputs work?
With LLM-as-a-judge you automatically evaluate non-deterministic outputs against criteria such as relevance, soundness, and safety.
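The pattern behind LLM-as-a-judge can be sketched in plain Python: a grading prompt lists the criteria, a second model (the "judge") returns structured scores, and the scores are validated before they are stored. This is not MLflow's built-in judge API (whose module paths differ between MLflow versions); the prompt wording, the 1 to 5 scale, and the `call_model` callable are assumptions, and `call_model` stands in for whatever client sends a prompt to your judge model.

```python
import json
from typing import Callable

CRITERIA = ("relevance", "soundness", "safety")

# Hypothetical grading prompt; doubled braces are literal braces for str.format
JUDGE_TEMPLATE = """You are grading an assistant's answer.
Question: {question}
Answer: {answer}
Rate each criterion from 1 (poor) to 5 (excellent): {criteria}.
Reply with JSON only, e.g. {{"relevance": 4, "soundness": 5, "safety": 5}}."""


def judge(question: str, answer: str, call_model: Callable[[str], str]) -> dict[str, int]:
    """Ask a judge model to score an answer and validate its JSON reply."""
    prompt = JUDGE_TEMPLATE.format(
        question=question, answer=answer, criteria=", ".join(CRITERIA)
    )
    scores = json.loads(call_model(prompt))  # raises on non-JSON replies
    result = {}
    for name in CRITERIA:
        value = int(scores[name])  # raises KeyError if a criterion is missing
        if not 1 <= value <= 5:
            raise ValueError(f"{name} score out of range: {value}")
        result[name] = value
    return result


def fake_judge(prompt: str) -> str:
    # Deterministic stand-in for a real judge model, for local testing only
    return '{"relevance": 5, "soundness": 4, "safety": 5}'


print(judge("What is MLflow?", "An open-source MLOps platform.", fake_judge))
```

Validated scores like these could then be logged per evaluated answer with `mlflow.log_metrics`, so they sit next to the run that produced the outputs.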
Does the service help with EU AI Act compliance?
Yes. All experiments, metrics, and parameters are logged completely. This audit-proof history supports documentation and audit requirements under regulations such as the EU AI Act.
Which regions is the service available in?
The service runs on sovereign STACKIT infrastructure in the regions EU-01 (Germany South) and EU-02 (Austria West).
Is there any vendor lock-in?
No. Because the service is based on open-source MLflow, experiments, artifacts, and models stay portable and can be migrated at any time.
