Amazon SageMaker Lakehouse - Open Lakehouse

Amazon SageMaker Lakehouse unifies S3 data lakes and Redshift warehouses on a single data copy, queryable in place via Apache Iceberg.

Category: Analytics
Pricing model: Pay-per-use, billed via underlying components
Availability: Multiple regions incl. EU (Ireland, Frankfurt, Stockholm, London)
Data sovereignty: EU regions available
Reliability (SLA): N/A (depends on underlying services)

What is Amazon SageMaker Lakehouse?

Amazon SageMaker Lakehouse is an open, unified lakehouse architecture that brings together Amazon S3 data lakes (including S3 Tables) and Amazon Redshift warehouses on a single copy of data. Analytics and AI/ML workloads access the same data without moving or duplicating it.

The lakehouse addresses the classic problem of separate data silos: data lakes and data warehouses often evolve in parallel, leading to redundant copies, ETL pipelines, and inconsistent permissions. SageMaker Lakehouse builds on the open Apache Iceberg standard and exposes Iceberg-compatible APIs, so any Iceberg-compatible engine can query the data in place.

Core Features

  • Unified data foundation: Brings together S3 data lakes (including S3 Tables) and Redshift warehouses so that analytics and AI/ML run on a single copy of data with no data movement.
  • Open Apache Iceberg standard: Iceberg-compatible APIs allow queries with Amazon EMR, AWS Glue, Amazon Athena, Amazon Redshift, Apache Spark, and compatible third-party tools directly in place.
  • Fine-grained access control: Centralized permissions at the table, column, row, and cell level via tag-based, attribute-based, or role-based policies, applied consistently across all engines through AWS Lake Formation and the AWS Glue Data Catalog.
  • Zero-ETL and federation: Additional data arrives via zero-ETL integrations from operational databases and applications, query federation to external sources, and catalog federation for remote Apache Iceberg tables.
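
As a rough illustration of what "Iceberg-compatible APIs" can mean in practice, the sketch below builds the Spark properties for an Iceberg REST catalog that points at the AWS Glue Data Catalog. The endpoint pattern, the property names, the catalog name `lakehouse`, and the account ID are assumptions for illustration and should be checked against the current AWS and Apache Iceberg documentation before use.

```python
def iceberg_rest_spark_conf(catalog_name: str, region: str, account_id: str) -> dict:
    """Build Spark properties for an Iceberg REST catalog (assumed Glue endpoint)."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Enables Iceberg SQL extensions in Spark.
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers the catalog under the chosen (hypothetical) name.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        # Assumed endpoint pattern for the Glue Iceberg REST API.
        f"{prefix}.uri": f"https://glue.{region}.amazonaws.com/iceberg",
        # Assumption: the warehouse is identified by the AWS account ID.
        f"{prefix}.warehouse": account_id,
        # Requests are signed with SigV4 for the "glue" service.
        f"{prefix}.rest.sigv4-enabled": "true",
        f"{prefix}.rest.signing-name": "glue",
        f"{prefix}.rest.signing-region": region,
    }


# Hypothetical values; prints the flags one could pass to spark-submit.
conf = iceberg_rest_spark_conf("lakehouse", "eu-central-1", "111122223333")
for key, value in sorted(conf.items()):
    print(f"--conf {key}={value}")
```

With such a configuration, a Spark session would address tables as `lakehouse.<database>.<table>` while the data itself stays in place.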

Typical Use Cases

Unifying data lake and warehouse: Bring existing S3 data lakes and Redshift warehouses together without copying data or mirroring it through ETL pipelines. Teams work on a consistent copy of data.

Cross-engine analytics: Query the same data with different engines such as EMR, Glue, Athena, Redshift, or Apache Spark in place, depending on workload and team, without maintaining separate copies.

Governance across engines: Define fine-grained permissions once in Lake Formation and enforce them consistently across all accessing engines, down to the column, row, and cell level.
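
To make "define once, enforce everywhere" more concrete, the following sketch builds two request payloads shaped for the AWS Lake Formation API: a column-level `SELECT` grant and a row filter. It only constructs dictionaries and does not call AWS. The role ARN, database, table, column names, and filter expression are hypothetical, and the payload shapes should be verified against the current boto3 documentation.

```python
import json


def column_grant(principal_arn, database, table, columns):
    """Payload for a column-level SELECT grant (shape of grant_permissions)."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


def row_filter(account_id, database, table, filter_name, expression, columns):
    """Payload for a row/cell filter (shape of create_data_cells_filter)."""
    return {
        "TableData": {
            "TableCatalogId": account_id,
            "DatabaseName": database,
            "TableName": table,
            "Name": filter_name,
            "RowFilter": {"FilterExpression": expression},
            "ColumnNames": list(columns),
        }
    }


# Hypothetical principal and table names, for illustration only.
grant = column_grant(
    "arn:aws:iam::111122223333:role/analyst", "sales", "orders",
    ["order_id", "region"],
)
eu_only = row_filter(
    "111122223333", "sales", "orders", "eu_rows",
    "region = 'EU'", ["order_id", "region"],
)
print(json.dumps(grant, indent=2))
print(json.dumps(eu_only, indent=2))

# With credentials configured, the payloads could be applied roughly like this:
#   import boto3
#   lf = boto3.client("lakeformation")
#   lf.grant_permissions(**grant)
#   lf.create_data_cells_filter(**eu_only)
```

Because the permissions live in Lake Formation rather than in an individual engine, the same grant would apply whichever engine reads the table.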

Benefits

  • A single copy of data for analytics and AI/ML instead of redundant copies and ETL pipelines
  • Open Apache Iceberg standard avoids lock-in and enables free choice of engine
  • Consistent, fine-grained access control across all engines
  • Usage-based billing with no upfront cost

Integration with innFactory

As an AWS Reseller, innFactory supports you with the adoption and operation of this service.

Typical Use Cases

  • Analytics and AI/ML on a single copy of data
  • Unifying S3 data lakes and Redshift warehouses
  • Cross-engine queries via Apache Iceberg
  • Centralized, fine-grained access control via Lake Formation

Frequently Asked Questions

What is Amazon SageMaker Lakehouse?

Amazon SageMaker Lakehouse is an open, unified lakehouse architecture built on Apache Iceberg. It brings together Amazon S3 data lakes (including S3 Tables) and Amazon Redshift warehouses so that analytics and AI/ML run on a single copy of data, with no data movement or duplication required.

When should I use Amazon SageMaker Lakehouse?

Use it when data currently sits in separate S3 data lakes and Redshift warehouses and you want to unify those silos without ETL copies. It also fits when multiple engines such as EMR, Glue, Athena, Redshift, or Apache Spark need to query the same data in place with consistent, fine-grained access control.

How much does Amazon SageMaker Lakehouse cost?

Billing is pay-as-you-go with no upfront cost and is charged through the underlying components: the AWS Glue Data Catalog for metadata storage and API requests (with a free tier), Amazon S3 or Redshift Managed Storage for storage, the compute used to query the data, and automated statistics and Iceberg table maintenance. Actual costs depend on usage.

Which query engines does SageMaker Lakehouse support?

Because SageMaker Lakehouse is built on the open Apache Iceberg standard and exposes Iceberg-compatible APIs, any Iceberg-compatible engine can query the data in place. This includes Amazon EMR, AWS Glue, Amazon Athena, Amazon Redshift, Apache Spark, and compatible third-party tools.
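
As one example of a second engine reading the same table, the sketch below assembles the arguments for an Amazon Athena query. It only builds the request dictionary. The database, table, and S3 output location are hypothetical, and `AwsDataCatalog` and `primary` are assumed to be the default catalog and workgroup names in the target account.

```python
def athena_query_request(sql, database, output_s3_uri,
                         catalog="AwsDataCatalog", workgroup="primary"):
    """Arguments shaped for Athena's start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Catalog": catalog, "Database": database},
        "WorkGroup": workgroup,
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_s3_uri},
    }


# Hypothetical table and bucket names, for illustration only.
request = athena_query_request(
    "SELECT region, COUNT(*) AS orders FROM orders GROUP BY region",
    database="sales",
    output_s3_uri="s3://example-athena-results/lakehouse/",
)
print(request)

# With credentials configured, the query could be submitted roughly like this:
#   import boto3
#   boto3.client("athena").start_query_execution(**request)
```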

AWS Cloud Expertise

innFactory is an AWS Reseller with certified cloud architects. We provide consulting, implementation, and managed services for AWS.

Similar Products from Other Clouds

Other cloud providers offer comparable services in this category. As a multi-cloud partner, we help you choose the right solution.

Google Cloud

BigQuery data canvas - Visual Data Analysis

BigQuery data canvas: Gemini-powered visual, natural-language workspace in BigQuery Studio to find, query and visualize …

Pricing: Pay-per-use (Gemini in BigQuery) plus …
SLA: N/A (feature within BigQuery; the BigQuery SLA applies to the underlying service)

Google Cloud

BigQuery Studio - Workspace for Data and AI

BigQuery Studio: unified workspace in BigQuery for SQL, Python, notebooks and Serverless Spark with Gemini assistance …

Pricing: No separate charge for the workspace; …
SLA: N/A (billed via the underlying BigQuery resources)

Azure

Fabric IQ - Semantic Layer

Fabric IQ is the business semantic layer in Microsoft Fabric: ontology, graph, semantic models and data agents give …

Pricing: Pay-as-you-go via Fabric Capacity Units …
SLA: N/A (Preview)

Azure

Microsoft Fabric: Unified Analytics on Azure

Microsoft Fabric unifies Data Factory, lakehouse, warehouse, Real-Time Intelligence and Power BI as SaaS on OneLake. …

Pricing: Capacity Units (F-SKUs): pay-as-you-go …
SLA: 99.9%

Google Cloud

Dataflow - Managed Stream and Batch Processing

Fully managed service for stream and batch data processing based on Apache Beam. Serverless with automatic scaling.

Pricing: Pay-per-use (vCPU, RAM, Storage)
SLA: 99.9% availability

Google Cloud

BigQuery ML - Machine Learning with SQL

BigQuery ML enables training and running ML models directly in BigQuery using SQL, without moving data.

Pricing: Pay-per-use
SLA: 99.9%

39 comparable products found across other clouds.

Ready to start with Amazon SageMaker Lakehouse - Open Lakehouse?

Our certified AWS experts help you with architecture, integration, and optimization.
