Amazon SageMaker Lakehouse - Open Lakehouse

What is Amazon SageMaker Lakehouse?

Amazon SageMaker Lakehouse is an open, unified lakehouse architecture that brings together Amazon S3 data lakes (including S3 Tables) and Amazon Redshift warehouses on a single copy of data. Analytics and AI/ML workloads access the same data without moving or duplicating it.

The lakehouse addresses the classic problem of separate data silos: data lakes and data warehouses often evolve in parallel, leading to redundant copies, ETL pipelines, and inconsistent permissions. SageMaker Lakehouse builds on the open Apache Iceberg standard and exposes Iceberg-compatible APIs, so any Iceberg-compatible engine can query the data in place.

Core Features

Unified data foundation: Brings together S3 data lakes (including S3 Tables) and Redshift warehouses so that analytics and AI/ML run on a single copy of data with no data movement.
Open Apache Iceberg standard: Iceberg-compatible APIs let Amazon EMR, AWS Glue, Amazon Athena, Amazon Redshift, Apache Spark, and compatible third-party tools query the data directly in place.
Fine-grained access control: Centralized permissions at the table, column, row, and cell level via tag-based, attribute-based, or role-based policies, applied consistently across all engines through AWS Lake Formation and the AWS Glue Data Catalog.
Zero-ETL and federation: Additional data arrives via zero-ETL integrations from operational databases and applications, query federation to external sources, and catalog federation for remote Apache Iceberg tables.
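
As an illustration of the Iceberg-compatible access described above, the following minimal sketch builds the Spark settings for an Iceberg REST catalog that points at the AWS Glue Data Catalog. It is a sketch under assumptions, not a definitive setup: the helper name `iceberg_rest_catalog_conf`, the catalog name `lakehouse`, the region, and the account ID are hypothetical, and the property names and endpoint format reflect the Apache Iceberg REST catalog settings as commonly documented for AWS, so check them against the current AWS documentation before use.

```python
def iceberg_rest_catalog_conf(catalog_name: str, region: str, account_id: str) -> dict[str, str]:
    """Build Spark settings for an Iceberg REST catalog backed by the AWS Glue Data Catalog.

    Assumption: the default Glue catalog of an account is addressed by using
    the account ID as the warehouse value.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Enables Iceberg SQL extensions in Spark.
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers the catalog under the chosen name.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        # Assumed endpoint format for the Glue Iceberg REST API.
        f"{prefix}.uri": f"https://glue.{region}.amazonaws.com/iceberg",
        f"{prefix}.warehouse": account_id,
        # Requests are signed with AWS SigV4 for the Glue service.
        f"{prefix}.rest.sigv4-enabled": "true",
        f"{prefix}.rest.signing-name": "glue",
        f"{prefix}.rest.signing-region": region,
    }


# Hypothetical values, for illustration only.
conf = iceberg_rest_catalog_conf("lakehouse", "eu-central-1", "111122223333")
for key, value in conf.items():
    print(f"--conf {key}={value}")
```

The printed lines can be passed to `spark-submit`, or the dictionary can be applied to a Spark session builder. Other Iceberg-compatible engines would need equivalent catalog settings in their own configuration format.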

Typical Use Cases

Unifying data lake and warehouse: Bring existing S3 data lakes and Redshift warehouses together without copying data or mirroring it through ETL pipelines. Teams work on a single, consistent copy of the data.

Cross-engine analytics: Query the same data with different engines such as EMR, Glue, Athena, Redshift, or Apache Spark in place, depending on workload and team, without maintaining separate copies.

Governance across engines: Define fine-grained permissions once in Lake Formation and enforce them consistently across all accessing engines, down to the column, row, and cell level.
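
A column-level grant of the kind described above can be sketched as a request for the Lake Formation `GrantPermissions` API. This is a minimal sketch, not a complete governance setup: the helper name `column_grant_request`, the role ARN, and the database, table, and column names are hypothetical. Row- and cell-level rules are defined separately and are not shown here.

```python
def column_grant_request(principal_arn: str, database: str, table: str, columns: list[str]) -> dict:
    """Build a GrantPermissions request that allows SELECT on selected columns only."""
    if not columns:
        raise ValueError("at least one column is required for a column-level grant")
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


# Hypothetical principal and table, for illustration only.
request = column_grant_request(
    "arn:aws:iam::111122223333:role/AnalystRole",
    "sales",
    "orders",
    ["order_id", "order_date", "amount"],
)
print(request)

# With AWS credentials configured, the request could be sent with boto3:
#   import boto3
#   boto3.client("lakeformation").grant_permissions(**request)
```

Building the request separately from the API call keeps the grant reviewable and testable before any permissions change.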

Benefits

A single copy of data for analytics and AI/ML instead of redundant copies and ETL pipelines
Open Apache Iceberg standard avoids lock-in and enables free choice of engine
Consistent, fine-grained access control across all engines
Usage-based billing with no upfront cost

Integration with innFactory

As an AWS Reseller, innFactory supports you with the adoption and operation of this service.

Frequently Asked Questions

What is Amazon SageMaker Lakehouse?

Amazon SageMaker Lakehouse is an open, unified lakehouse architecture built on Apache Iceberg. It brings together Amazon S3 data lakes (including S3 Tables) and Amazon Redshift warehouses so that analytics and AI/ML run on a single copy of data, with no data movement or duplication required.

When should I use Amazon SageMaker Lakehouse?

Use it when data currently sits in separate S3 data lakes and Redshift warehouses and you want to unify those silos without ETL copies. It also fits when multiple engines such as EMR, Glue, Athena, Redshift, or Apache Spark need to query the same data in place with consistent, fine-grained access control.

How much does Amazon SageMaker Lakehouse cost?

Billing is pay-as-you-go with no upfront cost and is charged through the underlying components: the AWS Glue Data Catalog for metadata storage and API requests (with a free tier), storage and compute for data held in Amazon S3 or Redshift Managed Storage, and automated statistics and Iceberg table maintenance. Actual costs depend on usage.

Which query engines does SageMaker Lakehouse support?

Because SageMaker Lakehouse is built on the open Apache Iceberg standard and exposes Iceberg-compatible APIs, any Iceberg-compatible engine can query the data in place. This includes Amazon EMR, AWS Glue, Amazon Athena, Amazon Redshift, Apache Spark, and compatible third-party tools.
