Amazon S3 Metadata - Queryable Metadata · innFactory

What is Amazon S3 Metadata?

Amazon S3 Metadata automatically captures metadata for objects in S3 general purpose buckets and exposes it as queryable, read-only Apache Iceberg tables. These metadata tables refresh in near real time as objects are added, updated, or removed. Instead of maintaining your own inventory systems or scripts, you query object metadata directly with SQL, which speeds up data discovery across large data estates.

S3 Metadata solves a concrete problem: in buckets with millions or billions of objects, finding objects by creation time, storage class, tags, or encryption status is difficult without separate indexing. The metadata tables provide exactly this queryable view and integrate seamlessly with AWS analytics services through Amazon SageMaker Lakehouse, AWS Glue Data Catalog, and AWS Lake Formation.

Core Features

Journal table: The required journal table records change events such as uploads, updates, deletes, and lifecycle transitions in near real time, starting from the moment you create the configuration. It is queryable, so you can audit changes with simple SQL queries.
Live inventory table: The optional live inventory table provides a queryable snapshot of all objects and versions in the bucket. It requires an initial backfill scan that takes at least 15 minutes; once the backfill completes, updates are typically reflected within one hour.
Annotation table: The optional annotation table makes annotations on objects directly queryable, with one row per annotation and object version.
Three metadata categories: S3 Metadata captures system-defined metadata (creation time, storage class, size), custom metadata (object tags and user-defined metadata), and event metadata (when an object changed and which AWS account made the request).
Managed Iceberg tables: The tables live in an AWS-managed S3 table bucket, are read-only, and are maintained automatically (compaction and removal of unreferenced files). Encryption defaults to SSE-S3, with SSE-KMS available as an option.
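
To make the idea of a queryable inventory more concrete, the following Python sketch assembles a SQL statement that lists the largest objects in a given storage class. It is only an illustration under stated assumptions: the helper name build_inventory_query, the table reference passed in, and the column names (key, size, storage_class, last_modified_date) are assumed here and should be checked against the schema of your own metadata tables.

```python
# Storage class names accepted by this sketch (a subset of the S3 storage classes).
ALLOWED_STORAGE_CLASSES = {
    "STANDARD",
    "STANDARD_IA",
    "ONEZONE_IA",
    "INTELLIGENT_TIERING",
    "GLACIER",
    "GLACIER_IR",
    "DEEP_ARCHIVE",
    "REDUCED_REDUNDANCY",
}


def build_inventory_query(table: str, storage_class: str, limit: int = 100) -> str:
    """Build a SQL query for the live inventory table (hypothetical helper).

    `table` is treated as trusted configuration and inserted as-is;
    `storage_class` is validated against an allow-list before it is embedded.
    Column names are assumptions for illustration.
    """
    if storage_class not in ALLOWED_STORAGE_CLASSES:
        raise ValueError(f"unsupported storage class: {storage_class!r}")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        "SELECT key, size, storage_class, last_modified_date\n"
        f"FROM {table}\n"
        f"WHERE storage_class = '{storage_class}'\n"
        "ORDER BY size DESC\n"
        f"LIMIT {limit}"
    )


# Example: the ten largest objects in GLACIER, using a placeholder table reference.
query = build_inventory_query('"example_namespace"."inventory"', "GLACIER", limit=10)
print(query)
```

Validating the storage class against an allow-list keeps arbitrary strings out of the generated SQL; the resulting statement can then be run in whichever engine you use to query the tables.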

Typical Use Cases

Data discovery and analytics: Quickly find the relevant objects in large buckets and analyze data estates directly with SQL via Athena, EMR, or Redshift, without building separate inventory pipelines.

AI/ML training data preparation: Select the exact objects to include in training or processing by tags, storage class, or metadata, and speed up data preparation.

Audit and change tracking: Use the journal table to trace which objects were added, updated, or deleted, and by which AWS account. This supports compliance and governance requirements.
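
As a small illustration of such an audit, the Python sketch below counts change events per requesting account and event type. It assumes the rows have already been fetched from the journal table and are available as dictionaries; the field names (key, record_type, requester), the event type values, and the sample data are assumptions made up for this example, not output from a real bucket.

```python
from collections import Counter


def summarize_changes(rows):
    """Count journal events per (requester, record_type) pair.

    `rows` is an iterable of dicts shaped like journal table rows.
    The field names are assumptions for illustration.
    """
    counts = Counter()
    for row in rows:
        counts[(row["requester"], row["record_type"])] += 1
    return dict(counts)


# Invented sample rows; account IDs and keys are placeholders.
sample_rows = [
    {"key": "logs/a.json", "record_type": "CREATE", "requester": "111122223333"},
    {"key": "logs/b.json", "record_type": "CREATE", "requester": "111122223333"},
    {"key": "logs/a.json", "record_type": "DELETE", "requester": "444455556666"},
    {"key": "img/c.png", "record_type": "UPDATE_METADATA", "requester": "111122223333"},
]

for (requester, record_type), count in sorted(summarize_changes(sample_rows).items()):
    print(f"{requester} {record_type}: {count}")
```

The same grouping could be expressed directly in SQL with a GROUP BY clause; doing it in Python is convenient when the rows are post-processed in a notebook or script anyway.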

Advantages

Queryable object metadata without your own indexing or inventory infrastructure
Near real-time metadata updates as objects change
Open Apache Iceberg format and integration with AWS analytics services and Iceberg-compatible engines
Available in EU regions for GDPR-compliant scenarios

Integration with innFactory

As an AWS Reseller, innFactory supports you with the adoption and operation of this service.

Frequently Asked Questions

What is Amazon S3 Metadata?

Amazon S3 Metadata automatically captures metadata for objects in S3 general purpose buckets and stores it in read-only, fully managed Apache Iceberg tables. These metadata tables refresh in near real time as objects are added, updated, or removed. You can query the tables with standard SQL to find and analyze your data.

When should I use Amazon S3 Metadata?

Amazon S3 Metadata fits when you need to quickly locate objects in large buckets, analyze data estates, or track changes. Concrete scenarios include preparing AI/ML training data, auditing uploads, updates, and deletes, and identifying objects by storage class, tags, or encryption status for compliance and cost optimization.

How much does Amazon S3 Metadata cost?

Amazon S3 Metadata uses a pay-per-use model. You pay for the change events recorded in the journal table plus table storage. If you enable the optional live inventory table, you are charged for the initial backfill scan, and buckets with more than one billion objects incur a monthly fee for the live inventory table. See the S3 pricing page for current rates.

Which query engines and analytics services are supported?

The metadata tables can be queried directly with AWS analytics services such as Amazon Athena, Amazon EMR, and Amazon Redshift through the integration with Amazon SageMaker Lakehouse, and you can build dashboards in Amazon Quick Suite (the successor to Amazon QuickSight). Access and governance run through AWS Glue Data Catalog and AWS Lake Formation. Because the tables use the Apache Iceberg format, Apache Spark, Trino, and other Iceberg-compatible engines also work via the Glue or S3 Tables Iceberg REST endpoints.
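
For readers who want to script queries through Amazon Athena, the following Python sketch shows one possible shape of the request. The parameter names match the boto3 Athena client's start_query_execution call, but the helper names (build_athena_request, run_query), the catalog, namespace, and table names, and the results bucket are hypothetical placeholders. Running run_query additionally assumes that boto3 is installed and that AWS credentials and permissions are configured.

```python
def build_athena_request(sql, catalog, database, output_location):
    """Assemble keyword arguments for Athena's start_query_execution.

    All values passed in are placeholders chosen by the caller; nothing
    here is specific to a real account or bucket.
    """
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Catalog": catalog, "Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request):
    """Submit the request to Athena and return the query execution ID.

    boto3 is imported lazily so the request builder above can be used
    and tested without AWS access.
    """
    import boto3

    client = boto3.client("athena")
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


# Build (but do not submit) a request with placeholder names.
request = build_athena_request(
    sql='SELECT key, record_type FROM "journal" LIMIT 10',
    catalog="example-catalog",
    database="example_namespace",
    output_location="s3://example-query-results/metadata/",
)
print(request["QueryExecutionContext"])
```

Separating request construction from submission keeps the part that depends on AWS access small, so the query text and context can be reviewed or unit-tested before anything is sent.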

What is the annotation table in S3 Metadata?

The optional annotation table tracks the latest annotations on objects in a bucket and makes their content directly queryable. Each row represents one annotation on a specific object version, letting you find objects by annotation value and correlate annotations across objects.
