Dataplex organizes distributed data into logical data lakes without data movement, providing centralized governance, automatic metadata discovery, and data quality monitoring.
What is Dataplex?
Dataplex is a data fabric that organizes data across Cloud Storage and BigQuery into logical structures. Instead of copying data into a central data lake, Dataplex creates virtual views over distributed data sources.
The service provides automatic metadata discovery, data quality checks, and centralized governance policies.
Concepts
Lake
Logical container for related data. Typically per business unit or project.
Zone
Grouping by processing stage:
- Raw Zone: Raw data without transformation
- Curated Zone: Cleansed, structured data
Asset
The actual data: Cloud Storage buckets or BigQuery datasets. Assets are assigned to zones.
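The Lake → Zone → Asset hierarchy can be sketched as a small data model. This is an illustration of the concept only (hypothetical class names, not the google-cloud-dataplex client API):

```python
from dataclasses import dataclass, field

# Hypothetical model of the Dataplex hierarchy -- illustration only,
# not the actual Dataplex API.

@dataclass
class Asset:
    name: str
    resource: str  # physical location: Cloud Storage bucket or BigQuery dataset

@dataclass
class Zone:
    name: str
    stage: str  # "RAW" or "CURATED"
    assets: list = field(default_factory=list)

@dataclass
class Lake:
    name: str
    zones: list = field(default_factory=list)

    def find_asset(self, asset_name: str):
        """Locate an asset by logical name, without knowing where it resides."""
        for zone in self.zones:
            for asset in zone.assets:
                if asset.name == asset_name:
                    return zone.name, asset.resource
        return None

lake = Lake("customer-analytics", zones=[
    Zone("raw", "RAW", assets=[Asset("raw-events", "gs://raw-events")]),
    Zone("curated", "CURATED",
         assets=[Asset("events", "bq://project.curated.events")]),
])
print(lake.find_asset("events"))  # -> ('curated', 'bq://project.curated.events')
```

The point of the sketch: consumers address data by lake, zone, and asset name, while the physical resource (bucket or dataset) stays where it is.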
Lake: Customer Analytics
├── Zone: Raw
│ ├── Asset: gs://raw-events (Cloud Storage)
│ └── Asset: gs://raw-transactions
└── Zone: Curated
├── Asset: bq://project.curated.events (BigQuery)
    ├── Asset: bq://project.curated.customers
Core Features
- Virtual organization: Data stays where it is
- Auto discovery: Schema and statistics captured automatically
- Data quality: Define rules and check automatically
- Central policies: IAM policies at lake level
- Data Catalog integration: All metadata searchable
Typical Use Cases
Data Lake Management
Organize hundreds of storage buckets and BigQuery datasets into logical lakes. Teams find data without knowing where it physically resides.
Data Quality Monitoring
Define quality rules (no null values in key fields, valid date formats) and check automatically on schedule. Alerts on violations.
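The rule types above can be sketched in plain Python. This is a minimal, hypothetical illustration of how such checks behave (sample data and rule names are made up; the real service defines rules declaratively and runs them on a schedule):

```python
import re

# Hypothetical sample rows -- stand-ins for records in a curated asset.
rows = [
    {"customer_id": "C1", "signup_date": "2024-01-15"},
    {"customer_id": None, "signup_date": "2024-02-01"},
    {"customer_id": "C3", "signup_date": "15.02.2024"},
]

# Two rules in the spirit of the examples above:
# no nulls in a key field, and a valid ISO date format.
rules = {
    "customer_id not null": lambda r: r["customer_id"] is not None,
    "signup_date is ISO format": lambda r: bool(
        re.fullmatch(r"\d{4}-\d{2}-\d{2}", r["signup_date"])),
}

# Count violations per rule; a scheduler would run this periodically
# and raise an alert whenever a count is non-zero.
violations = {name: sum(not check(r) for r in rows)
              for name, check in rules.items()}
print(violations)  # {'customer_id not null': 1, 'signup_date is ISO format': 1}
```

In Dataplex itself, rule results are surfaced in Data Catalog rather than computed in application code; the sketch only shows the evaluate-and-count-violations pattern.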
Cross-Team Governance
Centralized policies for data access across multiple teams. Data owners define who can access which zones.
Automatic Documentation
Dataplex automatically captures schemas, statistics, and samples. Teams understand data without manual documentation.
Dataplex vs. Data Catalog
| Feature | Dataplex | Data Catalog |
|---|---|---|
| Metadata search | Yes (via Data Catalog) | Yes |
| Data organization | Lakes, Zones, Assets | No |
| Data quality | Yes | No |
| Policies | Lake-level | Tag-based |
| Discovery | Automatic | Automatic |
Benefits
- No data copying: Virtual organization
- Automatic: Discovery and profiling without manual effort
- Unified: Single view of Cloud Storage and BigQuery
- Governance: Centralized policies and data quality
Integration with innFactory
As a Google Cloud Partner, innFactory supports you with Dataplex: data lake design, data quality strategies, and governance framework implementation.
Frequently Asked Questions
What is Dataplex?
Dataplex is a data fabric service that organizes distributed data into logical lakes without moving it. It provides centralized governance, automatic metadata discovery, and data quality checks across Cloud Storage and BigQuery.
What's the difference between Dataplex and Data Catalog?
Data Catalog is for metadata search and tagging. Dataplex goes further by organizing data into lakes/zones, offering data quality checks, and enabling central policies. Dataplex uses Data Catalog for the metadata layer.
What are Lakes, Zones, and Assets in Dataplex?
A Lake is a logical container for related data (e.g., per business unit). Zones group assets by processing stage (Raw, Curated). Assets are the actual data in Cloud Storage buckets or BigQuery datasets.
How does data quality work in Dataplex?
Dataplex Data Quality defines rules (null values, formats, ranges, uniqueness) and checks them automatically on schedule. Results appear in Data Catalog and can trigger alerts. Auto Data Quality suggests rules based on profiling.
How much does Dataplex cost?
Dataplex charges based on Data Compute Units (DCU) for discovery, quality scans, and processing. Discovery is relatively inexpensive; quality scans on large datasets can cost more. The first 30 days per lake are free.
