
Dataplex - Unified Data Governance and Management

Organize distributed data into logical lakes without data movement. Automatic metadata, data quality, and centralized governance.

Data Analytics
Pricing Model Pay-per-use (DCU-hours)
Availability Global with EU regions
Data Sovereignty EU regions available
Reliability 99.9% availability SLA

Dataplex organizes distributed data into logical data lakes without data movement. Centralized governance, automatic metadata, and data quality monitoring.

What is Dataplex?

Dataplex is a data fabric that organizes data across Cloud Storage and BigQuery into logical structures. Instead of copying data into a central data lake, Dataplex attaches existing buckets and datasets as assets and manages them where they are.

The service provides automatic metadata discovery, data quality checks, and centralized governance policies.

Concepts

Lake

Logical container for related data. Typically per business unit or project.

Zone

Grouping by processing stage:

  • Raw Zone: Raw data without transformation
  • Curated Zone: Cleansed, structured data

Asset

The actual data: Cloud Storage buckets or BigQuery datasets. Assets are assigned to zones.

Lake: Customer Analytics
├── Zone: Raw
│   ├── Asset: gs://raw-events (Cloud Storage bucket)
│   └── Asset: gs://raw-transactions (Cloud Storage bucket)
└── Zone: Curated
    ├── Asset: project.curated_events (BigQuery dataset)
    └── Asset: project.curated_customers (BigQuery dataset)
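To make the hierarchy concrete, here is a minimal sketch in plain Python. It is not the Dataplex client library: the classes, the project `my-project`, the region `europe-west3`, and all IDs are hypothetical. The resource names are modeled on the pattern `projects/{project}/locations/{location}/lakes/{lake}/zones/{zone}/assets/{asset}` used for Dataplex resources.

```python
from dataclasses import dataclass, field

# Plain-Python model of the Lake -> Zone -> Asset hierarchy. Not the Dataplex
# client library; class names, IDs, project, and region are hypothetical.

@dataclass
class Asset:
    asset_id: str
    resource: str  # pointer to data that stays where it is (bucket or dataset)

@dataclass
class Zone:
    zone_id: str
    zone_type: str  # "RAW" or "CURATED"
    assets: list[Asset] = field(default_factory=list)

@dataclass
class Lake:
    project: str
    location: str
    lake_id: str
    zones: list[Zone] = field(default_factory=list)

    def asset_names(self) -> dict[str, str]:
        """Map each asset's hierarchical resource name to the data it references."""
        base = f"projects/{self.project}/locations/{self.location}/lakes/{self.lake_id}"
        return {
            f"{base}/zones/{zone.zone_id}/assets/{asset.asset_id}": asset.resource
            for zone in self.zones
            for asset in zone.assets
        }

lake = Lake("my-project", "europe-west3", "customer-analytics", [
    Zone("raw", "RAW", [
        Asset("raw-events", "gs://raw-events"),
        Asset("raw-transactions", "gs://raw-transactions"),
    ]),
    Zone("curated", "CURATED", [
        Asset("curated-events", "my-project.curated_events"),
        Asset("curated-customers", "my-project.curated_customers"),
    ]),
])

for name, resource in lake.asset_names().items():
    print(f"{name} -> {resource}")
```

Each asset only holds a reference to its bucket or dataset, which is why organizing data into lakes and zones does not move any data.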

Core Features

  • Virtual organization: Data stays where it is
  • Auto discovery: Schema and statistics captured automatically
  • Data quality: Define rules and check automatically
  • Central policies: IAM policies at lake level
  • Data Catalog integration: All metadata searchable
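Auto discovery infers a schema from the files it scans. Google's actual detection logic is not shown here; the sketch below is a simplified heuristic, assuming string-valued sample rows (for example, parsed CSV), that assigns each column the first candidate type all non-empty values fit. The BigQuery-style type names and the sample columns are illustrative assumptions.

```python
from datetime import date

# Simplified schema inference over string-valued sample rows (e.g. parsed CSV).
# Illustrative heuristic only; not Dataplex's actual discovery algorithm.

CANDIDATE_TYPES = ("INT64", "FLOAT64", "DATE")  # tried in this order

def fits(value: str, type_name: str) -> bool:
    """Return True if the string can be parsed as the given candidate type."""
    try:
        if type_name == "INT64":
            int(value)
        elif type_name == "FLOAT64":
            float(value)
        elif type_name == "DATE":
            date.fromisoformat(value)
        return True
    except ValueError:
        return False

def infer_schema(rows: list[dict[str, str]]) -> dict[str, str]:
    """Assign each column the first candidate type that all non-empty values fit."""
    if not rows:
        return {}
    schema: dict[str, str] = {}
    for column in rows[0]:
        values = [row[column] for row in rows if row.get(column)]
        schema[column] = next(
            (t for t in CANDIDATE_TYPES if values and all(fits(v, t) for v in values)),
            "STRING",  # fallback when nothing narrower fits or the column is empty
        )
    return schema

sample = [
    {"customer_id": "1", "amount": "19.99", "order_date": "2024-01-15", "country": "DE"},
    {"customer_id": "2", "amount": "5", "order_date": "2024-02-01", "country": ""},
]
print(infer_schema(sample))
```

Trying INT64 before FLOAT64 keeps whole-number columns as integers, while a single decimal value is enough to widen a column to FLOAT64.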

Typical Use Cases

Data Lake Management

Organize hundreds of storage buckets and BigQuery datasets into logical lakes. Teams find data without knowing where it physically resides.
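The idea of finding data by business terms instead of storage paths can be illustrated with a toy in-memory catalog. This is a hypothetical sketch, not how Dataplex search is implemented (that lookup runs through its catalog search); the entries, names, and locations are made up.

```python
from dataclasses import dataclass

# Toy in-memory catalog: a search by business term returns both the logical
# path (lake/zone/asset) and the physical location. All entries are hypothetical.

@dataclass(frozen=True)
class CatalogEntry:
    lake: str
    zone: str
    asset: str
    location: str      # physical bucket or dataset
    description: str

    @property
    def logical_path(self) -> str:
        return f"{self.lake}/{self.zone}/{self.asset}"

def search(entries: list[CatalogEntry], term: str) -> list[CatalogEntry]:
    """Case-insensitive match on asset name and description."""
    needle = term.lower()
    return [e for e in entries
            if needle in e.asset.lower() or needle in e.description.lower()]

catalog = [
    CatalogEntry("customer-analytics", "raw", "raw-transactions",
                 "gs://raw-transactions", "Unprocessed payment transactions"),
    CatalogEntry("customer-analytics", "curated", "curated-customers",
                 "my-project.curated_customers", "Deduplicated customer master data"),
]

for hit in search(catalog, "customer"):
    print(hit.logical_path, "->", hit.location)
```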

Data Quality Monitoring

Define quality rules (for example, no null values in key fields or valid date formats) and have them checked automatically on a schedule, with alerts when a rule is violated.
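A minimal sketch of how such rules can be evaluated, in plain Python: each rule yields the share of rows that pass, which is compared with a per-rule threshold. Dataplex configures rules declaratively in a data quality scan; the rule names, columns, thresholds, and rows below are hypothetical.

```python
from datetime import datetime
from typing import Callable

# Plain-Python illustration of quality rules with pass-ratio thresholds.
# Rule names, columns, and thresholds are hypothetical.

Row = dict[str, object]
Check = Callable[[Row], bool]

def non_null(column: str) -> Check:
    """Row passes if the column is present and not None."""
    return lambda row: row.get(column) is not None

def valid_date(column: str, fmt: str = "%Y-%m-%d") -> Check:
    """Row passes if the value parses as a real calendar date; nulls are skipped."""
    def check(row: Row) -> bool:
        value = row.get(column)
        if value is None:
            return True  # left to a separate non_null rule
        try:
            datetime.strptime(str(value), fmt)
            return True
        except ValueError:
            return False
    return check

def run_scan(rows: list[Row], rules: dict[str, tuple[Check, float]]) -> dict[str, tuple[float, bool]]:
    """Return (pass ratio, threshold met?) for every rule."""
    results = {}
    for name, (check, threshold) in rules.items():
        ratio = sum(check(row) for row in rows) / len(rows) if rows else 1.0
        results[name] = (ratio, ratio >= threshold)
    return results

rows = [
    {"customer_id": 1, "signup_date": "2024-03-01"},
    {"customer_id": None, "signup_date": "2024-13-01"},  # null key, invalid month
    {"customer_id": 3, "signup_date": None},
    {"customer_id": 4, "signup_date": "2024-02-30"},     # day does not exist
]
rules = {
    "customer_id_not_null": (non_null("customer_id"), 1.0),
    "signup_date_valid": (valid_date("signup_date"), 0.5),
}
for name, (ratio, ok) in run_scan(rows, rules).items():
    if not ok:
        print(f"ALERT: {name} passed for only {ratio:.0%} of rows")
```

Thresholds let a team tolerate a known level of imperfection on some columns while still requiring 100% on key fields.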

Cross-Team Governance

Centralized policies for data access across multiple teams. Data owners define who can access which zones.
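How central policies reach individual assets can be sketched as inheritance down the lake, zone, and asset hierarchy, assuming grants flow downward as IAM policies generally do in Google Cloud. This is a hypothetical model: the role labels and principals are made up and are not Dataplex role names.

```python
# Hypothetical bindings keyed by resource path: (lake,), (lake, zone),
# (lake, zone, asset). A grant on a lake applies to every zone and asset below
# it. Role labels and principals are made up for the example.

Bindings = dict[tuple[str, ...], dict[str, set[str]]]

def effective_roles(bindings: Bindings, path: tuple[str, ...], principal: str) -> set[str]:
    """Union of roles granted to principal on the resource or any of its ancestors."""
    roles: set[str] = set()
    for depth in range(1, len(path) + 1):
        for role, members in bindings.get(path[:depth], {}).items():
            if principal in members:
                roles.add(role)
    return roles

bindings: Bindings = {
    ("customer-analytics",): {"data-reader": {"group:analysts@example.com"}},
    ("customer-analytics", "curated"): {"data-writer": {"group:data-eng@example.com"}},
}

asset = ("customer-analytics", "curated", "curated-customers")
print(effective_roles(bindings, asset, "group:analysts@example.com"))  # inherited from the lake
print(effective_roles(bindings, ("customer-analytics", "raw", "raw-events"),
                      "group:data-eng@example.com"))  # zone grant does not reach Raw
```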

Automatic Documentation

Dataplex automatically captures schemas, statistics, and samples. Teams understand data without manual documentation.
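The statistics captured per column are along the lines of null ratio, distinct count, sample values, and numeric ranges. The sketch below is a simplified stand-in for automatic profiling, not Dataplex's output format; the field names and sample data are hypothetical.

```python
from statistics import mean

# Simplified column profile: null ratio, distinct count, a few samples, and
# numeric min/max/mean. Field names and sample data are hypothetical.

def profile(rows: list[dict], column: str, sample_size: int = 3) -> dict:
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    stats = {
        "null_ratio": (len(values) - len(present)) / len(values) if values else 0.0,
        "distinct": len(set(present)),
        "samples": present[:sample_size],
    }
    # Only add numeric statistics if every non-null value is a number.
    if present and all(isinstance(v, (int, float)) for v in present):
        stats.update(min=min(present), max=max(present), mean=mean(present))
    return stats

orders = [{"amount": 10}, {"amount": 20}, {"amount": None}, {"amount": 30}, {"amount": 20}]
print(profile(orders, "amount"))
```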

Dataplex vs. Data Catalog

Feature | Dataplex | Data Catalog
--- | --- | ---
Metadata search | Yes (via Data Catalog) | Yes
Data organization | Lakes, Zones, Assets | No
Data quality | Yes | No
Policies | Lake-level | Tag-based
Discovery | Automatic | Automatic

Benefits

  • No data copying: Virtual organization
  • Automatic: Discovery and profiling without manual effort
  • Unified: Single view of Cloud Storage and BigQuery
  • Governance: Centralized policies and data quality

Integration with innFactory

As a Google Cloud Partner, innFactory supports you with Dataplex: data lake design, data quality strategies, and governance framework implementation.


Technical Specifications

Components Lakes, Zones, Assets
Discovery Automatic schema detection
Quality Auto Data Quality Rules
Sources Cloud Storage, BigQuery

Frequently Asked Questions

What is Dataplex?

Dataplex is a data fabric service that organizes distributed data into logical lakes without moving it. It provides centralized governance, automatic metadata discovery, and data quality checks across Cloud Storage and BigQuery.

What's the difference between Dataplex and Data Catalog?

Data Catalog is for metadata search and tagging. Dataplex goes further by organizing data into lakes/zones, offering data quality checks, and enabling central policies. Dataplex uses Data Catalog for the metadata layer.

What are Lakes, Zones, and Assets in Dataplex?

A Lake is a logical container for related data (e.g., per business unit). Zones group assets by processing stage (Raw, Curated). Assets are the actual data in Cloud Storage buckets or BigQuery datasets.

How does data quality work in Dataplex?

Dataplex Data Quality defines rules (null values, formats, ranges, uniqueness) and checks them automatically on schedule. Results appear in Data Catalog and can trigger alerts. Auto Data Quality suggests rules based on profiling.
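Rule suggestions based on profiling can be illustrated with a simple heuristic: propose the rules that the profiled sample already satisfies and let a person review them. This is not Google's recommendation algorithm; the columns and data are hypothetical.

```python
# Heuristic rule suggestion from a profiled sample. Not Google's algorithm;
# columns and data are hypothetical.

def suggest_rules(rows: list[dict]) -> list[str]:
    """Propose rules that the sample already satisfies, for human review."""
    suggestions = []
    columns = sorted({column for row in rows for column in row})
    for column in columns:
        values = [row.get(column) for row in rows]
        present = [v for v in values if v is not None]
        if present and len(present) == len(values):
            suggestions.append(f"{column}: NOT NULL")
        if present and len(set(present)) == len(present):
            suggestions.append(f"{column}: UNIQUE")
        if present and all(isinstance(v, (int, float)) for v in present):
            suggestions.append(f"{column}: RANGE [{min(present)}, {max(present)}]")
    return suggestions

sample = [
    {"id": 1, "country": "DE", "amount": 12.5},
    {"id": 2, "country": "AT", "amount": 40.0},
    {"id": 3, "country": "DE", "amount": None},
]
for rule in suggest_rules(sample):
    print(rule)
```

The `amount: UNIQUE` suggestion this produces shows why proposals need review: with only two non-null values, uniqueness is a coincidence of the sample, not a business rule.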

How much does Dataplex cost?

Dataplex bills discovery, quality scans, and other processing in Data Compute Unit hours (DCU-hours). Discovery is relatively inexpensive, while quality scans on large datasets can cost considerably more. The first 30 days per lake are free.
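A rough monthly estimate makes the difference between discovery and quality scans visible. This sketch assumes processing is billed as compute units multiplied by runtime (DCU-hours); the job names, DCU counts, runtimes, and the price passed in are placeholders, so real prices have to come from the Google Cloud pricing page.

```python
from dataclasses import dataclass

# Back-of-the-envelope estimate, assuming billing = compute units x runtime.
# Job names, DCU counts, runtimes, and prices are placeholders.

@dataclass
class ScheduledJob:
    name: str
    dcus: float            # compute units used while the job runs (assumed constant)
    minutes_per_run: float
    runs_per_month: int

    def dcu_hours(self) -> float:
        return self.dcus * (self.minutes_per_run / 60) * self.runs_per_month

def monthly_cost(jobs: list[ScheduledJob], price_per_dcu_hour: float) -> float:
    """Total DCU-hours across all jobs times the (placeholder) hourly price."""
    return sum(job.dcu_hours() for job in jobs) * price_per_dcu_hour

jobs = [
    ScheduledJob("daily discovery", dcus=2, minutes_per_run=15, runs_per_month=30),
    ScheduledJob("weekly quality scan", dcus=8, minutes_per_run=45, runs_per_month=4),
]
for job in jobs:
    print(f"{job.name}: {job.dcu_hours():.1f} DCU-hours")
```

With the current list price for your region as `price_per_dcu_hour`, `monthly_cost(jobs, price)` gives the estimate. In this example, four weekly scans consume more DCU-hours than thirty daily discovery runs.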

Google Cloud Partner

innFactory is a certified Google Cloud Partner. We provide expert consulting, implementation, and managed services.


Similar Products from Other Clouds

Other cloud providers offer comparable services in this category. As a multi-cloud partner, we help you choose the right solution.


Ready to start with Dataplex - Unified Data Governance and Management?

Our certified Google Cloud experts help you with architecture, integration, and optimization.
