Google Cloud Datastream enables serverless real-time data replication from databases to analytical systems.
What is Datastream?
Datastream is a serverless Change Data Capture (CDC) and replication service. It captures database changes in real time and streams them to BigQuery, Cloud Storage, or other targets. The service requires no infrastructure management and scales automatically with data volume.
Core Features
- Serverless CDC: Automatic capture of database changes without infrastructure
- Real-time streaming: Changes transferred to target in seconds
- Schema detection: Automatic detection and propagation of schema changes
- Private connectivity: Secure connection to on-premises databases via VPN/Interconnect
- Unified streams: One stream can replicate multiple tables and schemas
Typical Use Cases
Real-time Analytics with BigQuery
Continuous replication of transaction data from Oracle or MySQL to BigQuery for real-time dashboards and analytics. The source database isn’t burdened with analytics queries.
Database Migration with Zero Downtime
Prepare for database migration through continuous data synchronization. Cutover happens with minimal downtime since the target database is already in sync.
Event-Driven Architectures
Trigger Cloud Functions or Dataflow pipelines based on database changes. Datastream delivers the changes; downstream services process them.
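How a downstream service might route incoming changes can be sketched in a few lines of Python. This is a minimal illustration, not Datastream's API: the event fields (`table`, `change_type`, `payload`) and the `on_change` and `dispatch` helpers are hypothetical names chosen for the example, and real Datastream events carry more metadata.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical, simplified event shape used only for this illustration.
Event = dict
Handler = Callable[[Event], None]

_handlers: dict[tuple[str, str], list[Handler]] = defaultdict(list)


def on_change(table: str, change_type: str) -> Callable[[Handler], Handler]:
    """Register a handler for one (table, change_type) pair."""
    def register(fn: Handler) -> Handler:
        _handlers[(table, change_type)].append(fn)
        return fn
    return register


def dispatch(event: Event) -> int:
    """Run every handler registered for the event; return how many ran."""
    handlers = _handlers.get((event["table"], event["change_type"]), [])
    for fn in handlers:
        fn(event)
    return len(handlers)


notifications: list[str] = []


@on_change("orders", "INSERT")
def notify_new_order(event: Event) -> None:
    # Stand-in for real work such as sending a message or starting a job.
    notifications.append(f"new order {event['payload']['id']}")


dispatch({"table": "orders", "change_type": "INSERT", "payload": {"id": 7}})
```

Keeping the routing table separate from the handlers means new reactions to database changes can be added without touching the delivery path.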
Benefits
- No infrastructure: Fully serverless without clusters or servers to manage
- Minimal source load: CDC via transaction logs barely impacts the source database
- Easy setup: Streams configured via Console or Terraform in minutes
- Reliable: Automatic retries with at-least-once delivery; event metadata allows duplicates to be removed at the target
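Automatic retries can redeliver an event, so consumers commonly deduplicate on their side to get an effectively-once result. The sketch below shows one way to do that, assuming each event carries a unique identifier; the `uuid` and `payload` field names and the `IdempotentSink` class are assumptions made for this example.

```python
class IdempotentSink:
    """Applies each event at most once, keyed by a unique event id."""

    def __init__(self) -> None:
        self.seen_ids: set[str] = set()
        self.rows: list[dict] = []

    def write(self, event: dict) -> bool:
        """Return True if the event was applied, False if it was a duplicate."""
        event_id = event["uuid"]  # assumed to be unique per change event
        if event_id in self.seen_ids:
            return False
        self.seen_ids.add(event_id)
        self.rows.append(event["payload"])
        return True


sink = IdempotentSink()
delivered = [
    {"uuid": "a1", "payload": {"id": 1, "status": "paid"}},
    {"uuid": "a1", "payload": {"id": 1, "status": "paid"}},  # redelivered after a retry
    {"uuid": "b2", "payload": {"id": 2, "status": "open"}},
]
applied = [sink.write(event) for event in delivered]
```

In a real pipeline the set of seen identifiers would live in durable storage rather than in memory.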
Integration with innFactory
As a Google Cloud Partner, innFactory supports you with Datastream: building real-time analytics pipelines, migrating on-premises databases, integrating with existing ETL processes, and architecting event-driven systems.
Available Tiers & Options
Datastream
- Fully serverless
- Real-time CDC without coding
- Private connectivity
- Limited source databases
Frequently Asked Questions
Which source databases does Datastream support?
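Datastream supports sources including Oracle, MySQL, PostgreSQL, AlloyDB, and SQL Server. Supported destinations are BigQuery and Cloud Storage; reaching other targets such as Cloud SQL typically requires a downstream pipeline, for example with Dataflow.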
What's the difference between Datastream and Database Migration Service?
Datastream is designed for continuous real-time replication, while Database Migration Service is optimized for one-time migrations. Datastream is suitable for analytics pipelines; DMS for database migrations.
How does Change Data Capture work with Datastream?
Datastream reads changes from the source database's transaction log (e.g., Oracle redo logs, the MySQL binlog, or the PostgreSQL write-ahead log) and streams them in real time to the target. Because it reads the log rather than querying tables, the impact on the source database is minimal.
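The idea behind log-based CDC can be illustrated with a toy model: an ordered change log is replayed onto a replica, and a checkpoint records how far replication has progressed. This is a conceptual sketch only, not how Datastream is implemented; `LogEntry`, `Replica`, and their fields are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    position: int               # monotonically increasing log offset
    op: str                     # "INSERT", "UPDATE", or "DELETE"
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new row image; None for deletes


@dataclass
class Replica:
    rows: dict = field(default_factory=dict)
    checkpoint: int = 0         # last log position already applied

    def apply(self, log: list) -> int:
        """Apply entries past the checkpoint in log order; return the count applied."""
        applied = 0
        for entry in sorted(log, key=lambda e: e.position):
            if entry.position <= self.checkpoint:
                continue  # already replicated in an earlier pass
            if entry.op == "DELETE":
                self.rows.pop(entry.key, None)
            else:  # INSERT and UPDATE both write the new row image
                self.rows[entry.key] = entry.row
            self.checkpoint = entry.position
            applied += 1
        return applied


log = [
    LogEntry(1, "INSERT", 10, {"name": "Ada"}),
    LogEntry(2, "UPDATE", 10, {"name": "Ada L."}),
    LogEntry(3, "INSERT", 11, {"name": "Bob"}),
    LogEntry(4, "DELETE", 11),
]
replica = Replica()
first_pass = replica.apply(log)
second_pass = replica.apply(log)  # nothing new, so nothing is re-applied
```

Because the replica only reads the log, the source tables are never queried, which is why this approach keeps the load on the source low.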
Can Datastream connect to on-premises databases?
Yes. On-premises Oracle, MySQL, or PostgreSQL databases can be configured as sources over private connectivity using Cloud VPN or Cloud Interconnect.
How is Datastream billed?
Billing is based on processed data volume (CDC data and backfill data charged separately). There are no minimum fees and no costs during inactivity.
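Since CDC and backfill volumes are charged separately, a rough monthly estimate is the sum of two volume-times-rate terms. The helper below is a sketch of that arithmetic; the function name is hypothetical and the per-GB rates in the example are placeholders, not actual Datastream prices, so substitute the current rates from the official price list.

```python
def estimate_monthly_cost(
    cdc_gb: float,
    backfill_gb: float,
    cdc_rate_per_gb: float,
    backfill_rate_per_gb: float,
) -> float:
    """Estimate cost as CDC volume and backfill volume priced at separate rates."""
    if min(cdc_gb, backfill_gb, cdc_rate_per_gb, backfill_rate_per_gb) < 0:
        raise ValueError("volumes and rates must be non-negative")
    return cdc_gb * cdc_rate_per_gb + backfill_gb * backfill_rate_per_gb


# Placeholder rates for illustration only; they are NOT real prices.
example_cost = estimate_monthly_cost(
    cdc_gb=200, backfill_gb=500, cdc_rate_per_gb=2.0, backfill_rate_per_gb=0.5
)
idle_cost = estimate_monthly_cost(0, 0, 2.0, 0.5)  # no processed data, no charge
```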
