What is Cloud Data Fusion?
Cloud Data Fusion is Google Cloud’s fully managed data integration service. It is built on the open-source project CDAP and provides a drag-and-drop interface for building ETL pipelines, so data analysts can create simple pipelines without writing code.
Core Features
- Visual pipeline designer: Drag-and-drop interface for ETL workflows
- 150+ connectors: Pre-built connections to databases, SaaS, and cloud services
- Data lineage: Automatic tracking of data flows
- Pipeline templates: Reusable templates for common integration patterns
- Dataproc integration: Automatic cluster provisioning for pipeline execution
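Conceptually, a pipeline built in the visual designer is a directed acyclic graph: stages (sources, transforms, sinks) joined by connections. The sketch below is a simplified, hypothetical Python model of that idea, not the CDAP pipeline format. The stage names and the `execution_order` helper are made up for illustration.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(stages, connections):
    """Return stage names in an order that respects every connection.

    stages: list of stage names (hypothetical, for illustration only).
    connections: list of (upstream, downstream) pairs.
    """
    known = set(stages)
    # Map each stage to the set of stages that must run before it.
    predecessors = {name: set() for name in stages}
    for upstream, downstream in connections:
        if upstream not in known or downstream not in known:
            raise ValueError(
                f"connection references unknown stage: {upstream} -> {downstream}"
            )
        predecessors[downstream].add(upstream)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError as exc:
        # A pipeline must not feed a stage's output back into itself.
        raise ValueError("pipeline contains a cycle") from exc


if __name__ == "__main__":
    order = execution_order(
        ["read_orders", "clean_orders", "write_bigquery"],
        [("read_orders", "clean_orders"), ("clean_orders", "write_bigquery")],
    )
    print(" -> ".join(order))
```

Thinking of a pipeline this way explains why the designer rejects loops: every stage needs a well-defined set of upstream inputs before it can run.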
Common Use Cases
Data Warehouse Loading
Load data from operational systems, SaaS applications, and files into BigQuery. Transformations are configured in the visual designer, so no SQL knowledge is required.
Hybrid Integration
Cloud Data Fusion connects on-premises databases with cloud data lakes. Private instances, which use internal IP addresses, allow secure connections to on-premises systems over Cloud VPN or Cloud Interconnect.
Data Migration
During cloud migrations, Data Fusion handles initial data export and ongoing synchronization until cutover.
Benefits
- No programming required for simple pipelines
- Visual debugging and monitoring
- Fully managed: Google provisions and operates the underlying infrastructure
- Enterprise security with CMEK and VPC-SC
Integration with innFactory
As a Google Cloud Partner, innFactory supports you with Cloud Data Fusion: pipeline design, custom plugin development, migration from existing ETL tools, and performance optimization.
Available Tiers & Options
Basic
- Lower cost
- Simple pipelines
- Limited features
Enterprise
- Full feature set
- Advanced security
- Customer-managed encryption
- Higher cost
Frequently Asked Questions
What is Cloud Data Fusion?
Cloud Data Fusion is a fully managed ETL platform based on CDAP. It enables visual data integration without programming.
Which data sources does Cloud Data Fusion support?
Cloud Data Fusion offers over 150 pre-built connectors for databases, SaaS applications, cloud storage, and on-premises systems.
What's the difference between Basic and Enterprise?
Enterprise offers advanced security, Customer-Managed Encryption Keys, VPC-SC support, and streaming pipelines. Basic is for simple batch pipelines.
How does Cloud Data Fusion scale?
Cloud Data Fusion executes pipelines on Dataproc. Each pipeline run uses an ephemeral cluster that is provisioned automatically and shut down after the run completes, so compute capacity follows the workload.
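Pipeline runs can also be triggered programmatically through the CDAP REST API that each instance exposes. The following is a minimal sketch that only builds the request for starting a deployed batch pipeline; it does not send it. It assumes the `/v3/namespaces/.../workflows/DataPipelineWorkflow/start` path pattern used for batch pipelines. The endpoint, pipeline name, and token shown are placeholders, and the helper names are hypothetical, so check the path against the current documentation before relying on it.

```python
from urllib.parse import quote
from urllib.request import Request


def pipeline_start_url(api_endpoint, pipeline_name, namespace="default"):
    """Build the URL that starts a deployed batch pipeline.

    api_endpoint is the instance's API endpoint (placeholder in this sketch).
    """
    base = api_endpoint.rstrip("/")
    return (
        f"{base}/v3/namespaces/{quote(namespace, safe='')}"
        f"/apps/{quote(pipeline_name, safe='')}"
        "/workflows/DataPipelineWorkflow/start"
    )


def build_start_request(api_endpoint, pipeline_name, access_token, namespace="default"):
    """Prepare (but do not send) an authenticated POST request."""
    return Request(
        pipeline_start_url(api_endpoint, pipeline_name, namespace),
        method="POST",
        headers={"Authorization": f"Bearer {access_token}"},
    )


if __name__ == "__main__":
    # All values below are placeholders, not a real instance or credential.
    req = build_start_request(
        "https://example-instance.invalid/api", "orders_to_bq", "PLACEHOLDER_TOKEN"
    )
    print(req.get_method(), req.full_url)
```

To actually start a run, the prepared request would be passed to `urllib.request.urlopen` with a valid OAuth access token, for example one obtained from `gcloud auth print-access-token`.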
Can I develop custom plugins?
Yes, Cloud Data Fusion supports custom plugins in Java. The CDAP Plugin Hub also offers community plugins.
