What is Dataform?
Dataform is a fully managed service from Google Cloud for SQL-based data transformation in BigQuery. The service enables data teams to define transformation pipelines as code, with dependency management, automated tests, and Git-based versioning.
Dataform uses SQLX, an extended SQL syntax that supports references between tables, assertions, and JavaScript templating. This makes data warehouse pipelines more maintainable and testable than loose collections of SQL scripts.
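To make the idea of references between tables more concrete, here is a minimal Python sketch of how dependencies could be read out of a SQLX file by scanning for `${ref("...")}` calls. This is not Dataform's actual compiler. The function `find_refs`, the regular expression, and the table names are assumptions made for illustration, and only the single-string form of `ref` is handled.

```python
import re

# Matches ${ref("table_name")} or ${ref('table_name')} in SQLX text.
# Assumption: only the single-argument string form of ref() is handled here.
REF_PATTERN = re.compile(r"""\$\{\s*ref\(\s*["']([^"']+)["']\s*\)\s*\}""")


def find_refs(sqlx_text: str) -> list[str]:
    """Return referenced table names in order of first appearance, without duplicates."""
    seen: dict[str, None] = {}
    for name in REF_PATTERN.findall(sqlx_text):
        seen.setdefault(name, None)
    return list(seen)


# Hypothetical SQLX file body; the table names are made up for illustration.
EXAMPLE_SQLX = """
config { type: "table" }

SELECT o.order_id, c.country
FROM ${ref("stg_orders")} AS o
JOIN ${ref("stg_customers")} AS c USING (customer_id)
"""

print(find_refs(EXAMPLE_SQLX))  # ['stg_orders', 'stg_customers']
```

Because each table names its inputs through `ref` rather than hard-coded identifiers, a tool can derive the dependency graph from the files themselves.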
Core Features
- SQLX workflows: Extended SQL syntax with dependencies, references, and macros
- Dependency management: Automatic execution order based on table dependencies
- Assertions and tests: Automated data quality checks as part of the pipeline
- Git integration: Versioning and code review for transformation logic
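The assertions feature listed above can be thought of as queries that return violating rows: a check passes when nothing comes back. The following Python sketch imitates two common checks (non-null and unique key) on in-memory rows. It is an illustration only, not Dataform code, and the function names and sample data are hypothetical.

```python
from collections import Counter

Row = dict[str, object]


def non_null_violations(rows: list[Row], column: str) -> list[Row]:
    """Rows where the given column is missing or None."""
    return [r for r in rows if r.get(column) is None]


def unique_key_violations(rows: list[Row], key: str) -> list[Row]:
    """Rows whose key value occurs more than once."""
    counts = Counter(r.get(key) for r in rows)
    return [r for r in rows if counts[r.get(key)] > 1]


# Hypothetical sample data with one null amount and one duplicated order_id.
rows = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": None},
    {"order_id": 2, "amount": 5.0},
]

print(len(non_null_violations(rows, "amount")))      # 1
print(len(unique_key_violations(rows, "order_id")))  # 2
```

In a pipeline, a non-empty result from such a check would mark the run as failed, so a data quality problem surfaces before downstream tables are built on top of it.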
Typical Use Cases
Data Warehouse Modeling
Data teams use Dataform to transform raw data in BigQuery into structured, analysis-ready models. Staging, intermediate, and mart tables are defined as SQLX files and executed automatically in dependency order.
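Running tables in the correct order amounts to a topological sort of the dependency graph. Here is a minimal sketch using Python's standard `graphlib` module. The model names and the `execution_order` helper are hypothetical, and Dataform's own scheduler may work differently in detail.

```python
from graphlib import TopologicalSorter

# Hypothetical model names; each table maps to the set of tables it reads from.
DEPENDENCIES = {
    "stg_orders": set(),
    "stg_customers": set(),
    "int_orders_enriched": {"stg_orders", "stg_customers"},
    "mart_revenue_by_country": {"int_orders_enriched"},
}


def execution_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one valid order in which every table runs after its inputs."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(DEPENDENCIES)
print(order)
```

The two staging tables do not depend on each other, so either may come first. The intermediate table always follows both, and the mart table comes last.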
ELT Pipelines
In ELT architectures, Dataform handles the transformation step after raw data has been loaded into BigQuery. Scheduled runs keep analytics tables current.
Benefits
- Natively integrated into BigQuery and Google Cloud
- Managed execution environment with no infrastructure of your own to operate
- Better code quality through tests and code review
- No charge for Dataform itself (you pay only for BigQuery processing)
Integration with innFactory
As a Google Cloud Partner, innFactory supports you with Dataform: data warehouse architecture, SQLX pipeline development, and best practices for data transformation.
Frequently Asked Questions
What is Dataform?
Dataform is a managed service from Google Cloud for SQL-based data transformation in BigQuery. It enables defining transformation pipelines as SQLX code with dependency management, tests, and versioning.
What is SQLX?
SQLX is an extension of SQL that Dataform uses. It adds features like references between tables, assertions (tests), JavaScript templating, and reusable macros.
How does Dataform differ from dbt?
Dataform and dbt follow similar concepts. Dataform is natively integrated into Google Cloud and BigQuery and comes with a managed execution environment, while dbt supports a broader range of databases.
