Dataprep by Trifacta is an intelligent service for visual data preparation. It combines automatic analysis, transformation suggestions, and serverless execution on Dataflow.
What is Dataprep?
Dataprep automatically analyzes your data, detects patterns and anomalies, and suggests transformations. Analysts work visually in the browser; execution happens on Dataflow.
The service is ideal for exploratory data analysis and ad-hoc cleaning before data is used in BigQuery for analytics.
How It Works
1. Import Data
Cloud Storage / BigQuery / Upload
│
▼
2. Automatic Profiling
┌──────────────────────┐
│ Data types detected │
│ Anomalies marked │
│ Statistics calculated│
└──────────────────────┘
│
▼
3. Define Transformations
- Visually in browser
- Intelligent suggestions
- Preview before execution
│
▼
4. Execute Job
Dataflow (serverless)
│
▼
5. Export
BigQuery / Cloud Storage
Core Features
- Automatic profiling: Data types, distributions, anomalies
- Intelligent suggestions: AI suggests transformations
- Visual transformations: Click instead of code
- Reusable recipes: Save transformations as templates
- Serverless execution: Dataflow scales automatically
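To make the "reusable recipes" idea concrete, here is a minimal Python sketch of a recipe as an ordered list of transformation steps that can be applied to any batch of rows. It is an illustration only and does not use the Dataprep API; the names `trim_whitespace`, `drop_rows_missing`, and `apply_recipe`, as well as the sample rows, are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Step = Callable[[List[Row]], List[Row]]


def trim_whitespace(rows: List[Row]) -> List[Row]:
    """Strip leading and trailing whitespace from every value."""
    return [{key: value.strip() for key, value in row.items()} for row in rows]


def drop_rows_missing(column: str) -> Step:
    """Build a step that removes rows whose `column` is empty or absent."""
    def step(rows: List[Row]) -> List[Row]:
        return [row for row in rows if row.get(column, "") != ""]
    return step


def apply_recipe(recipe: List[Step], rows: List[Row]) -> List[Row]:
    """Run each step in order, feeding its output into the next step."""
    for step in recipe:
        rows = step(rows)
    return rows


# The recipe is defined once and can be reused for every new batch.
recipe: List[Step] = [trim_whitespace, drop_rows_missing("email")]

batch = [
    {"name": " Ada ", "email": "ada@example.com"},
    {"name": "Bob", "email": "  "},  # blank after trimming, so it is dropped
]
cleaned = apply_recipe(recipe, batch)
```

Because the steps are plain functions, the same `recipe` list can be applied unchanged to the next file, which is the property a saved recipe is meant to provide.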
Typical Use Cases
Data Cleaning for Analytics
Clean CSV exports, Excel files, or unstructured data. Handle null values, standardize formats, remove duplicates.
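The three cleaning operations named above can be sketched in plain Python. This is a rough equivalent of what such transformations do, not Dataprep's own implementation; the column name `signup`, the list of null markers, and the accepted date formats are assumptions chosen for the example.

```python
from datetime import datetime
from typing import Dict, List, Optional

# Assumed markers that should be treated as missing values.
NULL_TOKENS = {"", "null", "n/a", "na", "none"}
# Assumed input date formats; the output is always ISO 8601 (YYYY-MM-DD).
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y")


def normalize_null(value: str) -> Optional[str]:
    """Trim the value and map known null markers to None."""
    cleaned = value.strip()
    return None if cleaned.lower() in NULL_TOKENS else cleaned


def standardize_date(value: Optional[str]) -> Optional[str]:
    """Convert a date in any accepted format to ISO 8601."""
    if value is None:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return value  # unrecognized values are left untouched


def clean(rows: List[Dict[str, str]]) -> List[Dict[str, Optional[str]]]:
    """Handle nulls, standardize the date column, then drop exact duplicates."""
    seen = set()
    result = []
    for row in rows:
        cleaned = {key: normalize_null(value) for key, value in row.items()}
        cleaned["signup"] = standardize_date(cleaned.get("signup"))
        fingerprint = tuple(cleaned[key] for key in sorted(cleaned))
        if fingerprint not in seen:  # keep only the first occurrence
            seen.add(fingerprint)
            result.append(cleaned)
    return result


raw_rows = [
    {"customer": "Ada ", "signup": "2024-01-05"},
    {"customer": "Ada", "signup": "05.01.2024"},  # same record, other format
    {"customer": "Bob", "signup": "N/A"},
]
cleaned_rows = clean(raw_rows)
```

Deduplication runs after normalization on purpose: the two "Ada" rows only become identical once whitespace and date formats have been standardized.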
Exploratory Data Analysis
Understand new datasets. Dataprep automatically shows distributions, outliers, and potential quality issues.
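As a rough illustration of what column profiling involves, the sketch below infers a simple type for a column, counts missing values, and flags numeric outliers. It is not how Dataprep profiles data internally; the function name `profile_column`, the two-type model (number or text), and the 1.5 × IQR outlier rule are assumptions made for the example.

```python
import statistics
from typing import Dict, List, Optional


def to_float(value: str) -> Optional[float]:
    """Return the value as a float, or None if it is not numeric."""
    try:
        return float(value)
    except ValueError:
        return None


def profile_column(values: List[str]) -> Dict[str, object]:
    """Summarize one column of raw string values."""
    present = [v.strip() for v in values if v.strip() != ""]
    profile: Dict[str, object] = {
        "count": len(values),
        "missing": len(values) - len(present),
    }
    numbers = [to_float(v) for v in present]
    if present and all(n is not None for n in numbers):
        profile["type"] = "number"
        profile["min"] = min(numbers)
        profile["max"] = max(numbers)
        profile["mean"] = statistics.mean(numbers)
        outliers: List[float] = []
        if len(numbers) >= 4:
            # Flag values outside 1.5 x the interquartile range.
            q1, _, q3 = statistics.quantiles(numbers, n=4)
            spread = q3 - q1
            low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
            outliers = [n for n in numbers if n < low or n > high]
        profile["outliers"] = outliers
    else:
        profile["type"] = "text"
        profile["distinct"] = len(set(present))
    return profile


amounts = profile_column(["10", "12", "11", "", "13", "12", "95"])
cities = profile_column(["Berlin", "berlin ", "", "Munich"])
```

In this sample, `amounts` is reported as a numeric column with one missing value and `95.0` flagged as an outlier, while `cities` is reported as text. The case mismatch between "Berlin" and "berlin" shows up as an extra distinct value, which is the kind of quality issue a profile is meant to surface.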
Self-Service Data Prep
Business analysts prepare data themselves without waiting for data engineering teams.
Ad-hoc Transformations
One-time data conversions that don’t justify a production ETL job.
Dataprep vs. Data Fusion vs. Dataflow
| Criterion | Dataprep | Data Fusion | Dataflow |
|---|---|---|---|
| User | Analysts | ETL developers | Developers |
| Interface | Visual | Visual | Code (Beam) |
| Use Case | Ad-hoc prep | Production ETL | Complex pipelines |
| Scheduling | Limited | Yes | Yes |
| Complexity | Low | Medium | High |
Benefits
- Simple: Visual interface without code
- Intelligent: AI-powered transformation suggestions
- Serverless: No infrastructure to manage
- Fast: From raw data to analytics in minutes
Limitations
- Not for production: Limited scheduling and monitoring
- Trifacta dependency: The UI is hosted by a third party (Trifacta)
- Costs: Can be expensive for large datasets
Integration with innFactory
As a Google Cloud Partner, innFactory supports you with Dataprep: evaluation for your use cases, integration into analytics workflows, and comparison with alternatives.
Frequently Asked Questions
What is Dataprep?
Dataprep by Trifacta is an intelligent service for visual data preparation. It automatically analyzes data, suggests transformations, and executes them on Dataflow. It is well suited to analysts without programming skills.
How does Dataprep differ from Data Fusion?
Dataprep is for ad-hoc data cleaning and exploratory analysis by analysts. Data Fusion is for production ETL pipelines with scheduling and orchestration. Dataprep is simpler; Data Fusion is more powerful.
Where do Dataprep jobs run?
Dataprep jobs run on Dataflow. Data stays in Google Cloud (Cloud Storage, BigQuery). Trifacta hosts only the UI, not the data.
How much does Dataprep cost?
Dataprep is billed in units based on the amount of data processed. Dataflow execution costs apply in addition. For small datasets (exploration), costs are low; large production jobs can be expensive.
Is Dataprep GDPR-compliant?
Data processing runs on Dataflow in your GCP environment. The Trifacta UI processes only metadata and samples. If you have strict compliance requirements, review the Trifacta documentation.
