Azure Data Factory - Cloud ETL and Data Integration

Azure Data Factory: Fully managed ETL service for cloud-based data integration and orchestration

Pricing Model: Pay per pipeline run and data volume
Availability: 30+ Azure regions worldwide
Data Sovereignty: EU regions available (Germany, Netherlands, France)
Reliability: 99.9% SLA for Data Factory V2

Azure Data Factory is a fully managed cloud ETL service from Microsoft that enables integration and transformation of data from over 100 sources. The service combines code-free visual development with powerful transformation engines.

What is Azure Data Factory?

Azure Data Factory (ADF) is Microsoft’s solution for modern data integration and ETL workflows. The service enables extracting, transforming, and loading data from heterogeneous sources into target systems without managing your own infrastructure.

The strength of Data Factory lies in the combination of three components:

1. Copy Activity: High-performance data movement between any of the 100+ supported sources and sinks. Optimized for high throughput and low cost.

2. Mapping Data Flows: Visual, Spark-based transformation engine for complex data processing without code. Similar to SSIS but cloud-native.

3. Pipeline Orchestration: Workflow engine with triggers, dependencies, conditional logic, and error handling.
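
To make these pieces concrete, here is a minimal sketch of a pipeline definition with a single Copy activity (all names and datasets are illustrative):

{
  "name": "CopySalesData",
  "properties": {
    "activities": [
      {
        "name": "CopyBlobToSql",
        "type": "Copy",
        "inputs": [ { "referenceName": "SalesBlobDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "SalesSqlDataset", "type": "DatasetReference" } ],
        "typeProperties": {
          "source": { "type": "DelimitedTextSource" },
          "sink": { "type": "AzureSqlSink" }
        }
      }
    ]
  }
}

Triggers, conditions, loops, and error handlers all attach to this same pipeline structure, which is what the orchestration engine executes.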

Azure Data Factory is particularly suitable for companies migrating from traditional on-premises ETL tools (SSIS, Informatica, Talend) to cloud-native solutions. The Integration Runtime supports hybrid scenarios where data sources reside both in the cloud and on-premises.

For GDPR-compliant data processing, Data Factory is available in European regions (Germany West Central, West Europe, France Central). The service meets ISO 27001, SOC 2, and other compliance standards.

Integration Runtime Options

The Integration Runtime (IR) is the compute infrastructure Data Factory uses for data integration:

Runtime           Usage                            Location
Azure IR          Cloud-to-cloud data movement     Managed by Azure
Self-hosted IR    On-premises, private networks    Customer-installed
Azure-SSIS IR     Run SSIS packages                Managed by Azure

Azure IR is sufficient for most cloud scenarios and requires no configuration.

Self-hosted IR is required for:

  • On-premises data sources behind firewalls
  • Private Azure VNETs without public endpoints
  • Data sources that don’t allow public IP addresses

The Self-hosted IR is installed as software on Windows servers and registers with Data Factory. Data flows directly between source and destination; Data Factory only orchestrates.
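
A linked service tells Data Factory which runtime to use via the connectVia property; a minimal sketch for an on-premises SQL Server (server and runtime names are illustrative):

{
  "name": "OnPremSqlServer",
  "properties": {
    "type": "SqlServer",
    "typeProperties": {
      "connectionString": "Data Source=onprem-sql01;Initial Catalog=Sales;Integrated Security=True"
    },
    "connectVia": {
      "referenceName": "SelfHostedIR",
      "type": "IntegrationRuntimeReference"
    }
  }
}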

Azure-SSIS IR enables running existing SSIS packages without changes. Ideal for lift-and-shift migrations from SQL Server to Azure.

Mapping Data Flows vs. Copy Activity

Guidance for choosing the right component:

Use Copy Activity when:

  • Data only needs to be copied (without transformation)
  • Simple column mappings are sufficient
  • Costs should be minimized
  • High throughput is the priority (scales up to 256 Data Integration Units per copy)

Use Mapping Data Flows when:

  • Complex transformations are required (joins, aggregations, pivots)
  • Visual development is preferred
  • Data quality checks should be implemented
  • Spark scaling for large data volumes is needed

Mapping Data Flows uses Spark under the hood, scales automatically, and provides an SSIS-like visual interface. Costs are higher than Copy Activity but often cheaper than manual Spark implementations.
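
In a pipeline, a data flow runs as an Execute Data Flow activity whose compute settings determine the Spark cluster size and thus the cost; a sketch with illustrative names:

{
  "name": "TransformSales",
  "type": "ExecuteDataFlow",
  "typeProperties": {
    "dataFlow": { "referenceName": "SalesTransform", "type": "DataFlowReference" },
    "compute": { "coreCount": 8, "computeType": "General" }
  }
}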

Typical Use Cases

1. Data Warehouse ETL

Load data from operational systems into Azure Synapse Analytics or SQL Database for reporting.

Example: Nightly aggregation of sales data from 20+ stores into a central data warehouse.

2. Cloud Migration

Migrate databases and file stores from on-premises to Azure.

Example: One-time migration of a 5 TB Oracle database to Azure SQL with Self-hosted IR.

3. Hybrid Data Integration

Combine cloud and on-premises data for analytics.

Example: Join Azure Cosmos DB (cloud) with SAP HANA (on-prem) for supply chain analytics.

4. Real-time Data Pipelines

Orchestrate near-real-time, windowed workflows with Tumbling Window triggers.

Example: Processing IoT data from Event Hub every 5 minutes with aggregation in Synapse.
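
A sketch of such a trigger definition (pipeline and parameter names are illustrative); the tumbling window passes its start and end times to the pipeline so each run processes exactly one 5-minute slice:

{
  "name": "IoTEvery5Minutes",
  "properties": {
    "type": "TumblingWindowTrigger",
    "typeProperties": {
      "frequency": "Minute",
      "interval": 5,
      "startTime": "2024-01-01T00:00:00Z",
      "maxConcurrency": 1
    },
    "pipeline": {
      "pipelineReference": { "referenceName": "AggregateIoTData", "type": "PipelineReference" },
      "parameters": {
        "windowStart": "@trigger().outputs.windowStartTime",
        "windowEnd": "@trigger().outputs.windowEndTime"
      }
    }
  }
}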

5. Multi-Cloud Integration

Integrate data across cloud providers.

Example: Copy data from AWS S3 to Azure Blob Storage, then transform with Data Flows.

6. SaaS Integration

Extract data from SaaS applications for internal analytics.

Example: Sync Salesforce CRM data daily to Azure SQL for custom dashboards.

Best Practices

1. Optimize Copy Activity Performance

  • Use parallel copies for large files
  • Set Data Integration Units (DIUs) to 2-256 depending on data volume
  • Choose compression for network transfers
  • Use partition options for large tables

An example Copy activity configuration combining these options:

{
  "source": {
    "type": "AzureSqlSource",
    "partitionOption": "PhysicalPartitionsOfTable"
  },
  "sink": {
    "type": "AzureSqlSink",
    "writeBatchSize": 10000
  },
  "parallelCopies": 4,
  "dataIntegrationUnits": 32
}

2. Use Incremental Loads

Avoid full refreshes through watermark-based incremental loads:

SELECT * FROM source_table
WHERE last_modified > '@{pipeline().parameters.watermark}'
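
One way to wire this up (activity, dataset, and table names are illustrative): a Lookup activity reads the last watermark from a control table, and the Copy activity references its output in the source query:

{
  "activities": [
    {
      "name": "GetWatermark",
      "type": "Lookup",
      "typeProperties": {
        "source": {
          "type": "AzureSqlSource",
          "sqlReaderQuery": "SELECT MAX(watermark_value) AS watermark FROM etl_watermark"
        },
        "dataset": { "referenceName": "ControlTableDataset", "type": "DatasetReference" }
      }
    },
    {
      "name": "CopyDelta",
      "type": "Copy",
      "dependsOn": [ { "activity": "GetWatermark", "dependencyConditions": ["Succeeded"] } ],
      "typeProperties": {
        "source": {
          "type": "AzureSqlSource",
          "sqlReaderQuery": "SELECT * FROM source_table WHERE last_modified > '@{activity('GetWatermark').output.firstRow.watermark}'"
        },
        "sink": { "type": "AzureSqlSink" }
      }
    }
  ]
}

After a successful copy, a Stored Procedure activity typically writes the new watermark back to the control table so the next run picks up where this one ended.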

3. Implement Error Handling

Use retry policies, failure-path activities, and alerts. Retries are configured per activity in its policy block; error handlers are attached via the Failed dependency condition:

{
  "name": "CopyData",
  "type": "Copy",
  "policy": {
    "retry": 3,
    "retryIntervalInSeconds": 30
  }
},
{
  "name": "SendErrorEmail",
  "type": "WebActivity",
  "dependsOn": [
    { "activity": "CopyData", "dependencyConditions": ["Failed"] }
  ]
}

4. Security Best Practices

  • Use Managed Identities instead of connection strings
  • Store credentials in Azure Key Vault
  • Enable Private Endpoints for sensitive data
  • Use data encryption in transit and at rest
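
For example, a linked service can pull its connection string from Key Vault instead of storing it inline; a sketch with illustrative names:

{
  "name": "AzureSqlViaKeyVault",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": {
        "type": "AzureKeyVaultSecret",
        "store": { "referenceName": "CompanyKeyVault", "type": "LinkedServiceReference" },
        "secretName": "sql-connection-string"
      }
    }
  }
}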

5. Monitoring and Logging

  • Enable Azure Monitor integration
  • Use Log Analytics for pipeline analyses
  • Set alerts for failed runs
  • Track data lineage with Purview

6. Cost Optimization

  • Use Schedule Triggers instead of Tumbling Window where possible
  • Disable Debug Mode in production
  • Optimize DIUs based on performance tests
  • Use Self-hosted IR to avoid egress costs
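
For comparison with Tumbling Window, a plain schedule trigger fires on wall-clock time without tracking window or backfill state, which keeps orchestration overhead low; a sketch with illustrative names:

{
  "name": "NightlyAt2am",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "startTime": "2024-01-01T02:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      { "pipelineReference": { "referenceName": "NightlyETL", "type": "PipelineReference" } }
    ]
  }
}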

Frequently Asked Questions

What does Azure Data Factory cost?

Costs consist of three components:

  • Orchestration: Approx. 1 EUR per 1,000 pipeline runs
  • Copy Activity: 0.25 EUR per DIU-hour
  • Data Flows: 0.27 EUR per vCore-hour (Compute-Optimized)

Example: 100 daily pipeline runs with 1 TB data volume cost approx. 150-300 EUR/month. Use the Azure Pricing Calculator for accurate estimates.

Is Azure Data Factory GDPR compliant?

Yes, when choosing European Azure regions. Data Factory meets ISO 27001, SOC 2, GDPR, and other standards. Data never leaves the chosen region (except with explicit cross-region configuration).

How does Azure Data Factory integrate with other Azure services?

Native integration with: Azure Synapse Analytics, SQL Database, Cosmos DB, Blob Storage, Data Lake Storage, Databricks, Event Hub, Stream Analytics, Purview, Key Vault, and 90+ other services.

What SLAs does Azure Data Factory offer?

99.9% SLA for Data Factory V2. Applies to pipeline orchestration, not external data sources. Details in Azure Service Level Agreement.

Can I use Azure Data Factory in hybrid cloud scenarios?

Yes, via Self-hosted Integration Runtime. Install the IR software on-premises and Data Factory orchestrates data flows securely over encrypted connections.

Integration with innFactory

As a Microsoft Solutions Partner, innFactory supports you with:

  • Data Factory Implementation: From architecture to production deployment
  • Migration from SSIS/Informatica: Modernization of existing ETL processes
  • Performance Optimization: Cost reduction and acceleration
  • Hybrid Integration: Secure connection of on-premises sources
  • CI/CD Setup: Azure DevOps integration and Git workflows

Contact us for a no-obligation consultation on Azure Data Factory and data integration.

Available Tiers & Options

Data Flow Compute-Optimized

Strengths
  • Optimized for transforming large data volumes
  • Spark-based execution
  • Auto-scaling clusters
Considerations
  • Higher costs than Copy Activity

Data Flow Memory-Optimized

Strengths
  • Ideal for complex joins and aggregations
  • More RAM per core
Considerations
  • More expensive than Compute-Optimized

Typical Use Cases

ETL pipelines for data warehouses
Migration from on-premises to cloud
Integration of heterogeneous data sources
Orchestration of real-time data streams
Data transformation without code
Hybrid data integration (cloud + on-prem)

Technical Specifications

Connectors: 100+ (SQL Server, Oracle, SAP, Salesforce, AWS S3, Google BigQuery, etc.)
Data flows: Mapping Data Flows (visual), Wrangling Data Flows (Power Query)
Formats: CSV, JSON, Parquet, Avro, ORC, XML, Binary
Integration runtime: Azure IR, Self-hosted IR, Azure-SSIS IR
Monitoring: Azure Monitor integration, Pipeline Runs, Activity Logs
Orchestration: Pipeline Activities, Triggers (Schedule, Tumbling Window, Event), Dependencies
Security: Managed Identity, Key Vault integration, Private Endpoints

Frequently Asked Questions

What's the difference between Copy Activity and Mapping Data Flows?

Copy Activity is optimized for simple data copying between sources (fast, cost-effective). Mapping Data Flows is a Spark-based framework for complex transformations (joins, aggregations, pivots) with visual development.

How does the Self-hosted Integration Runtime work?

The Self-hosted IR is software you install in your data center or VNET. It enables secure access to on-premises data sources without public IPs and supports hybrid scenarios.

Can I use existing SSIS packages?

Yes, through the Azure-SSIS Integration Runtime you can run existing SQL Server Integration Services (SSIS) packages without changes in Azure. This facilitates lift-and-shift migrations.

How do I monitor pipeline failures?

Data Factory provides integrated monitoring with Pipeline Runs, Activity Logs, and Azure Monitor integration. You can set up alerts for failed runs and analyze logs in Log Analytics.

Which data sources are supported?

Over 100 connectors: Cloud databases (Azure SQL, Cosmos DB), on-premises (SQL Server, Oracle, SAP), SaaS (Salesforce, Dynamics 365), file stores (Azure Blob, AWS S3, SFTP), data warehouses (Snowflake, BigQuery).

How secure is data transfer?

Data Factory supports encryption in transit (TLS 1.2+), Managed Identities for authentication, Private Endpoints for VNET integration, and Azure Key Vault for credentials management.

Can I use Git integration?

Yes, Data Factory supports Azure DevOps Git and GitHub integration for version control, CI/CD pipelines, and team collaboration.
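
In ARM templates, this shows up as the factory's repoConfiguration block; a sketch for Azure DevOps Git (organization, project, and repository names are illustrative):

{
  "properties": {
    "repoConfiguration": {
      "type": "FactoryVSTSConfiguration",
      "accountName": "myorg",
      "projectName": "DataPlatform",
      "repositoryName": "adf-pipelines",
      "collaborationBranch": "main",
      "rootFolder": "/"
    }
  }
}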

Microsoft Solutions Partner

innFactory is a Microsoft Solutions Partner. We provide expert consulting, implementation, and managed services for Azure.

Microsoft Solutions Partner: Data & AI

Ready to get started with Azure Data Factory?

Our certified Azure experts help you with architecture, integration, and optimization.

Schedule Consultation