Azure Data Factory - Cloud ETL and Data Integration

Azure Data Factory: Fully managed ETL service for cloud-based data integration and orchestration

Pricing Model: Pay per pipeline run and data volume
Availability: 30+ Azure regions worldwide
Data Sovereignty: EU regions available (Germany, Netherlands, France)
Reliability: 99.9% SLA for Data Factory V2

Azure Data Factory is a fully managed cloud ETL service from Microsoft that enables integration and transformation of data from over 100 sources. The service combines code-free visual development with powerful transformation engines.

What is Azure Data Factory?

Azure Data Factory (ADF) is Microsoft’s solution for modern data integration and ETL workflows. The service enables extracting, transforming, and loading data from heterogeneous sources into target systems without managing your own infrastructure.

The strength of Data Factory lies in the combination of three components:

1. Copy Activity: High-performance data movement between any of the 100+ supported sources and sinks. Optimized for high throughput and low cost.

2. Mapping Data Flows: Visual, Spark-based transformation engine for complex data processing without code. Similar to SSIS but cloud-native.

3. Pipeline Orchestration: Workflow engine with triggers, dependencies, conditional logic, and error handling.
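
To make these pieces concrete, here is a minimal sketch of a pipeline definition with a single Copy activity (all names and datasets are illustrative):

{
  "name": "CopySalesData",
  "properties": {
    "activities": [
      {
        "name": "CopyBlobToSql",
        "type": "Copy",
        "inputs": [ { "referenceName": "SalesBlobDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "SalesSqlDataset", "type": "DatasetReference" } ],
        "typeProperties": {
          "source": { "type": "DelimitedTextSource" },
          "sink": { "type": "AzureSqlSink" }
        }
      }
    ]
  }
}

Triggers, conditions, loops, and error handlers all attach to this same pipeline structure, which is what the orchestration engine executes.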

Azure Data Factory is particularly suitable for companies migrating from traditional on-premises ETL tools (SSIS, Informatica, Talend) to cloud-native solutions. The Integration Runtime supports hybrid scenarios where data sources reside both in the cloud and on-premises.

For GDPR-compliant data processing, Data Factory is available in European regions (Germany West Central, West Europe, France Central). The service meets ISO 27001, SOC 2, and other compliance standards.

Integration Runtime Options

The Integration Runtime (IR) is the compute infrastructure Data Factory uses for data integration:

Runtime           Usage                            Location
Azure IR          Cloud-to-cloud data movement     Managed by Azure
Self-hosted IR    On-premises, private networks    Customer-installed
Azure-SSIS IR     Run SSIS packages                Managed by Azure

Azure IR is sufficient for most cloud scenarios and requires no configuration.

Self-hosted IR is required for:

  • On-premises data sources behind firewalls
  • Private Azure VNETs without public endpoints
  • Data sources that don’t allow public IP addresses

The Self-hosted IR is installed as software on Windows servers and registers with Data Factory. Data flows directly between source and destination; Data Factory only orchestrates.
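
A linked service tells Data Factory which runtime to use via the connectVia property; a minimal sketch for an on-premises SQL Server (server and runtime names are illustrative):

{
  "name": "OnPremSqlServer",
  "properties": {
    "type": "SqlServer",
    "typeProperties": {
      "connectionString": "Data Source=onprem-sql01;Initial Catalog=Sales;Integrated Security=True"
    },
    "connectVia": {
      "referenceName": "SelfHostedIR",
      "type": "IntegrationRuntimeReference"
    }
  }
}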

Azure-SSIS IR enables running existing SSIS packages without changes. Ideal for lift-and-shift migrations from SQL Server to Azure.

Mapping Data Flows vs. Copy Activity

Guidance for choosing the right component:

Use Copy Activity when:

  • Data only needs to be copied (without transformation)
  • Simple column mappings are sufficient
  • Costs should be minimized
  • High throughput is the priority (scales up to 256 Data Integration Units per copy)

Use Mapping Data Flows when:

  • Complex transformations are required (joins, aggregations, pivots)
  • Visual development is preferred
  • Data quality checks should be implemented
  • Spark scaling for large data volumes is needed

Mapping Data Flows uses Spark under the hood, scales automatically, and provides an SSIS-like visual interface. Costs are higher than Copy Activity but often cheaper than manual Spark implementations.
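
In a pipeline, a data flow runs as an Execute Data Flow activity whose compute settings determine the Spark cluster size and thus the cost; a sketch with illustrative names:

{
  "name": "TransformSales",
  "type": "ExecuteDataFlow",
  "typeProperties": {
    "dataFlow": { "referenceName": "SalesTransform", "type": "DataFlowReference" },
    "compute": { "coreCount": 8, "computeType": "General" }
  }
}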

Typical Use Cases

1. Data Warehouse ETL

Load data from operational systems into Azure Synapse Analytics or SQL Database for reporting.

Example: Nightly aggregation of sales data from 20+ stores into a central data warehouse.

2. Cloud Migration

Migrate databases and file stores from on-premises to Azure.

Example: One-time migration of a 5 TB Oracle database to Azure SQL with Self-hosted IR.

3. Hybrid Data Integration

Combine cloud and on-premises data for analytics.

Example: Join Azure Cosmos DB (cloud) with SAP HANA (on-prem) for supply chain analytics.

4. Real-time Data Pipelines

Orchestrate near-real-time, windowed workflows with Tumbling Window triggers.

Example: Processing IoT data from Event Hub every 5 minutes with aggregation in Synapse.
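
A sketch of such a trigger definition (pipeline and parameter names are illustrative); the tumbling window passes its start and end times to the pipeline so each run processes exactly one 5-minute slice:

{
  "name": "IoTEvery5Minutes",
  "properties": {
    "type": "TumblingWindowTrigger",
    "typeProperties": {
      "frequency": "Minute",
      "interval": 5,
      "startTime": "2024-01-01T00:00:00Z",
      "maxConcurrency": 1
    },
    "pipeline": {
      "pipelineReference": { "referenceName": "AggregateIoTData", "type": "PipelineReference" },
      "parameters": {
        "windowStart": "@trigger().outputs.windowStartTime",
        "windowEnd": "@trigger().outputs.windowEndTime"
      }
    }
  }
}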

5. Multi-Cloud Integration

Integrate data across cloud providers.

Example: Copy data from AWS S3 to Azure Blob Storage, then transform with Data Flows.

6. SaaS Integration

Extract data from SaaS applications for internal analytics.

Example: Sync Salesforce CRM data daily to Azure SQL for custom dashboards.

Best Practices

1. Optimize Copy Activity Performance

  • Use parallel copies for large files
  • Set Data Integration Units (DIUs) to 2-256 depending on data volume
  • Choose compression for network transfers
  • Use partition options for large tables

An example Copy activity configuration combining these options:

{
  "source": {
    "type": "AzureSqlSource",
    "partitionOption": "PhysicalPartitionsOfTable"
  },
  "sink": {
    "type": "AzureSqlSink",
    "writeBatchSize": 10000
  },
  "parallelCopies": 4,
  "dataIntegrationUnits": 32
}

2. Use Incremental Loads

Avoid full refreshes through watermark-based incremental loads:

SELECT * FROM source_table
WHERE last_modified > '@{pipeline().parameters.watermark}'
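
One way to wire this up (activity, dataset, and table names are illustrative): a Lookup activity reads the last watermark from a control table, and the Copy activity references its output in the source query:

{
  "activities": [
    {
      "name": "GetWatermark",
      "type": "Lookup",
      "typeProperties": {
        "source": {
          "type": "AzureSqlSource",
          "sqlReaderQuery": "SELECT MAX(watermark_value) AS watermark FROM etl_watermark"
        },
        "dataset": { "referenceName": "ControlTableDataset", "type": "DatasetReference" }
      }
    },
    {
      "name": "CopyDelta",
      "type": "Copy",
      "dependsOn": [ { "activity": "GetWatermark", "dependencyConditions": ["Succeeded"] } ],
      "typeProperties": {
        "source": {
          "type": "AzureSqlSource",
          "sqlReaderQuery": "SELECT * FROM source_table WHERE last_modified > '@{activity('GetWatermark').output.firstRow.watermark}'"
        },
        "sink": { "type": "AzureSqlSink" }
      }
    }
  ]
}

After a successful copy, a Stored Procedure activity typically writes the new watermark back to the control table so the next run picks up where this one ended.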

3. Implement Error Handling

Use retry policies, failure-path activities, and alerts. Retries are configured per activity in its policy block; error handlers are attached via the Failed dependency condition:

{
  "name": "CopyData",
  "type": "Copy",
  "policy": {
    "retry": 3,
    "retryIntervalInSeconds": 30
  }
},
{
  "name": "SendErrorEmail",
  "type": "WebActivity",
  "dependsOn": [
    { "activity": "CopyData", "dependencyConditions": ["Failed"] }
  ]
}

4. Security Best Practices

  • Use Managed Identities instead of connection strings
  • Store credentials in Azure Key Vault
  • Enable Private Endpoints for sensitive data
  • Use data encryption in transit and at rest
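
For example, a linked service can pull its connection string from Key Vault instead of storing it inline; a sketch with illustrative names:

{
  "name": "AzureSqlViaKeyVault",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": {
        "type": "AzureKeyVaultSecret",
        "store": { "referenceName": "CompanyKeyVault", "type": "LinkedServiceReference" },
        "secretName": "sql-connection-string"
      }
    }
  }
}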

5. Monitoring and Logging

  • Enable Azure Monitor integration
  • Use Log Analytics for pipeline analyses
  • Set alerts for failed runs
  • Track data lineage with Purview

6. Cost Optimization

  • Use Schedule Triggers instead of Tumbling Window where possible
  • Disable Debug Mode in production
  • Optimize DIUs based on performance tests
  • Use Self-hosted IR to avoid egress costs
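
For comparison with Tumbling Window, a plain schedule trigger fires on wall-clock time without tracking window or backfill state, which keeps orchestration overhead low; a sketch with illustrative names:

{
  "name": "NightlyAt2am",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "startTime": "2024-01-01T02:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      { "pipelineReference": { "referenceName": "NightlyETL", "type": "PipelineReference" } }
    ]
  }
}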

Frequently Asked Questions

What does Azure Data Factory cost?

Costs consist of three components:

  • Orchestration: Approx. 1 EUR per 1,000 pipeline runs
  • Copy Activity: 0.25 EUR per DIU-hour
  • Data Flows: 0.27 EUR per vCore-hour (Compute-Optimized)

Example: 100 daily pipeline runs with 1 TB data volume cost approx. 150-300 EUR/month. Use the Azure Pricing Calculator for accurate estimates.

Is Azure Data Factory GDPR compliant?

Yes, when choosing European Azure regions. Data Factory meets ISO 27001, SOC 2, GDPR, and other standards. Data never leaves the chosen region (except with explicit cross-region configuration).

How does Azure Data Factory integrate with other Azure services?

Native integration with: Azure Synapse Analytics, SQL Database, Cosmos DB, Blob Storage, Data Lake Storage, Databricks, Event Hub, Stream Analytics, Purview, Key Vault, and 90+ other services.

What SLAs does Azure Data Factory offer?

99.9% SLA for Data Factory V2. Applies to pipeline orchestration, not external data sources. Details in Azure Service Level Agreement.

Can I use Azure Data Factory in hybrid cloud scenarios?

Yes, via Self-hosted Integration Runtime. Install the IR software on-premises and Data Factory orchestrates data flows securely over encrypted connections.

Integration with innFactory

As a Microsoft Solutions Partner, innFactory supports you with:

  • Data Factory Implementation: From architecture to production deployment
  • Migration from SSIS/Informatica: Modernization of existing ETL processes
  • Performance Optimization: Cost reduction and acceleration
  • Hybrid Integration: Secure connection of on-premises sources
  • CI/CD Setup: Azure DevOps integration and Git workflows

Contact us for a no-obligation consultation on Azure Data Factory and data integration.

Available Tiers & Options

Data Flow Compute-Optimized

Strengths
  • Optimized for transforming large data volumes
  • Spark-based execution
  • Auto-scaling clusters
Considerations
  • Higher costs than Copy Activity

Data Flow Memory-Optimized

Strengths
  • Ideal for complex joins and aggregations
  • More RAM per core
Considerations
  • More expensive than Compute-Optimized

Typical Use Cases

ETL pipelines for data warehouses
Migration from on-premises to cloud
Integration of heterogeneous data sources
Orchestration of real-time data streams
Data transformation without code
Hybrid data integration (cloud + on-prem)

Technical Specifications

Connectors: 100+ (SQL Server, Oracle, SAP, Salesforce, AWS S3, Google BigQuery, etc.)
Data flows: Mapping Data Flows (visual), Wrangling Data Flows (Power Query)
Formats: CSV, JSON, Parquet, Avro, ORC, XML, Binary
Integration runtime: Azure IR, Self-hosted IR, Azure-SSIS IR
Monitoring: Azure Monitor integration, Pipeline Runs, Activity Logs
Orchestration: Pipeline Activities, Triggers (Schedule, Tumbling Window, Event), Dependencies
Security: Managed Identity, Key Vault integration, Private Endpoints

Frequently Asked Questions

What's the difference between Copy Activity and Mapping Data Flows?

Copy Activity is optimized for simple data copying between sources (fast, cost-effective). Mapping Data Flows is a Spark-based framework for complex transformations (joins, aggregations, pivots) with visual development.

How does the Self-hosted Integration Runtime work?

The Self-hosted IR is software you install in your data center or VNET. It enables secure access to on-premises data sources without public IPs and supports hybrid scenarios.

Can I use existing SSIS packages?

Yes, through the Azure-SSIS Integration Runtime you can run existing SQL Server Integration Services (SSIS) packages without changes in Azure. This facilitates lift-and-shift migrations.

How do I monitor pipeline failures?

Data Factory provides integrated monitoring with Pipeline Runs, Activity Logs, and Azure Monitor integration. You can set up alerts for failed runs and analyze logs in Log Analytics.

Which data sources are supported?

Over 100 connectors: Cloud databases (Azure SQL, Cosmos DB), on-premises (SQL Server, Oracle, SAP), SaaS (Salesforce, Dynamics 365), file stores (Azure Blob, AWS S3, SFTP), data warehouses (Snowflake, BigQuery).

How secure is data transfer?

Data Factory supports encryption in transit (TLS 1.2+), Managed Identities for authentication, Private Endpoints for VNET integration, and Azure Key Vault for credentials management.

Can I use Git integration?

Yes, Data Factory supports Azure DevOps Git and GitHub integration for version control, CI/CD pipelines, and team collaboration.
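
In ARM templates, this shows up as the factory's repoConfiguration block; a sketch for Azure DevOps Git (organization, project, and repository names are illustrative):

{
  "properties": {
    "repoConfiguration": {
      "type": "FactoryVSTSConfiguration",
      "accountName": "myorg",
      "projectName": "DataPlatform",
      "repositoryName": "adf-pipelines",
      "collaborationBranch": "main",
      "rootFolder": "/"
    }
  }
}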

Microsoft Solutions Partner

innFactory is a Microsoft Solutions Partner. We provide expert consulting, implementation, and managed services for Azure.

Microsoft Solutions Partner: Data & AI

Ready to get started with Azure Data Factory?

Our certified Azure experts help you with architecture, integration, and optimization.

Schedule Consultation