
Azure Data Factory: Pipelines, Datasets & Triggers


Enterprise ETL/ELT orchestration with Azure Data Factory pipelines, activities, and monitoring

ADF Architecture Overview

Architecture Diagram
┌──────────────────────────────────────────────────────────────────────┐
│                   AZURE DATA FACTORY ARCHITECTURE                    │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ┌───────────────────────────────────────────────────────────────┐   │
│  │                          ADF FACTORY                          │   │
│  │                                                               │   │
│  │  LINKED SERVICES        DATASETS             PIPELINES        │   │
│  │  ┌──────────────┐       ┌──────────────┐     ┌────────────┐   │   │
│  │  │ ADLS Gen2    │──────>│ Parquet DS   │────>│ Pipeline   │   │   │
│  │  │ SQL Server   │──────>│ CSV DS       │     │            │   │   │
│  │  │ Cosmos DB    │──────>│ JSON DS      │     │ Activities │   │   │
│  │  │ Event Hubs   │──────>│ Avro DS      │     │            │   │   │
│  │  │ REST API     │──────>│ Binary DS    │     │ Triggers   │   │   │
│  │  └──────────────┘       └──────────────┘     └────────────┘   │   │
│  │                                                               │   │
│  │  INTEGRATION RUNTIMES    MONITORING          GIT INTEGRATION  │   │
│  │  ┌──────────────────┐    ┌───────────────┐   ┌──────────┐     │   │
│  │  │ Auto Resolve IR  │    │ Pipeline Runs │   │ Azure    │     │   │
│  │  │ Self-Hosted IR   │    │ Activity Runs │   │ DevOps   │     │   │
│  │  │ Managed VNet IR  │    │ Trigger Runs  │   │ GitHub   │     │   │
│  │  │ Azure-SSIS IR    │    │ Alerts        │   │          │     │   │
│  │  └──────────────────┘    └───────────────┘   └──────────┘     │   │
│  └───────────────────────────────────────────────────────────────┘   │
│                                                                      │
│  DATA FLOW (Visual ETL):                                             │
│  ┌───────────────────────────────────────────────────────────────┐   │
│  │  Source ──> Filter ──> Derive ──> Join ──> Aggregate ──> Sink │   │
│  │  (ADLS)     (Row       (Add       (Lookup) (Group By)   (ADLS)│   │
│  │             Filter)    Columns)                               │   │
│  └───────────────────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────────────┘

Pipeline JSON Example

{
  "name": "pl_daily_sales_etl",
  "properties": {
    "activities": [
      {
        "name": "CopySalesData",
        "type": "Copy",
        "typeProperties": {
          "source": {
            "type": "JsonSource",
            "storeSettings": {
              "type": "AzureBlobFSReadSettings",
              "recursive": true
            },
            "formatSettings": {
              "type": "JsonReadSettings"
            }
          },
          "sink": {
            "type": "ParquetSink",
            "storeSettings": {
              "type": "AzureBlobFSWriteSettings",
              "copyBehavior": "PreserveHierarchy"
            },
            "formatSettings": {
              "type": "ParquetWriteSettings"
            }
          }
        },
        "inputs": [
          {
            "referenceName": "ds_raw_sales",
            "type": "DatasetReference"
          }
        ],
        "outputs": [
          {
            "referenceName": "ds_staging_sales",
            "type": "DatasetReference"
          }
        ]
      },
      {
        "name": "TransformAndLoad",
        "type": "DatabricksNotebook",
        "typeProperties": {
          "notebookPath": "/Repos/data_engineering/sales_transformation"
        },
        "dependsOn": [
          {
            "activity": "CopySalesData",
            "dependencyConditions": ["Succeeded"]
          }
        ],
        "policy": {
          "timeout": "0.01:00:00",
          "retry": 1,
          "retryIntervalInSeconds": 30
        }
      },
      {
        "name": "LoadToSynapse",
        "type": "SqlServerStoredProcedure",
        "typeProperties": {
          "storedProcedureName": "sp_load_fact_sales"
        },
        "dependsOn": [
          {
            "activity": "TransformAndLoad",
            "dependencyConditions": ["Succeeded"]
          }
        ]
      }
    ],
    "parameters": {
      "date": {
        "type": "String"
      },
      "source": {
        "type": "String",
        "defaultValue": "sales_api"
      }
    },
    "variables": {
      "retryCount": {
        "type": "Integer",
        "defaultValue": 0
      }
    }
  }
}
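The dependsOn entries in the pipeline above form a small dependency graph: each activity starts only after the activities it names have succeeded. As a rough sketch of how that ordering can be derived and sanity-checked offline, the Python below sorts the activities topologically. The `execution_order` helper and the trimmed `PIPELINE` dictionary are illustrative assumptions, not part of ADF or any SDK, and the sketch ignores dependency conditions other than the edge itself.

```python
from graphlib import CycleError, TopologicalSorter

# Trimmed copy of the pipeline above: only the fields the ordering depends on.
# Activities are deliberately listed out of order to show the sort at work.
PIPELINE = {
    "name": "pl_daily_sales_etl",
    "properties": {
        "activities": [
            {"name": "LoadToSynapse",
             "dependsOn": [{"activity": "TransformAndLoad",
                            "dependencyConditions": ["Succeeded"]}]},
            {"name": "CopySalesData"},
            {"name": "TransformAndLoad",
             "dependsOn": [{"activity": "CopySalesData",
                            "dependencyConditions": ["Succeeded"]}]},
        ]
    },
}


def execution_order(pipeline: dict) -> list[str]:
    """Return activity names in an order that satisfies every dependsOn edge.

    Raises ValueError when an activity depends on a name that is not defined,
    and graphlib.CycleError when the dependencies are circular.
    """
    graph = {
        activity["name"]: {dep["activity"] for dep in activity.get("dependsOn", [])}
        for activity in pipeline["properties"]["activities"]
    }
    for name, deps in graph.items():
        missing = deps - set(graph)
        if missing:
            raise ValueError(f"{name} depends on unknown activities: {sorted(missing)}")
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(" -> ".join(execution_order(PIPELINE)))
```

A check like this can run in a CI step against exported pipeline JSON to catch a misspelled activity name before the pipeline is published.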

Linked Service Configuration

{
  "name": "ls_adls_gen2",
  "properties": {
    "type": "AzureBlobFS",
    "typeProperties": {
      "url": "https://stdatalake001.dfs.core.windows.net",
      "accountKey": {
        "type": "AzureKeyVaultSecret",
        "store": {
          "referenceName": "akv_dataengineering",
          "type": "LinkedServiceReference"
        },
        "secretName": "adls-storage-key"
      }
    },
    "connectVia": {
      "referenceName": "AutoResolveIntegrationRuntime",
      "type": "IntegrationRuntimeReference"
    }
  }
}
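The linked service above keeps the storage key out of the definition by pointing accountKey at an Azure Key Vault secret. A minimal sketch of a lint check for that convention follows. The `inline_secret_fields` helper and its list of secret-looking field names are assumptions made for illustration; the check is deliberately strict and flags anything that is not a Key Vault reference, including SecureString values.

```python
# Hypothetical name fragments that suggest a field holds a credential.
SECRET_HINTS = ("key", "password", "secret", "token", "connectionstring")

LINKED_SERVICE = {
    "name": "ls_adls_gen2",
    "properties": {
        "type": "AzureBlobFS",
        "typeProperties": {
            "url": "https://stdatalake001.dfs.core.windows.net",
            "accountKey": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "akv_dataengineering",
                    "type": "LinkedServiceReference",
                },
                "secretName": "adls-storage-key",
            },
        },
    },
}


def inline_secret_fields(linked_service: dict) -> list[str]:
    """Return typeProperties fields that look like secrets but are not Key Vault references."""
    findings = []
    type_properties = linked_service["properties"].get("typeProperties", {})
    for field, value in type_properties.items():
        if not any(hint in field.lower() for hint in SECRET_HINTS):
            continue  # field name does not look like a credential
        is_key_vault_ref = isinstance(value, dict) and value.get("type") == "AzureKeyVaultSecret"
        if not is_key_vault_ref:
            findings.append(field)
    return findings


if __name__ == "__main__":
    problems = inline_secret_fields(LINKED_SERVICE)
    print("inline secrets:", problems if problems else "none")
```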

Trigger Types

Architecture Diagram
┌──────────────────────────────────────────────────────────────────┐
│                        ADF TRIGGER TYPES                         │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  SCHEDULE TRIGGER                                                │
│  ┌───────────────────────────────────────────────────────────┐   │
│  │ Recurrence: Every 1 day at 02:00 (daily at 2 AM)          │   │
│  │ Recurrence: Every 1 hour                                  │   │
│  │ Time Zone: UTC / Local                                    │   │
│  └───────────────────────────────────────────────────────────┘   │
│                                                                  │
│  TUMBLING WINDOW TRIGGER                                         │
│  ┌───────────────────────────────────────────────────────────┐   │
│  │ Window Size: 1 day                                        │   │
│  │ Frequency: Hour, Interval: 24                             │   │
│  │ Start Time (anchor): 2024-01-01                           │   │
│  │ Max Concurrency: 3 (process 3 windows concurrently)       │   │
│  │ Retry: 3 attempts, 5 min interval                         │   │
│  └───────────────────────────────────────────────────────────┘   │
│                                                                  │
│  STORAGE EVENT TRIGGER (BLOB CREATED)                            │
│  ┌───────────────────────────────────────────────────────────┐   │
│  │ Event: Blob Created                                       │   │
│  │ Container: raw                                            │   │
│  │ Blob Path Begins With: sales/                             │   │
│  │ Event Type: Microsoft.Storage.BlobCreated                 │   │
│  └───────────────────────────────────────────────────────────┘   │
│                                                                  │
│  STORAGE EVENT TRIGGER (BLOB DELETED)                            │
│  ┌───────────────────────────────────────────────────────────┐   │
│  │ Event: Blob Deleted                                       │   │
│  │ Subject Begins With: /blobServices/default/containers/    │   │
│  │ Ignore Empty Blobs: true                                  │   │
│  └───────────────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────────┘
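A tumbling window trigger differs from a schedule trigger in that it slices time into fixed-size, contiguous, non-overlapping windows counted from a start time, and each window gets its own run. The sketch below reproduces that slicing for the 1-day window anchored at 2024-01-01 shown in the diagram. The `tumbling_windows` function is a hypothetical helper written for this page; it models only the window boundaries, not concurrency limits or retries.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator


def tumbling_windows(start: datetime, size: timedelta,
                     until: datetime) -> Iterator[tuple[datetime, datetime]]:
    """Yield (window_start, window_end) for each window that has fully elapsed by `until`."""
    if size <= timedelta(0):
        raise ValueError("window size must be positive")
    window_start = start
    while window_start + size <= until:
        # Each window ends exactly where the next one begins.
        yield window_start, window_start + size
        window_start += size


if __name__ == "__main__":
    anchor = datetime(2024, 1, 1, tzinfo=timezone.utc)
    now = datetime(2024, 1, 4, 12, 0, tzinfo=timezone.utc)
    for begin, end in tumbling_windows(anchor, timedelta(days=1), now):
        print(f"{begin:%Y-%m-%d %H:%M} -> {end:%Y-%m-%d %H:%M}")
```

Because the windows are derived from the start time rather than from the moment the trigger fires, a trigger created late still produces a run for every past window, which is what makes this trigger type suitable for backfills.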

Event Trigger JSON

{
  "name": "tr_blob_arrival",
  "properties": {
    "type": "BlobEventsTrigger",
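    "typeProperties": {
      "blobPathBeginsWith": "/raw/blobs/sales/",
      "blobPathEndsWith": ".json",
      "ignoreEmptyBlobs": true,
      "events": ["Microsoft.Storage.BlobCreated"],
      "scope": "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"
    },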
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "pl_daily_sales_etl",
          "type": "PipelineReference"
        },
        "parameters": {
          "date": "@triggerBody().fileName",
          "source": "blob_trigger"
        }
      }
    ]
  }
}
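The three filter settings in the trigger above decide whether a given blob starts a pipeline run: a prefix test, a suffix test, and an option to skip zero-byte blobs. The sketch below mimics that decision locally so a filter can be tried against sample paths before the trigger is published. `should_fire` is a hypothetical helper, and it assumes plain prefix and suffix comparison on whatever path string it is given; how the service itself composes container and blob segments into that path is not modelled.

```python
def should_fire(blob_path: str, size_bytes: int, *, begins_with: str = "",
                ends_with: str = "", ignore_empty: bool = True) -> bool:
    """Return True when a blob passes the prefix, suffix, and empty-blob filters."""
    if ignore_empty and size_bytes == 0:
        return False  # zero-byte blobs are skipped when ignore_empty is set
    return blob_path.startswith(begins_with) and blob_path.endswith(ends_with)


if __name__ == "__main__":
    samples = [
        ("raw/sales/2024-01-15.json", 2048),
        ("raw/sales/2024-01-15.csv", 2048),
        ("raw/inventory/2024-01-15.json", 2048),
        ("raw/sales/empty.json", 0),
    ]
    for path, size in samples:
        fired = should_fire(path, size, begins_with="raw/sales/", ends_with=".json")
        print(f"{path:35} size={size:<5} fires={fired}")
```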

⚠️

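Important: Storage event triggers rely on Azure Event Grid. The Microsoft.EventGrid resource provider must be registered in the subscription, the storage account must be general-purpose v2 or ADLS Gen2, and the identity that publishes the trigger needs write access to the storage account so that the Event Grid subscription can be created.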

Self-Hosted Integration Runtime

{
  "name": "ir-selfhosted-onprem",
  "properties": {
    "type": "SelfHosted",
    "typeProperties": {
      "linkedInfo": {
        "authorizationType": "Key",
        "key": {
          "type": "SecureString",
          "value": "<EncryptedKey>"
        }
      }
    }
  }
}

IR Node Configuration

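# Install Self-Hosted IR on Windows (silent install of the downloaded MSI)
msiexec /i "IntegrationRuntime_<version>.msi" /quiet

# Register IR node with the authentication key copied from the ADF portal
& "C:\Program Files\Microsoft Integration Runtime\<version>\Shared\dmgcmd.exe" -RegisterNewNode "<key>" "IR-Node-01"

# Check IR status (requires the Az.DataFactory PowerShell module)
Get-AzDataFactoryV2IntegrationRuntime -ResourceGroupName "<resource-group>" -DataFactoryName "<factory-name>" -Name "ir-selfhosted-onprem" -Status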

Interview Questions

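Q1: Explain the difference between Copy Activity and Data Flow in ADF.
A: Copy Activity moves data between stores as-is, or with light changes such as format conversion and column mapping, using an engine optimized for throughput. Mapping Data Flow provides visual ETL with transformations (filter, derive, join, aggregate) that run on Spark clusters managed by ADF. Use Copy Activity for straightforward data movement and Data Flow when the data has to be reshaped along the way.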

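Q2: How do you handle schema changes in ADF pipelines?
A: In mapping data flows, enable schema drift so that columns missing from the dataset definition still flow through, and use parameters to handle column names dynamically. Turn on the source's Validate schema option instead when an unexpected change should fail the run. For Copy Activity, rely on default (auto) mapping by column name when source and sink evolve together, or define an explicit schema mapping when they do not.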

Q3: What are the best practices for ADF pipeline monitoring?
A: 1) Set up alerts for failed runs, 2) Use diagnostic settings to send logs to Log Analytics, 3) Implement custom logging by writing run metadata (pipeline name, run ID, parameter values) to a log table, 4) Use Power BI dashboards for pipeline metrics, 5) Configure retry policies on activities so that transient failures recover without manual intervention.
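Alerting on failed runs usually comes down to a threshold on the failure rate per pipeline over some period. The sketch below shows that calculation on a handful of run records. The record shape (`pipelineName`, `status`), the sample data, and the 20% threshold are all assumptions chosen for illustration; in practice the records would come from whatever log store the factory's diagnostics are sent to.

```python
from collections import Counter

# Hypothetical run records; only finished runs count toward the rate.
RUNS = [
    {"pipelineName": "pl_daily_sales_etl", "status": "Succeeded"},
    {"pipelineName": "pl_daily_sales_etl", "status": "Failed"},
    {"pipelineName": "pl_daily_sales_etl", "status": "Succeeded"},
    {"pipelineName": "pl_daily_sales_etl", "status": "Succeeded"},
    {"pipelineName": "pl_daily_sales_etl", "status": "InProgress"},
    {"pipelineName": "pl_reference_data", "status": "Succeeded"},
]


def failure_rates(runs: list[dict]) -> dict[str, float]:
    """Return the fraction of finished runs that failed, keyed by pipeline name."""
    totals, failures = Counter(), Counter()
    for run in runs:
        if run["status"] not in ("Succeeded", "Failed"):
            continue  # ignore runs that are queued, in progress, or cancelled
        totals[run["pipelineName"]] += 1
        if run["status"] == "Failed":
            failures[run["pipelineName"]] += 1
    return {name: failures[name] / totals[name] for name in totals}


def pipelines_to_alert(runs: list[dict], threshold: float = 0.2) -> list[str]:
    """Return the pipelines whose failure rate is above the threshold, sorted by name."""
    return sorted(name for name, rate in failure_rates(runs).items() if rate > threshold)


if __name__ == "__main__":
    for name, rate in failure_rates(RUNS).items():
        print(f"{name}: {rate:.0%} failed")
    print("alert on:", pipelines_to_alert(RUNS))
```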
