CI/CD: DevOps, ARM/Bicep & ADF Git Integration
Automated deployment pipelines for Azure data engineering with DevOps, IaC, and Git integration
CI/CD Architecture
┌─────────────────────────────────────────────────────────────────────┐
│                     CI/CD PIPELINE ARCHITECTURE                     │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  DEVELOPMENT          BUILD                   DEPLOYMENT            │
│  ┌────────────┐       ┌────────────────┐      ┌──────────────┐      │
│  │ Git        │──────>│ Azure DevOps   │─────>│ Dev          │      │
│  │ Repository │       │ Build Pipeline │      │ Environment  │      │
│  │ (ADF Git)  │       │                │      │              │      │
│  └────────────┘       │ • Validate     │      └──────┬───────┘      │
│                       │ • Package      │             │              │
│                       │ • Test         │             ▼              │
│                       └────────────────┘      ┌──────────────┐      │
│                                               │ QA           │      │
│                       ┌────────────────┐      │ Environment  │      │
│                       │ ARM/Bicep      │─────>│              │      │
│                       │ Templates      │      └──────┬───────┘      │
│                       └────────────────┘             │              │
│                                                      ▼              │
│                       ┌────────────────┐      ┌──────────────┐      │
│                       │ Terraform      │─────>│ Production   │      │
│                       │ (Optional)     │      │ Environment  │      │
│                       └────────────────┘      └──────────────┘      │
│                                                                     │
│  ADF GIT INTEGRATION:                                               │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │ Dev Branch ──> PR ──> Main Branch ──> Publish ──> Prod        │  │
│  │                                                               │  │
│  │ • Collaborative editing in ADF Studio                         │  │
│  │ • Branch-based development                                    │  │
│  │ • PR validation and code review                               │  │
│  │ • Automated publishing from collaboration to live mode        │  │
│  └───────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘
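The Validate and Package steps in the build box can be sketched for ADF resources with Microsoft's `@microsoft/azure-data-factory-utilities` npm package, which validates the factory JSON in the repository and exports an ARM template from it. This is a minimal sketch under stated assumptions, not a definitive pipeline: the `build/` folder (holding a `package.json` that declares the package and a `build` script pointing at its entry point), the `<subscription-id>` placeholder, the artifact name, and the two variable names are all hypothetical.

```yaml
# Sketch of an ADF build job. The build/ folder, the variable values, and the
# artifact name are assumptions made for illustration.
variables:
  # Folder that contains the ADF JSON resources (the Git root folder of the factory)
  adfRoot: '$(Build.Repository.LocalPath)'
  # Resource ID of the development factory; <subscription-id> is a placeholder
  adfResourceId: '/subscriptions/<subscription-id>/resourceGroups/rg-dataengineering-dev/providers/Microsoft.DataFactory/factories/adf-dev'

steps:
  - task: NodeTool@0
    displayName: 'Install Node.js'
    inputs:
      versionSpec: '18.x'

  - task: Npm@1
    displayName: 'Install ADF utilities'
    inputs:
      command: 'install'
      workingDir: '$(Build.Repository.LocalPath)/build'

  - task: Npm@1
    displayName: 'Validate ADF resources'
    inputs:
      command: 'custom'
      workingDir: '$(Build.Repository.LocalPath)/build'
      customCommand: 'run build validate $(adfRoot) $(adfResourceId)'

  - task: Npm@1
    displayName: 'Export ARM template'
    inputs:
      command: 'custom'
      workingDir: '$(Build.Repository.LocalPath)/build'
      customCommand: 'run build export $(adfRoot) $(adfResourceId) "ArmTemplate"'

  - task: PublishPipelineArtifact@1
    displayName: 'Publish ARM template artifact'
    inputs:
      targetPath: '$(Build.Repository.LocalPath)/build/ArmTemplate'
      artifact: 'adf-arm-template'
      publishLocation: 'pipeline'
```

The export step writes the generated template into an `ArmTemplate` folder next to `package.json`, which is why the artifact is published from there.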
Bicep Template Example
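// Data engineering infrastructure
param location string = resourceGroup().location
param environment string = 'prod'

// Storage Account (ADLS Gen2)
// Storage account names must be globally unique, lowercase alphanumeric, and at most
// 24 characters, so the name uses a short deterministic hash instead of the region name.
resource storageAccount 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  name: take('stdatalake${environment}${uniqueString(resourceGroup().id)}', 24)
  location: location
  kind: 'StorageV2'
  sku: { name: 'Standard_LRS' }
  properties: {
    isHnsEnabled: true // hierarchical namespace, required for Data Lake Storage Gen2
    supportsHttpsTrafficOnly: true
    minimumTlsVersion: 'TLS1_2'
    encryption: {
      services: { blob: { enabled: true } }
      keySource: 'Microsoft.Storage'
    }
  }
}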

// Synapse Workspace
// Uses API version 2021-06-01; check the resource provider reference for the versions available to you.
resource synapseWorkspace 'Microsoft.Synapse/workspaces@2021-06-01' = {
  name: 'syn-${environment}-workspace'
  location: location
  identity: { type: 'SystemAssigned' }
  properties: {
    defaultDataLakeStorage: {
      accountUrl: 'https://${storageAccount.name}.dfs.core.windows.net'
      filesystem: 'synapsefs'
    }
    sqlAdministratorLogin: 'sqladmin'
  }
}
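
// ADF with Git integration
// Git configuration is normally attached only to the development factory;
// higher environments receive changes through the deployment pipeline.
resource dataFactory 'Microsoft.DataFactory/factories@2018-06-01' = {
  name: 'adf-${environment}'
  location: location
  identity: { type: 'SystemAssigned' }
  properties: {
    repoConfiguration: {
      type: 'FactoryVSTSConfiguration' // required discriminator for Azure DevOps Git
      accountName: 'your-ado-org'
      projectName: 'data-engineering'
      repositoryName: 'adf-repo'
      collaborationBranch: 'main'
      rootFolder: '/'
    }
  }
}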
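Before a template like this is deployed, a pipeline step can preview what would change. The sketch below uses `az deployment group what-if` for that; it assumes the template is saved as `infra/main.bicep` and reuses the service connection and resource group names that appear in the pipeline section, all of which may differ in a real project.

```yaml
# Sketch: preview infrastructure changes without applying them.
# File path, service connection, and resource group are assumptions.
steps:
  - task: AzureCLI@2
    displayName: 'Preview infrastructure changes (what-if)'
    inputs:
      azureSubscription: 'dataengineering-subscription'
      scriptType: 'bash'
      scriptLocation: 'inlineScript'
      inlineScript: |
        az deployment group what-if \
          --resource-group rg-dataengineering-dev \
          --template-file infra/main.bicep \
          --parameters environment=dev
```

The command only reports the differences between the template and the deployed resources, so it is safe to run on every branch.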
Azure DevOps Pipeline
# azure-pipelines.yml
trigger:
  branches:
    include:
      - main
      - feature/*

stages:
  - stage: Build
    jobs:
      - job: ValidateTemplates
        pool:
          vmImage: 'ubuntu-latest'
        steps:
          - task: AzureCLI@2
            inputs:
              azureSubscription: 'dataengineering-subscription'
              scriptType: 'bash'
              scriptLocation: 'inlineScript'
              inlineScript: |
                az deployment group validate \
                  --resource-group rg-dataengineering-dev \
                  --template-file infra/main.bicep \
                  --parameters environment=dev

  - stage: DeployDev
    dependsOn: Build
    # Deploy only from main; feature branches stop after validation
    condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
    jobs:
      - deployment: DeployToDev
        environment: 'dev'
        strategy:
          runOnce:
            deploy:
              steps:
                # Deployment jobs do not check out the repository by default
                - checkout: self
                - task: AzureCLI@2
                  inputs:
                    azureSubscription: 'dataengineering-subscription'
                    scriptType: 'bash'
                    scriptLocation: 'inlineScript'
                    inlineScript: |
                      az deployment group create \
                        --resource-group rg-dataengineering-dev \
                        --template-file infra/main.bicep \
                        --parameters environment=dev

  - stage: DeployProd
    dependsOn: DeployDev
    condition: succeeded()
    jobs:
      - deployment: DeployToProd
        environment: 'prod'
        strategy:
          runOnce:
            deploy:
              steps:
                # Deployment jobs do not check out the repository by default
                - checkout: self
                - task: AzureCLI@2
                  inputs:
                    azureSubscription: 'dataengineering-subscription'
                    scriptType: 'bash'
                    scriptLocation: 'inlineScript'
                    inlineScript: |
                      az deployment group create \
                        --resource-group rg-dataengineering-prod \
                        --template-file infra/main.bicep \
                        --parameters environment=prod
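The architecture diagram shows a QA environment between Dev and Production, which the pipeline above does not deploy to. A hedged sketch of such a stage follows. The `qa` environment and the `rg-dataengineering-qa` resource group are assumptions, and `DeployProd` would need `dependsOn: DeployQA` for the chain to run Dev, then QA, then Production.

```yaml
# Hypothetical QA stage, added under `stages:` between DeployDev and DeployProd.
# Resource group and environment names are assumptions.
- stage: DeployQA
  dependsOn: DeployDev
  condition: succeeded()
  jobs:
    - deployment: DeployToQA
      environment: 'qa'
      strategy:
        runOnce:
          deploy:
            steps:
              # Deployment jobs do not check out the repository by default
              - checkout: self
              - task: AzureCLI@2
                inputs:
                  azureSubscription: 'dataengineering-subscription'
                  scriptType: 'bash'
                  scriptLocation: 'inlineScript'
                  inlineScript: |
                    az deployment group create \
                      --resource-group rg-dataengineering-qa \
                      --template-file infra/main.bicep \
                      --parameters environment=qa
```

Approvals for QA or Production can be attached to the Azure DevOps environment as checks, so the YAML itself stays unchanged.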
Pro Tip: Use ADF's Git integration for collaborative development. Publish changes through DevOps pipelines to ensure consistent deployments across environments.
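As an illustration of publishing through a pipeline, the sketch below deploys the ARM template that ADF generates for a factory (`ARMTemplateForFactory.json` with its parameters file) to a target resource group. It is a sketch under assumptions rather than a complete release process: the artifact name `adf-arm-template`, the `<subscription-id>` placeholder, the `eastus` location, and the target names are all hypothetical.

```yaml
# Sketch: deploy the factory's generated ARM template to a target environment.
# Artifact name, subscription ID, location, and target names are assumptions.
steps:
  - download: current
    artifact: 'adf-arm-template'

  - task: AzureResourceManagerTemplateDeployment@3
    displayName: 'Deploy ADF ARM template'
    inputs:
      deploymentScope: 'Resource Group'
      azureResourceManagerConnection: 'dataengineering-subscription'
      subscriptionId: '<subscription-id>'
      action: 'Create Or Update Resource Group'
      resourceGroupName: 'rg-dataengineering-prod'
      location: 'eastus'
      templateLocation: 'Linked artifact'
      csmFile: '$(Pipeline.Workspace)/adf-arm-template/ARMTemplateForFactory.json'
      csmParametersFile: '$(Pipeline.Workspace)/adf-arm-template/ARMTemplateParametersForFactory.json'
      overrideParameters: '-factoryName "adf-prod"'
      deploymentMode: 'Incremental'
```

Active triggers in the target factory generally have to be stopped before such a deployment and restarted afterwards, so a real pipeline would wrap this step with pre- and post-deployment scripts.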
Interview Questions
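Q1: How do you handle parameterization across environments in ADF?
A: Combine ADF parameters, linked services that reference Key Vault secrets, and ARM template parameters that are overridden per environment at deployment time. Store environment-specific values in each environment's Key Vault and reference them via expressions, so the factory definition stays the same everywhere.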
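One way to apply this in Azure Pipelines is a reusable steps template that takes the environment-specific values as parameters and passes them to the ARM deployment as overrides. The sketch below is illustrative only: the file name `deploy-adf.yml`, the linked service name `LS_KeyVault` (and therefore the generated parameter name `LS_KeyVault_properties_typeProperties_baseUrl`), and the artifact path are assumptions.

```yaml
# deploy-adf.yml: hypothetical steps template shared by every environment stage
parameters:
  - name: subscriptionId
    type: string
  - name: resourceGroupName
    type: string
  - name: location
    type: string
  - name: factoryName
    type: string
  - name: keyVaultUrl   # e.g. the URL of that environment's Key Vault
    type: string

steps:
  - task: AzureResourceManagerTemplateDeployment@3
    displayName: 'Deploy ADF to ${{ parameters.factoryName }}'
    inputs:
      deploymentScope: 'Resource Group'
      azureResourceManagerConnection: 'dataengineering-subscription'
      subscriptionId: '${{ parameters.subscriptionId }}'
      action: 'Create Or Update Resource Group'
      resourceGroupName: '${{ parameters.resourceGroupName }}'
      location: '${{ parameters.location }}'
      templateLocation: 'Linked artifact'
      csmFile: '$(Pipeline.Workspace)/adf-arm-template/ARMTemplateForFactory.json'
      csmParametersFile: '$(Pipeline.Workspace)/adf-arm-template/ARMTemplateParametersForFactory.json'
      # Only the values that differ between environments are overridden
      overrideParameters: >-
        -factoryName "${{ parameters.factoryName }}"
        -LS_KeyVault_properties_typeProperties_baseUrl "${{ parameters.keyVaultUrl }}"
      deploymentMode: 'Incremental'
```

Each stage would then reference the template with `- template: deploy-adf.yml` and its own parameter values, which keeps secrets in Key Vault and leaves only the vault URL and resource names in the pipeline.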
Q2: What is the difference between ADF collaboration mode and live mode?
A: Collaboration mode (Git mode) saves changes to a Git branch, which enables collaborative development with branching and version control. Live mode is the published version that the service actually runs. Changes made in collaboration mode take effect only after they are published to live mode.
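Q3: How do you implement blue-green deployments for data pipelines?
A: Deploy the new (green) pipeline version alongside the existing (blue) one and validate it with test data. ADF has no deployment slots, so cut over by moving triggers or repointing downstream consumers to the new version, for example behind a feature flag or configuration switch. Monitor the first runs and roll back by switching back if problems appear.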
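For ADF, one possible reading of the cutover step is moving a trigger from the old pipeline version to the new one once a test run succeeds. The sketch below does this with the `datafactory` extension of the Azure CLI. The pipeline name `pl_ingest_green` and the trigger names `tr_ingest_blue` and `tr_ingest_green` are hypothetical, and the 30-second polling interval is an arbitrary choice.

```yaml
# Sketch of a trigger-based blue-green cutover. Pipeline and trigger names are hypothetical.
steps:
  - task: AzureCLI@2
    displayName: 'Smoke-test green pipeline, then move the trigger'
    inputs:
      azureSubscription: 'dataengineering-subscription'
      scriptType: 'bash'
      scriptLocation: 'inlineScript'
      inlineScript: |
        set -euo pipefail
        az extension add --name datafactory --upgrade

        RG=rg-dataengineering-prod
        ADF=adf-prod

        # 1. Run the new (green) pipeline once.
        RUN_ID=$(az datafactory pipeline create-run \
          --resource-group "$RG" --factory-name "$ADF" \
          --name pl_ingest_green --query runId --output tsv)

        # 2. Poll until the run leaves the Queued/InProgress states.
        STATUS=InProgress
        while [ "$STATUS" = "InProgress" ] || [ "$STATUS" = "Queued" ]; do
          sleep 30
          STATUS=$(az datafactory pipeline-run show \
            --resource-group "$RG" --factory-name "$ADF" \
            --run-id "$RUN_ID" --query status --output tsv)
        done

        # 3. Cut over only on success; otherwise the blue trigger keeps running.
        if [ "$STATUS" = "Succeeded" ]; then
          az datafactory trigger stop  --resource-group "$RG" --factory-name "$ADF" --name tr_ingest_blue
          az datafactory trigger start --resource-group "$RG" --factory-name "$ADF" --name tr_ingest_green
        else
          echo "Green run ended with status $STATUS; keeping blue live" >&2
          exit 1
        fi
```

Rolling back is the reverse operation: stop the green trigger and start the blue one again.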