Backup, Geo-Redundancy & Disaster Recovery
Business continuity with backup strategies, geo-redundancy, and disaster recovery for data engineering
DR Architecture
┌─────────────────────────────────────────────────────────────────────┐
│                   DISASTER RECOVERY ARCHITECTURE                    │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  PRIMARY REGION: EAST US 2           SECONDARY REGION: WEST US 2    │
│  ┌──────────────────────┐            ┌──────────────────────┐       │
│  │ ┌──────────────────┐ │            │ ┌──────────────────┐ │       │
│  │ │ ADLS Gen2        │ │  Geo-Rep   │ │ ADLS Gen2        │ │       │
│  │ │ (RA-GRS)         │─┼───────────>│ │ (Secondary)      │ │       │
│  │ └──────────────────┘ │            │ └──────────────────┘ │       │
│  │                      │            │                      │       │
│  │ ┌──────────────────┐ │            │ ┌──────────────────┐ │       │
│  │ │ Synapse Pool     │ │  Geo-Rep   │ │ Synapse Pool     │ │       │
│  │ │ (Active)         │─┼───────────>│ │ (Standby)        │ │       │
│  │ └──────────────────┘ │            │ └──────────────────┘ │       │
│  │                      │            │                      │       │
│  │ ┌──────────────────┐ │            │ ┌──────────────────┐ │       │
│  │ │ Cosmos DB        │ │  Multi-    │ │ Cosmos DB        │ │       │
│  │ │ (Multi-region    │─┼───────────>│ │ (Replica)        │ │       │
│  │ │ writes)          │ │  region    │ │                  │ │       │
│  │ └──────────────────┘ │            │ └──────────────────┘ │       │
│  └──────────────────────┘            └──────────────────────┘       │
│                                                                     │
│  RPO/RTO TARGETS:                                                   │
│  ┌────────────────┬──────────────┬─────────────────┬───────────┐    │
│  │ Service        │ RPO          │ RTO             │ SLA       │    │
│  ├────────────────┼──────────────┼─────────────────┼───────────┤    │
│  │ ADLS (RA-GRS)  │ <15 min      │ <30 min         │ 99.99%    │    │
│  │ Synapse (Geo)  │ <1 hour      │ <4 hours        │ 99.9%     │    │
│  │ Cosmos DB      │ 0 (multi-    │ 0 (automatic    │ 99.999%   │    │
│  │                │ region)      │ failover)       │           │    │
│  │ Event Hubs     │ 0 (capture)  │ Minutes         │ 99.95%    │    │
│  │ Databricks     │ Varies       │ Minutes-Hours   │ 99.9%     │    │
│  └────────────────┴──────────────┴─────────────────┴───────────┘    │
└─────────────────────────────────────────────────────────────────────┘
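To make the SLA column easier to reason about, the sketch below converts an availability percentage into a monthly downtime budget. It assumes a 30-day month and treats the SLA as a simple uptime fraction; the function name `allowed_downtime_minutes` is illustrative and not part of any Azure SDK.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of downtime an availability SLA permits over `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


# SLA figures copied from the RPO/RTO targets table.
slas = {
    "ADLS (RA-GRS)": 99.99,
    "Synapse (Geo)": 99.9,
    "Cosmos DB": 99.999,
    "Event Hubs": 99.95,
    "Databricks": 99.9,
}

for service, sla in slas.items():
    budget = allowed_downtime_minutes(sla)
    print(f"{service:<15} {sla:>7}%  {budget:6.2f} min per 30 days")
```

Under these assumptions, 99.99% leaves roughly 4.3 minutes of downtime per 30 days, while 99.9% leaves roughly 43 minutes.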
Backup Configuration
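# Point-in-time restore for a Synapse dedicated SQL pool (Azure CLI).
# The restore creates a new pool, so a destination name is required;
# SQLPool01_restored is an illustrative name. The timestamp is in UTC.
# az synapse sql pool restore \
#     --name SQLPool01 \
#     --workspace-name syn-prod-workspace \
#     --resource-group rg-dataengineering-prod \
#     --dest-name SQLPool01_restored \
#     --time "$(date -u -d '2 hours ago' +%Y-%m-%dT%H:%M:%S)"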
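# Cosmos DB continuous backup
import requests
from azure.identity import DefaultAzureCredential

# Placeholders: replace with your own subscription, resource group, and account.
sub = "<subscription-id>"
rg = "<resource-group>"
account = "<cosmos-account-name>"

# Acquire an Azure Resource Manager token
credential = DefaultAzureCredential()
token = credential.get_token("https://management.azure.com/.default")

# Enable continuous backup with a 7-day restore window
response = requests.patch(
    f"https://management.azure.com/subscriptions/{sub}/resourceGroups/{rg}"
    f"/providers/Microsoft.DocumentDB/databaseAccounts/{account}",
    # ARM calls need an api-version; 2023-04-15 is an assumption, verify the current one.
    params={"api-version": "2023-04-15"},
    headers={"Authorization": f"Bearer {token.token}", "Content-Type": "application/json"},
    json={
        "properties": {
            "backupPolicy": {
                "type": "Continuous",
                "continuousModeProperties": {"tier": "Continuous7Days"},
            }
        }
    },
)
response.raise_for_status()  # fail loudly if the update is rejected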
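Before requesting a point-in-time restore, it helps to confirm that the chosen timestamp is inside the retention window. The sketch below assumes the `Continuous7Days` tier keeps seven days of restore history and that all timestamps are UTC; `restore_timestamp` and `is_restorable` are hypothetical helper names, not Azure SDK functions.

```python
from datetime import datetime, timedelta, timezone


def restore_timestamp(hours_ago: float, now: datetime) -> str:
    """Format a restore point `hours_ago` before `now` as an ISO 8601 UTC string."""
    return (now - timedelta(hours=hours_ago)).strftime("%Y-%m-%dT%H:%M:%SZ")


def is_restorable(restore_point: datetime, now: datetime, retention_days: int = 7) -> bool:
    """True if `restore_point` is in the past and within the retention window."""
    window_start = now - timedelta(days=retention_days)
    return window_start <= restore_point <= now


now = datetime.now(timezone.utc)
candidate = now - timedelta(hours=2)

print("restore point:", restore_timestamp(2, now))
print("inside 7-day window:", is_restorable(candidate, now))
```

Passing `now` explicitly keeps both helpers deterministic, which makes them easy to unit test.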
Geo-Redundancy Configuration
param location string = resourceGroup().location

resource storageAccount 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  name: 'stdatalake001'
  location: location
  sku: {
    name: 'Standard_RAGRS' // Read-access geo-redundant storage
  }
  kind: 'StorageV2'
  properties: {
    isHnsEnabled: true // Hierarchical namespace (ADLS Gen2)
    // Geo-replication is enabled by the RA-GRS SKU itself. Azure replicates
    // to the primary region's paired region; the destination account and
    // region cannot be set as properties on the storage account.
  }
}
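Pro Tip: Use RA-GRS for ADLS Gen2 so that data stays readable from the secondary endpoint during a regional outage. Use Cosmos DB multi-region writes so that the application can keep writing to a healthy region without a manual failover; because replication between regions is asynchronous, treat zero data loss as a target rather than a guarantee.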
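With RA-GRS, the read-only replica is reached through a secondary endpoint whose host name is the account name with a `-secondary` suffix. The sketch below builds both endpoints so a reader can fall back to the secondary when the primary is unavailable. It assumes the public Azure cloud (`core.windows.net`); the helper names are illustrative.

```python
def primary_endpoint(account_name: str, service: str = "dfs") -> str:
    """Primary endpoint URL for the blob or dfs (ADLS Gen2) service."""
    if service not in {"blob", "dfs"}:
        raise ValueError("service must be 'blob' or 'dfs'")
    return f"https://{account_name}.{service}.core.windows.net"


def secondary_endpoint(account_name: str, service: str = "dfs") -> str:
    """Read-only secondary endpoint URL exposed by RA-GRS accounts."""
    if service not in {"blob", "dfs"}:
        raise ValueError("service must be 'blob' or 'dfs'")
    return f"https://{account_name}-secondary.{service}.core.windows.net"


def read_endpoints(account_name: str, service: str = "dfs") -> list[str]:
    """Endpoints to try for reads, in order of preference."""
    return [primary_endpoint(account_name, service), secondary_endpoint(account_name, service)]


for url in read_endpoints("stdatalake001"):
    print(url)
```

The secondary endpoint serves reads only, and it may lag behind the primary, so a fallback reader should tolerate slightly stale data.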
Interview Questions
Q1: Explain the difference between RPO and RTO. A: RPO (Recovery Point Objective) is the maximum acceptable data loss (e.g., 15 minutes). RTO (Recovery Time Objective) is the maximum acceptable downtime (e.g., 1 hour). Both drive DR strategy design.
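A small worked example can make the distinction in Q1 concrete. The sketch below derives the data-loss window and the downtime from three timestamps of a hypothetical incident and compares them with the example targets from the answer (15 minutes and 1 hour). The function name and the timeline are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def achieved_rpo_rto(last_replicated_write, outage_start, service_restored):
    """Return (data-loss window, downtime) for one incident as timedeltas."""
    if not last_replicated_write <= outage_start <= service_restored:
        raise ValueError("timestamps must be in chronological order")
    data_loss_window = outage_start - last_replicated_write  # compare with RPO
    downtime = service_restored - outage_start               # compare with RTO
    return data_loss_window, downtime


# Hypothetical incident timeline (UTC).
last_write = datetime(2024, 1, 10, 9, 50, tzinfo=timezone.utc)
outage = datetime(2024, 1, 10, 10, 0, tzinfo=timezone.utc)
restored = datetime(2024, 1, 10, 10, 45, tzinfo=timezone.utc)

loss, downtime = achieved_rpo_rto(last_write, outage, restored)

# Targets taken from the example figures in the answer.
rpo_target = timedelta(minutes=15)
rto_target = timedelta(hours=1)

print(f"data-loss window: {loss}, within RPO: {loss <= rpo_target}")
print(f"downtime: {downtime}, within RTO: {downtime <= rto_target}")
```

Here 10 minutes of writes were not yet replicated and the service was down for 45 minutes, so both targets are met.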
Q2: How do you test disaster recovery in Azure? A: 1) Simulate regional outage, 2) Initiate failover to secondary region, 3) Verify data integrity, 4) Test application functionality, 5) Measure actual RPO/RTO vs targets, 6) Document findings and improve.
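Steps 3 to 6 of the drill in Q2 can be partly automated by recording what was measured and comparing it with the targets. The sketch below is one possible shape for that comparison; the targets come from the RPO/RTO table earlier in this section, while the measured values, class names, and `evaluate` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Target:
    rpo: timedelta
    rto: timedelta


@dataclass(frozen=True)
class DrillResult:
    service: str
    measured_rpo: timedelta
    measured_rto: timedelta
    data_verified: bool


def evaluate(result: DrillResult, target: Target) -> list[str]:
    """Return a list of findings; an empty list means the drill passed."""
    findings = []
    if not result.data_verified:
        findings.append(f"{result.service}: data integrity check failed")
    if result.measured_rpo >= target.rpo:
        findings.append(f"{result.service}: RPO {result.measured_rpo} misses target <{target.rpo}")
    if result.measured_rto >= target.rto:
        findings.append(f"{result.service}: RTO {result.measured_rto} misses target <{target.rto}")
    return findings


# Targets from the RPO/RTO table (strict "less than" limits).
targets = {
    "ADLS (RA-GRS)": Target(timedelta(minutes=15), timedelta(minutes=30)),
    "Synapse (Geo)": Target(timedelta(hours=1), timedelta(hours=4)),
}

# Hypothetical measurements from one drill.
results = [
    DrillResult("ADLS (RA-GRS)", timedelta(minutes=8), timedelta(minutes=22), True),
    DrillResult("Synapse (Geo)", timedelta(minutes=40), timedelta(hours=5), True),
]

for r in results:
    findings = evaluate(r, targets[r.service])
    print(r.service, "PASS" if not findings else findings)
```

Keeping the findings as plain strings makes it straightforward to paste them into the drill report that step 6 calls for.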
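Q3: What is the cost impact of geo-redundancy? A: Geo-redundant storage keeps a second copy of the data in another region, so storage cost roughly doubles compared with locally redundant storage, and RA-GRS adds a small premium for read access to the secondary. However, the cost of downtime (lost revenue, reputation) often far exceeds the additional storage cost. Use RA-GRS for critical data.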
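The trade-off in Q3 can be put into numbers with a short break-even calculation. Every figure in the sketch below is a hypothetical placeholder (not an Azure list price), chosen so that the geo-redundant price is twice the locally redundant one; substitute real prices and your own estimate of downtime cost.

```python
def monthly_storage_cost(size_gb: float, price_per_gb_month: float) -> float:
    """Monthly storage cost for a given capacity and per-GB price."""
    return size_gb * price_per_gb_month


def break_even_downtime_hours(extra_monthly_cost: float, downtime_cost_per_hour: float) -> float:
    """Hours of avoided downtime per month that pay for the extra storage cost."""
    if downtime_cost_per_hour <= 0:
        raise ValueError("downtime_cost_per_hour must be positive")
    return extra_monthly_cost / downtime_cost_per_hour


# Hypothetical inputs, for illustration only.
SIZE_GB = 50_000                  # about 50 TB of data lake storage
LRS_PRICE = 0.02                  # placeholder price per GB per month
GEO_PRICE = 0.04                  # placeholder: twice the LRS price
DOWNTIME_COST_PER_HOUR = 5_000.0  # placeholder business impact

lrs = monthly_storage_cost(SIZE_GB, LRS_PRICE)
geo = monthly_storage_cost(SIZE_GB, GEO_PRICE)
extra = geo - lrs
hours = break_even_downtime_hours(extra, DOWNTIME_COST_PER_HOUR)

print(f"LRS: {lrs:.2f}  geo-redundant: {geo:.2f}  extra: {extra:.2f} per month")
print(f"break-even: {hours * 60:.0f} minutes of avoided downtime per month")
```

With these placeholder numbers the extra copy costs about 1,000 per month, which roughly 12 minutes of avoided downtime would cover.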