DR Architecture
GCS Cross-Region Replication
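from google.api_core.exceptions import Conflict
from google.cloud import storage

client = storage.Client()

# Configurable dual-region bucket: objects are stored redundantly in
# us-east1 and us-east4 (both inside the "US" multi-region).
bucket = client.bucket("my-critical-data")
bucket.storage_class = "STANDARD"
try:
    bucket = client.create_bucket(
        bucket, location="US", data_locations=["US-EAST1", "US-EAST4"]
    )
    print(f"Created dual-region bucket: {bucket.name}")
except Conflict:
    # create_bucket has no exists_ok flag; a 409 means the name is taken
    print(f"Bucket {bucket.name} already exists")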
BigQuery Cross-Region Backup
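from google.cloud import bigquery

client = bigquery.Client()

# Copy one table into a DR dataset (dr_backup) that is assumed to
# already exist in the secondary region.
source = "my-project.analytics.sales"
destination = "my-project.dr_backup.sales_backup"
job = client.copy_table(
    source,
    destination,
    job_config=bigquery.CopyJobConfig(
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    ),
)
job.result()  # Wait for the copy job to finish

# Copy jobs do not report row counts, so read them from the destination table
backup = client.get_table(destination)
print(f"Backup completed: {backup.num_rows} rows copied")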
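The copy call above handles one table at a time. To back up a whole dataset, the same call can be repeated per table (the BigQuery Data Transfer Service dataset copy is the managed alternative for scheduled copies). The sketch below only builds the (source, destination) ID pairs so the naming scheme can be checked offline; `plan_dataset_copy` and the `_backup` suffix are hypothetical. In a live run the table IDs would come from `client.list_tables(...)` and each pair would be passed to `client.copy_table`.

```python
from typing import Iterable, List, Tuple


def plan_dataset_copy(
    project: str,
    source_dataset: str,
    backup_dataset: str,
    table_ids: Iterable[str],
    suffix: str = "_backup",
) -> List[Tuple[str, str]]:
    """Return (source, destination) IDs for copying every table in a dataset.

    Hypothetical helper: IDs use the "project.dataset.table" form that
    bigquery.Client.copy_table accepts.
    """
    pairs = []
    for table_id in sorted(table_ids):  # sorted for a deterministic order
        source = f"{project}.{source_dataset}.{table_id}"
        destination = f"{project}.{backup_dataset}.{table_id}{suffix}"
        pairs.append((source, destination))
    return pairs


# Offline usage; in production the IDs could come from
# [t.table_id for t in client.list_tables("my-project.analytics")].
plan = plan_dataset_copy("my-project", "analytics", "dr_backup", ["sales", "customers"])
for source, destination in plan:
    print(f"{source} -> {destination}")
```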
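Best Practice: Use dual-region GCS buckets for critical data. Every Cloud Storage location type is designed for 11 nines (99.999999999%) of annual durability; what a dual-region bucket adds is geo-redundancy, so data stays available if one of its two regions fails. Schedule BigQuery dataset copies to a secondary region for DR. Automate backup verification (row counts plus content checksums). Test DR procedures quarterly. Document RPO/RTO requirements for each data tier.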
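Automated backup verification can start with comparing a row count and an order-independent content hash between the source and the restored copy. A minimal sketch, assuming both tables have already been fetched as Python tuples; `table_fingerprint` and `verify_backup` are hypothetical names. For large BigQuery tables, computing an aggregate over `FARM_FINGERPRINT` in SQL would avoid pulling rows to the client.

```python
import hashlib
from typing import Iterable, Sequence, Tuple


def table_fingerprint(rows: Iterable[Sequence[object]]) -> Tuple[int, str]:
    """Return (row_count, digest); the digest does not depend on row order."""
    # Hash each row independently, then sort the hashes so that a backup
    # restored in a different order still produces the same fingerprint.
    row_hashes = sorted(
        hashlib.sha256(repr(tuple(row)).encode("utf-8")).hexdigest() for row in rows
    )
    combined = hashlib.sha256("".join(row_hashes).encode("utf-8")).hexdigest()
    return len(row_hashes), combined


def verify_backup(
    source_rows: Iterable[Sequence[object]], backup_rows: Iterable[Sequence[object]]
) -> bool:
    """True if the backup holds the same rows (duplicates included) as the source."""
    return table_fingerprint(source_rows) == table_fingerprint(backup_rows)


source = [(1, "widget", 9.99), (2, "gadget", 19.5)]
restored = [(2, "gadget", 19.5), (1, "widget", 9.99)]
truncated = [(1, "widget", 9.99)]
print(verify_backup(source, restored))   # True
print(verify_backup(source, truncated))  # False
```

Sorting per-row hashes makes the check a multiset comparison: reordering passes, while a missing, extra, or altered row fails.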
Common Interview Questions
Q1: What is the difference between RPO and RTO?
Answer: RPO (Recovery Point Objective) is the maximum acceptable data loss measured in time. RTO (Recovery Time Objective) is the maximum acceptable downtime. RPO determines backup frequency; RTO determines infrastructure requirements.
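To see how RPO drives backup frequency: with periodic backups, the worst case is a failure just before the next backup lands in the DR region, which loses everything since the previous landed backup, i.e. the backup interval plus the replication lag. A minimal sketch with a hypothetical `DrPlan` class; the field values in the usage are illustrative, not recommendations.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class DrPlan:
    """Hypothetical per-tier DR plan used to sanity-check RPO/RTO targets."""

    backup_interval: timedelta  # time between successive backups/snapshots
    replication_lag: timedelta  # time for a backup to land in the DR region
    restore_time: timedelta     # measured time to restore and validate service
    rpo: timedelta              # max acceptable data loss
    rto: timedelta              # max acceptable downtime

    def worst_case_data_loss(self) -> timedelta:
        # Failure just before the next backup lands: everything written since
        # the last backup that reached the DR region is lost.
        return self.backup_interval + self.replication_lag

    def meets_rpo(self) -> bool:
        return self.worst_case_data_loss() <= self.rpo

    def meets_rto(self) -> bool:
        return self.restore_time <= self.rto


plan = DrPlan(
    backup_interval=timedelta(hours=24),
    replication_lag=timedelta(hours=1),
    restore_time=timedelta(hours=4),
    rpo=timedelta(hours=24),
    rto=timedelta(hours=8),
)
print(plan.worst_case_data_loss())       # 1 day, 1:00:00
print(plan.meets_rpo(), plan.meets_rto())  # False True
```

The usage shows why a daily backup does not satisfy a 24-hour RPO once replication lag is counted.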
Q2: When would you use dual-region vs. multi-region GCS?
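Answer: Both are designed for 11 nines of durability and store data redundantly in more than one region. Dual-region lets you choose (or use a predefined pair of) the two regions, which helps with data residency and with co-locating storage next to compute in those regions, and it supports turbo replication, which targets a 15-minute RPO for newly written objects. Multi-region spreads data across regions inside a large geographic area (US, EU, or ASIA) that Google selects, using default asynchronous replication. Use dual-region for critical data that needs a tight RPO, predictable placement, or specific data residency; use multi-region for content served to users across a wide area where exact placement does not matter.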
Q3: How do you test disaster recovery procedures?
Answer: 1) Run regular DR drills (quarterly), 2) Restore backups into a test environment, 3) Validate data integrity, 4) Measure the actual RPO/RTO achieved, 5) Document findings and update procedures accordingly.
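Measuring the actual RPO/RTO during a drill comes down to three timestamps: when the outage began, the newest write present in the restored data, and when the restored environment passed validation. A minimal sketch; `measure_drill` and the example timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class DrillResult(NamedTuple):
    achieved_rpo: timedelta
    achieved_rto: timedelta


def measure_drill(
    failure_at: datetime,
    last_recovered_write_at: datetime,
    service_restored_at: datetime,
) -> DrillResult:
    """Compute achieved RPO/RTO from timestamps recorded during a DR drill.

    failure_at: when the (simulated) outage began
    last_recovered_write_at: newest write present in the restored data
    service_restored_at: when the restored environment passed validation
    """
    if not last_recovered_write_at <= failure_at <= service_restored_at:
        raise ValueError("timestamps are out of order")
    return DrillResult(
        achieved_rpo=failure_at - last_recovered_write_at,  # data lost
        achieved_rto=service_restored_at - failure_at,      # downtime
    )


result = measure_drill(
    failure_at=datetime(2024, 1, 15, 10, 0),
    last_recovered_write_at=datetime(2024, 1, 15, 9, 40),
    service_restored_at=datetime(2024, 1, 15, 12, 30),
)
print(result.achieved_rpo, result.achieved_rto)  # 0:20:00 2:30:00
```

Comparing these measured values with the documented targets for each data tier is what turns a drill into a pass/fail result.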
Q4: What are the DR options for BigQuery?
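Answer: 1) Cross-region copies of tables or datasets (copy jobs, or scheduled dataset copies via the BigQuery Data Transfer Service), 2) Time travel (7 days by default, configurable down to 2 days), 3) Table snapshots, 4) Exports to GCS (ideally to a dual-region or multi-region bucket), 5) BigQuery Omni, which queries data held in AWS or Azure and matters only if DR copies are kept on another cloud. Time travel and snapshots stay in the table's own region, so they protect against accidental deletes or bad writes rather than a regional outage. Combine these strategies based on RPO/RTO requirements.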
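Time travel and table snapshots are both driven by SQL. The sketch below only builds the statements; the helper names and the table IDs in the usage are hypothetical, and in practice each string would be run with `client.query(sql).result()`. The 168-hour cap is BigQuery's maximum time travel window; a dataset configured with a shorter window will reject older timestamps. The snapshot target is assumed to be a dataset in the same region as the source table.

```python
MAX_TIME_TRAVEL_HOURS = 168  # 7 days, the maximum time travel window


def time_travel_query(table: str, hours_ago: int) -> str:
    """Build a query that reads `table` as it existed `hours_ago` hours ago."""
    if not 0 < hours_ago <= MAX_TIME_TRAVEL_HOURS:
        raise ValueError("hours_ago must be within the 7-day time travel window")
    return (
        f"SELECT * FROM `{table}` "
        f"FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {hours_ago} HOUR)"
    )


def snapshot_ddl(source: str, snapshot: str, expire_days: int) -> str:
    """Build a CREATE SNAPSHOT TABLE statement that expires after `expire_days`."""
    return (
        f"CREATE SNAPSHOT TABLE `{snapshot}` CLONE `{source}` "
        "OPTIONS (expiration_timestamp = "
        f"TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL {expire_days} DAY))"
    )


print(time_travel_query("my-project.analytics.sales", 2))
print(snapshot_ddl("my-project.analytics.sales", "my-project.analytics_snapshots.sales_snap", 30))
```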
Q5: How do you handle DR for streaming pipelines?
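Answer: 1) Pub/Sub topics are global; use message storage policies to control which regions persist messages, and subscription snapshots with seek to replay messages after failover, 2) Keep a standby Dataflow pipeline (for example, a template ready to launch) in a secondary region, 3) Keep a Cloud Composer environment in the secondary region for orchestration, 4) Automate failover, 5) Test failover regularly.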
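Failover automation needs a rule for when to switch regions. One common choice is to fail over only after several consecutive failed health checks, so that a single blip does not cause flapping. A minimal sketch with a hypothetical `RegionFailover` controller; the health-check inputs are assumed to come from elsewhere, and the actual failover actions (launching the standby Dataflow pipeline, repointing consumers) are out of scope.

```python
from typing import Dict, List


class RegionFailover:
    """Hypothetical controller: switch the active region only after `threshold`
    consecutive failed health checks. Failback is deliberately left manual."""

    def __init__(self, regions: List[str], threshold: int = 3) -> None:
        if not regions:
            raise ValueError("need at least one region")
        self.regions = regions  # ordered by preference, primary first
        self.threshold = threshold
        self.active = regions[0]
        self.failures: Dict[str, int] = {r: 0 for r in regions}

    def record(self, health: Dict[str, bool]) -> str:
        """Feed one round of health-check results; return the active region."""
        for region in self.regions:
            if health.get(region, False):
                self.failures[region] = 0
            else:
                self.failures[region] += 1
        if self.failures[self.active] >= self.threshold:
            # Move to the most-preferred region that passed the latest check.
            for region in self.regions:
                if region != self.active and self.failures[region] == 0:
                    self.active = region
                    break
        return self.active


controller = RegionFailover(["us-east1", "us-central1"], threshold=3)
for _ in range(3):
    print(controller.record({"us-east1": False, "us-central1": True}))
# us-east1, us-east1, then us-central1 after the third consecutive failure
```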