Snowflake Advanced Β· Interview Prep
Multi-Region & Data Replication
Difficulty: Hard Β· Commonly asked at Netflix, Apple, Google
Interview Question
"Design a global data platform that serves users in US, Europe, and Asia with sub-100ms query latency. How do you handle data replication, failover, and data sovereignty requirements?"
βΉοΈ
Companies Asking This: Netflix (Staff Data Engineer), Apple (Principal Data Engineer), Google (Senior Cloud Architect), Amazon (L6 Data Engineer)
Multi-Region Architecture
Database Replication
-- 1. Create primary database
CREATE DATABASE global_analytics
DATA_RETENTION_TIME_IN_DAYS = 30;
-- 2. Enable replication to other regions
ALTER DATABASE global_analytics
SET ENABLE_DATABASE_REPLICATION = TRUE;
-- 3. Create replica in another region
-- (From the secondary account in different region)
CREATE DATABASE global_analytics_failover
AS REPLICA OF my_organization.my_account.global_analytics;
-- 4. Check replication status
SELECT
database_name,
created_on,
replication_status,
primary_region,
failover_region,
last_refreshed_on
FROM snowflake.account_usage.replication_database_status
WHERE database_name = 'GLOBAL_ANALYTICS';
-- 5. Monitor replication lag
SELECT
database_name,
DATEDIFF('minute', last_refreshed_on, CURRENT_TIMESTAMP()) AS replication_lag_minutes,
CASE
WHEN DATEDIFF('minute', last_refreshed_on, CURRENT_TIMESTAMP()) > 60 THEN 'CRITICAL'
WHEN DATEDIFF('minute', last_refreshed_on, CURRENT_TIMESTAMP()) > 15 THEN 'WARNING'
ELSE 'HEALTHY'
END AS lag_status
FROM snowflake.account_usage.replication_database_status;
-- 6. Refresh replica (manual if needed)
ALTER DATABASE global_analytics_failover REFRESH;
Failover & Failback
-- 1. Initiate failover to secondary region
-- (Run from secondary account)
ALTER DATABASE global_analytics_failover FAILOVER;
-- 2. Check failover status
SELECT
database_name,
replication_status,
failover_on,
failover_region
FROM snowflake.account_usage.replication_database_status
WHERE database_name = 'GLOBAL_ANALYTICS_FAILOVER';
-- 3. After primary region recovers, failback
-- (Run from original primary account after it's restored)
ALTER DATABASE global_analytics_failover REFRESH;
-- 4. Initiate failback
-- (Run from original primary account)
ALTER DATABASE global_analytics FAILOVER;
Real-World Scenario: Netflix
Question: "How do you handle data sovereignty requirements where EU user data must stay in EU regions?"
Solution: Data Residency Pattern
-- 1. Create region-specific databases
CREATE DATABASE analytics_us
COMMENT = 'US region analytics';
CREATE DATABASE analytics_eu
COMMENT = 'EU region analytics - GDPR compliant';
CREATE DATABASE analytics_ap
COMMENT = 'APAC region analytics';
-- 2. Create regional schemas with data residency
CREATE SCHEMA analytics_eu.gdpr_compliant
COMMENT = 'GDPR compliant data - EU only';
-- 3. Use row-level security for data residency
CREATE TABLE analytics_eu.users (
user_id VARCHAR(100),
user_name VARCHAR(200),
email VARCHAR(200),
region VARCHAR(10),
data_residency VARCHAR(10) DEFAULT 'EU'
);
-- 4. Create row access policy for data residency
CREATE OR REPLACE ROW ACCESS POLICY data_residency_policy
AS (region VARCHAR) RETURNS BOOLEAN ->
CASE
WHEN CURRENT_ROLE() IN ('GLOBAL_ADMIN', 'DATA_STEWARD') THEN TRUE
WHEN CURRENT_REGION() = 'EU-WEST-1' AND region = 'EU' THEN TRUE
WHEN CURRENT_REGION() = 'US-EAST-1' AND region = 'US' THEN TRUE
WHEN CURRENT_REGION() = 'AP-SOUTH-1' AND region = 'AP' THEN TRUE
ELSE FALSE
END;
-- 5. Apply policy to tables
ALTER TABLE analytics_eu.users ADD ROW ACCESS POLICY data_residency_policy ON (region);
-- 6. Create cross-region view for global analytics
CREATE OR REPLACE VIEW global_analytics.unified_users AS
SELECT 'US' AS data_region, * FROM analytics_us.users WHERE region = 'US'
UNION ALL
SELECT 'EU' AS data_region, * FROM analytics_eu.users WHERE region = 'EU'
UNION ALL
SELECT 'AP' AS data_region, * FROM analytics_ap.users WHERE region = 'AP';
Real-World Scenario: Apple
Question: "How do you optimize query latency for users in different regions while maintaining a single source of truth?"
Solution: Read Replicas + Local Caching
-- 1. Create read replicas in each region
-- Primary in US
CREATE WAREHOUSE us_analytics_wh
WAREHOUSE_SIZE = 'LARGE'
MIN_CLUSTER_COUNT = 2
MAX_CLUSTER_COUNT = 6;
-- Read replica in EU (configured via account settings)
CREATE WAREHOUSE eu_analytics_wh
WAREHOUSE_SIZE = 'LARGE'
MIN_CLUSTER_COUNT = 2
MAX_CLUSTER_COUNT = 6;
-- Read replica in APAC
CREATE WAREHOUSE ap_analytics_wh
WAREHOUSE_SIZE = 'LARGE'
MIN_CLUSTER_COUNT = 2
MAX_CLUSTER_COUNT = 6;
-- 2. Create regional views
CREATE OR REPLACE VIEW analytics_us.regional_dashboard AS
SELECT * FROM global_analytics.dashboard_metrics
WHERE region IN ('US', 'GLOBAL');
CREATE OR REPLACE VIEW analytics_eu.regional_dashboard AS
SELECT * FROM global_analytics.dashboard_metrics
WHERE region IN ('EU', 'GLOBAL');
-- 3. Use query tags for routing
ALTER SESSION SET QUERY_TAG = '{"region": "eu", "workload": "dashboard"}';
-- 4. Monitor latency by region
SELECT
CASE
WHEN warehouse_name LIKE 'US_%' THEN 'US'
WHEN warehouse_name LIKE 'EU_%' THEN 'EU'
WHEN warehouse_name LIKE 'AP_%' THEN 'APAC'
ELSE 'UNKNOWN'
END AS region,
AVG(total_elapsed_time_ms) / 1000 AS avg_latency_seconds,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY total_elapsed_time_ms) / 1000 AS p95_latency
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
GROUP BY 1;
Best Practices
| Strategy | Use Case |
|---|---|
| Database replication | Disaster recovery, read scaling |
| Read replicas | Reduce latency for remote users |
| Data residency | GDPR, data sovereignty compliance |
| Failover planning | Business continuity |
| Monitoring replication lag | Ensure data freshness |
β οΈ
Key Considerations:
- Replication lag β Monitor and alert on lag > threshold
- Data sovereignty β Ensure data stays in required regions
- Failover testing β Regularly test failover procedures
- Cost implications β Replication increases storage costs
- Network latency β Cross-region queries have higher latency