πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Topic: Multi-Region & Data Replication

Snowflake AdvancedMulti-Region⭐ Premium

Advertisement

Snowflake Advanced Β· Interview Prep

Multi-Region & Data Replication

Difficulty: Hard Β· Commonly asked at Netflix, Apple, Google

Interview Question

"Design a global data platform that serves users in US, Europe, and Asia with sub-100ms query latency. How do you handle data replication, failover, and data sovereignty requirements?"

ℹ️

Companies Asking This: Netflix (Staff Data Engineer), Apple (Principal Data Engineer), Google (Senior Cloud Architect), Amazon (L6 Data Engineer)


Multi-Region Architecture

Organization Level (Global Account)US-EAST-1(Primary)Production DB(Primary)Read ReplicasEU-WEST-1(Replica)Production DB(Replica)Read ReplicasAP-SOUTH-1(Replica)Production DB(Replica)Read ReplicasGlobal Replication (Cross-Region)

Database Replication

-- 1. Create primary database
CREATE DATABASE global_analytics
    DATA_RETENTION_TIME_IN_DAYS = 30;

-- 2. Enable replication to other regions
ALTER DATABASE global_analytics
    SET ENABLE_DATABASE_REPLICATION = TRUE;

-- 3. Create replica in another region
-- (From the secondary account in different region)
CREATE DATABASE global_analytics_failover
    AS REPLICA OF my_organization.my_account.global_analytics;

-- 4. Check replication status
SELECT 
    database_name,
    created_on,
    replication_status,
    primary_region,
    failover_region,
    last_refreshed_on
FROM snowflake.account_usage.replication_database_status
WHERE database_name = 'GLOBAL_ANALYTICS';

-- 5. Monitor replication lag
SELECT 
    database_name,
   DATEDIFF('minute', last_refreshed_on, CURRENT_TIMESTAMP()) AS replication_lag_minutes,
    CASE 
        WHEN DATEDIFF('minute', last_refreshed_on, CURRENT_TIMESTAMP()) > 60 THEN 'CRITICAL'
        WHEN DATEDIFF('minute', last_refreshed_on, CURRENT_TIMESTAMP()) > 15 THEN 'WARNING'
        ELSE 'HEALTHY'
    END AS lag_status
FROM snowflake.account_usage.replication_database_status;

-- 6. Refresh replica (manual if needed)
ALTER DATABASE global_analytics_failover REFRESH;

Failover & Failback

-- 1. Initiate failover to secondary region
-- (Run from secondary account)
ALTER DATABASE global_analytics_failover FAILOVER;

-- 2. Check failover status
SELECT 
    database_name,
    replication_status,
    failover_on,
    failover_region
FROM snowflake.account_usage.replication_database_status
WHERE database_name = 'GLOBAL_ANALYTICS_FAILOVER';

-- 3. After primary region recovers, failback
-- (Run from original primary account after it's restored)
ALTER DATABASE global_analytics_failover REFRESH;

-- 4. Initiate failback
-- (Run from original primary account)
ALTER DATABASE global_analytics FAILOVER;

Real-World Scenario: Netflix

Question: "How do you handle data sovereignty requirements where EU user data must stay in EU regions?"

Solution: Data Residency Pattern

-- 1. Create region-specific databases
CREATE DATABASE analytics_us
    COMMENT = 'US region analytics';

CREATE DATABASE analytics_eu
    COMMENT = 'EU region analytics - GDPR compliant';

CREATE DATABASE analytics_ap
    COMMENT = 'APAC region analytics';

-- 2. Create regional schemas with data residency
CREATE SCHEMA analytics_eu.gdpr_compliant
    COMMENT = 'GDPR compliant data - EU only';

-- 3. Use row-level security for data residency
CREATE TABLE analytics_eu.users (
    user_id VARCHAR(100),
    user_name VARCHAR(200),
    email VARCHAR(200),
    region VARCHAR(10),
    data_residency VARCHAR(10) DEFAULT 'EU'
);

-- 4. Create row access policy for data residency
CREATE OR REPLACE ROW ACCESS POLICY data_residency_policy 
AS (region VARCHAR) RETURNS BOOLEAN ->
    CASE
        WHEN CURRENT_ROLE() IN ('GLOBAL_ADMIN', 'DATA_STEWARD') THEN TRUE
        WHEN CURRENT_REGION() = 'EU-WEST-1' AND region = 'EU' THEN TRUE
        WHEN CURRENT_REGION() = 'US-EAST-1' AND region = 'US' THEN TRUE
        WHEN CURRENT_REGION() = 'AP-SOUTH-1' AND region = 'AP' THEN TRUE
        ELSE FALSE
    END;

-- 5. Apply policy to tables
ALTER TABLE analytics_eu.users ADD ROW ACCESS POLICY data_residency_policy ON (region);

-- 6. Create cross-region view for global analytics
CREATE OR REPLACE VIEW global_analytics.unified_users AS
SELECT 'US' AS data_region, * FROM analytics_us.users WHERE region = 'US'
UNION ALL
SELECT 'EU' AS data_region, * FROM analytics_eu.users WHERE region = 'EU'
UNION ALL
SELECT 'AP' AS data_region, * FROM analytics_ap.users WHERE region = 'AP';

Real-World Scenario: Apple

Question: "How do you optimize query latency for users in different regions while maintaining a single source of truth?"

Solution: Read Replicas + Local Caching

-- 1. Create read replicas in each region
-- Primary in US
CREATE WAREHOUSE us_analytics_wh
    WAREHOUSE_SIZE = 'LARGE'
    MIN_CLUSTER_COUNT = 2
    MAX_CLUSTER_COUNT = 6;

-- Read replica in EU (configured via account settings)
CREATE WAREHOUSE eu_analytics_wh
    WAREHOUSE_SIZE = 'LARGE'
    MIN_CLUSTER_COUNT = 2
    MAX_CLUSTER_COUNT = 6;

-- Read replica in APAC
CREATE WAREHOUSE ap_analytics_wh
    WAREHOUSE_SIZE = 'LARGE'
    MIN_CLUSTER_COUNT = 2
    MAX_CLUSTER_COUNT = 6;

-- 2. Create regional views
CREATE OR REPLACE VIEW analytics_us.regional_dashboard AS
SELECT * FROM global_analytics.dashboard_metrics
WHERE region IN ('US', 'GLOBAL');

CREATE OR REPLACE VIEW analytics_eu.regional_dashboard AS
SELECT * FROM global_analytics.dashboard_metrics
WHERE region IN ('EU', 'GLOBAL');

-- 3. Use query tags for routing
ALTER SESSION SET QUERY_TAG = '{"region": "eu", "workload": "dashboard"}';

-- 4. Monitor latency by region
SELECT 
    CASE 
        WHEN warehouse_name LIKE 'US_%' THEN 'US'
        WHEN warehouse_name LIKE 'EU_%' THEN 'EU'
        WHEN warehouse_name LIKE 'AP_%' THEN 'APAC'
        ELSE 'UNKNOWN'
    END AS region,
    AVG(total_elapsed_time_ms) / 1000 AS avg_latency_seconds,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY total_elapsed_time_ms) / 1000 AS p95_latency
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
GROUP BY 1;

Best Practices

StrategyUse Case
Database replicationDisaster recovery, read scaling
Read replicasReduce latency for remote users
Data residencyGDPR, data sovereignty compliance
Failover planningBusiness continuity
Monitoring replication lagEnsure data freshness

⚠️

Key Considerations:

  1. Replication lag β€” Monitor and alert on lag > threshold
  2. Data sovereignty β€” Ensure data stays in required regions
  3. Failover testing β€” Regularly test failover procedures
  4. Cost implications β€” Replication increases storage costs
  5. Network latency β€” Cross-region queries have higher latency

Advertisement