πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Topic: Query Performance & Result Cache

Snowflake AdvancedPerformance⭐ Premium

Advertisement

Snowflake Advanced Β· Interview Prep

Query Performance & Result Cache

Difficulty: Hard Β· Commonly asked at Netflix, Amazon, Google

Interview Question

"A query that processes 1TB of data takes 30 seconds. After a data load, it now takes 5 minutes. Walk me through your diagnostic process and optimization strategies."

ℹ️

Companies Asking This: Netflix (Staff Data Engineer), Amazon (L6 Data Engineer), Google (Senior Data Engineer), Meta (Data Platform Engineer)


Performance Diagnostic Framework

Step 1: Query Analysis with PROFILE

-- Run query with PROFILE to get detailed metrics
SELECT 
    c.customer_name,
    COUNT(o.order_id) AS order_count,
    SUM(o.amount) AS total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= '2024-01-01'
GROUP BY 1
ORDER BY 3 DESC;

-- Get query profile
SELECT 
    query_id,
    query_text,
    user_name,
    warehouse_name,
    compilation_time_ms,
    execution_time_ms,
    total_elapsed_time_ms,
    bytes_scanned / (1024*1024*1024) AS gb_scanned,
    bytes_written / (1024*1024) AS mb_written,
    bytes_spilled_to_local_storage / (1024*1024) AS mb_spill_local,
    bytes_spilled_to_remote_storage / (1024*1024) AS mb_spill_remote,
    partitions_scanned,
    partitions_total,
    ROUND(partitions_scanned * 100.0 / NULLIF(partitions_total, 0), 2) AS scan_percentage,
    result_cache_hit
FROM snowflake.account_usage.query_history
WHERE query_text LIKE '%customers c JOIN orders o%'
  AND start_time >= DATEADD(hour, -24, CURRENT_TIMESTAMP())
ORDER BY total_elapsed_time_ms DESC
LIMIT 5;

Result Cache

-- Check if result cache was used
SELECT 
    query_id,
    query_text,
    result_cache_hit,  -- 1 = cached, 0 = executed
    compilation_time_ms,
    execution_time_ms,
    total_elapsed_time_ms
FROM snowflake.account_usage.query_history
WHERE query_text LIKE '%SELECT%FROM orders%'
  AND start_time >= DATEADD(hour, -1, CURRENT_TIMESTAMP())
ORDER BY start_time DESC;

-- Force cache bypass for testing
SELECT * FROM orders 
    /* NO_RESULT_CACHE */
WHERE order_date = CURRENT_DATE();

-- Cache invalidation scenarios:
-- 1. Underlying data changes
-- 2. Volatile functions (CURRENT_DATE(), RANDOM())
-- 3. User role changes
-- 4. Schema changes

-- Test cache behavior
-- First execution (not cached)
SELECT COUNT(*) FROM orders WHERE order_date = CURRENT_DATE();
-- Second execution (cached)
SELECT COUNT(*) FROM orders WHERE order_date = CURRENT_DATE();
-- Check result
SELECT 
    query_id,
    result_cache_hit,
    total_elapsed_time_ms
FROM snowflake.account_usage.query_history
WHERE query_text LIKE '%COUNT(*)%orders%CURRENT_DATE%'
ORDER BY start_time DESC;

Scan Depth Analysis

-- Analyze micro-partition pruning efficiency
SELECT 
    query_id,
    partitions_scanned,
    partitions_total,
    ROUND(partitions_scanned * 100.0 / NULLIF(partitions_total, 0), 2) AS scan_pct,
    CASE 
        WHEN partitions_scanned = partitions_total THEN 'FULL SCAN - CRITICAL'
        WHEN partitions_scanned > partitions_total * 0.5 THEN 'POOR PRUNING - WARNING'
        WHEN partitions_scanned > partitions_total * 0.1 THEN 'MODERATE PRUNING'
        ELSE 'EXCELLENT PRUNING'
    END AS pruning_quality
FROM snowflake.account_usage.query_history
WHERE query_type = 'SELECT'
  AND start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
  AND partitions_total > 100
ORDER BY scan_pct DESC
LIMIT 20;

-- Compare pruning before and after data load
-- Before load
SELECT 
    'BEFORE_LOAD' AS period,
    AVG(partitions_scanned * 100.0 / partitions_total) AS avg_scan_pct,
    AVG(total_elapsed_time_ms) AS avg_query_time_ms
FROM snowflake.account_usage.query_history
WHERE query_text LIKE '%orders%'
  AND start_time BETWEEN '2024-01-10' AND '2024-01-15'

UNION ALL

-- After load
SELECT 
    'AFTER_LOAD' AS period,
    AVG(partitions_scanned * 100.0 / partitions_total) AS avg_scan_pct,
    AVG(total_elapsed_time_ms) AS avg_query_time_ms
FROM snowflake.account_usage.query_history
WHERE query_text LIKE '%orders%'
  AND start_time BETWEEN '2024-01-16' AND '2024-01-20';

Real-World Scenario: Netflix

Question: "How do you optimize a dashboard that queries 50TB of data and needs sub-second response times?"

Solution: Multi-Layer Optimization

-- 1. Create materialized view for common aggregations
CREATE MATERIALIZED VIEW mv_dashboard_summary AS
SELECT 
    DATE_TRUNC('hour', event_timestamp) AS event_hour,
    event_type,
    region,
    device_type,
    COUNT(*) AS event_count,
    COUNT(DISTINCT user_id) AS unique_users,
    SUM(CASE WHEN event_type = 'purchase' THEN revenue ELSE 0 END) AS total_revenue
FROM events
WHERE event_timestamp >= DATEADD(day, -90, CURRENT_DATE())
GROUP BY 1, 2, 3, 4;

-- 2. Create clustering key aligned with dashboard queries
ALTER TABLE events CLUSTER BY (event_timestamp, event_type, region);

-- 3. Use result cache aggressively
-- Dashboard queries should use same query structure for cache hits
-- Instead of:
SELECT * FROM events WHERE event_date = CURRENT_DATE() AND event_type = 'click';
-- Use:
SELECT * FROM events WHERE event_date = '2024-01-15' AND event_type = 'click';

-- 4. Create search optimization for common filters
ALTER TABLE events SET SEARCH_OPTIMIZATION = ON;
ALTER TABLE events ADD SEARCH OPTIMIZATION ON (event_type, region, device_type);

-- 5. Monitor dashboard performance
SELECT 
    query_id,
    query_text,
    total_elapsed_time_ms,
    bytes_scanned / (1024*1024*1024) AS gb_scanned,
    result_cache_hit,
    CASE 
        WHEN result_cache_hit = 1 THEN 'CACHED'
        WHEN total_elapsed_time_ms < 1000 THEN 'FAST'
        WHEN total_elapsed_time_ms < 5000 THEN 'ACCEPTABLE'
        ELSE 'SLOW - NEEDS OPTIMIZATION'
    END AS performance_status
FROM snowflake.account_usage.query_history
WHERE query_text LIKE '%dashboard%'
  AND start_time >= DATEADD(hour, -1, CURRENT_TIMESTAMP())
ORDER BY total_elapsed_time_ms DESC;

Real-World Scenario: Amazon

Question: "A query joins 5 tables and spills to remote storage. How do you optimize without changing the query?"

Optimization Without Query Changes

-- 1. Analyze spill metrics
SELECT 
    query_id,
    bytes_spilled_to_local_storage / (1024*1024*1024) AS local_spill_gb,
    bytes_spilled_to_remote_storage / (1024*1024*1024) AS remote_spill_gb,
    execution_time_ms / 1000 AS execution_seconds
FROM snowflake.account_usage.query_history
WHERE query_id = 'your-query-id';

-- 2. Increase warehouse size
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'X-LARGE';

-- 3. Add clustering keys to joined tables
ALTER TABLE orders CLUSTER BY (customer_id, order_date);
ALTER TABLE customers CLUSTER BY (customer_id);
ALTER TABLE products CLUSTER BY (product_id);

-- 4. Create materialized view for the join
CREATE MATERIALIZED VIEW orders_customers_products AS
SELECT 
    o.order_id,
    o.order_date,
    o.amount,
    c.customer_name,
    c.segment,
    p.product_name,
    p.category
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON o.product_id = p.product_id;

-- 5. Monitor improvement
SELECT 
    'BEFORE' AS optimization,
    AVG(bytes_spilled_to_remote_storage / (1024*1024*1024)) AS avg_remote_spill_gb,
    AVG(execution_time_ms / 1000) AS avg_execution_seconds
FROM snowflake.account_usage.query_history
WHERE query_id = 'before-optimization-query-id'

UNION ALL

SELECT 
    'AFTER' AS optimization,
    AVG(bytes_spilled_to_remote_storage / (1024*1024*1024)) AS avg_remote_spill_gb,
    AVG(execution_time_ms / 1000) AS avg_execution_seconds
FROM snowflake.account_usage.query_history
WHERE query_id = 'after-optimization-query-id';

Performance Monitoring Dashboard

-- 1. Query performance trends
SELECT 
    DATE(start_time) AS query_date,
    AVG(total_elapsed_time_ms) / 1000 AS avg_seconds,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY total_elapsed_time_ms) / 1000 AS median_seconds,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY total_elapsed_time_ms) / 1000 AS p95_seconds,
    COUNT(*) AS query_count,
    SUM(CASE WHEN result_cache_hit = 1 THEN 1 ELSE 0 END) AS cached_queries
FROM snowflake.account_usage.query_history
WHERE query_type = 'SELECT'
  AND start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
GROUP BY 1
ORDER BY 1;

-- 2. Warehouse performance comparison
SELECT 
    warehouse_name,
    warehouse_size,
    AVG(total_elapsed_time_ms) / 1000 AS avg_query_seconds,
    SUM(bytes_scanned / (1024*1024*1024)) AS total_tb_scanned,
    SUM(credits_used) AS total_credits,
    SUM(bytes_scanned / (1024*1024*1024)) / NULLIF(SUM(credits_used), 0) AS tb_per_credit
FROM snowflake.account_usage.query_history q
JOIN snowflake.account_usage.warehouse_metering_history w
    ON q.warehouse_name = w.warehouse_name
WHERE q.start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
GROUP BY 1, 2
ORDER BY 5 DESC;

-- 3. Identify performance bottlenecks
SELECT 
    CASE 
        WHEN compilation_time_ms > 1000 THEN 'COMPILATION'
        WHEN bytes_spilled_to_remote_storage > 0 THEN 'REMOTE_SPILL'
        WHEN partitions_scanned = partitions_total THEN 'FULL_SCAN'
        WHEN total_elapsed_time_ms > 30000 THEN 'LONG_RUNNING'
        ELSE 'NORMAL'
    END AS bottleneck_type,
    COUNT(*) AS query_count,
    AVG(total_elapsed_time_ms) / 1000 AS avg_seconds
FROM snowflake.account_usage.query_history
WHERE query_type = 'SELECT'
  AND start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
GROUP BY 1
ORDER BY 2 DESC;

Best Practices

IssueSolution
Full table scansAdd clustering keys, use search optimization
Remote spillIncrease warehouse size, reduce data volume
Slow compilationSimplify queries, use materialized views
Low cache hit rateUse consistent query patterns, avoid volatile functions
High queue depthUse multi-cluster warehouse
Slow joinsCluster tables by join keys

⚠️

Performance Anti-Patterns:

  1. **SELECT *** β€” Only select needed columns
  2. Missing WHERE clauses β€” Filter early to reduce data scanned
  3. Using functions in WHERE β€” Prevents predicate pushdown
  4. Not monitoring β€” Set up alerts for query time degradation
  5. Ignoring spill β€” Remote spill kills performance

Advertisement