Snowflake Caching and Performance

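Snowflake employs a multi-layered caching architecture that speeds up queries by reusing work at several levels: final query results are kept in the cloud services layer, and recently read table data is kept on the local storage of each running virtual warehouse.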

Result Cache

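The result cache stores the final results of executed queries so that an identical later query can return them without running on a warehouse. A result is kept for 24 hours, and each reuse resets that window, up to a maximum of 31 days: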

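-- Enable result cache (default: ON)
ALTER SESSION SET USE_CACHED_RESULT = TRUE;

-- Bypass the result cache, for example when benchmarking.
-- Snowflake does not support optimizer hints; the session parameter controls reuse.
ALTER SESSION SET USE_CACHED_RESULT = FALSE;
SELECT
  order_date,
  COUNT(*) AS order_count
FROM orders
GROUP BY order_date;
ALTER SESSION SET USE_CACHED_RESULT = TRUE;

-- Monitor recent queries. QUERY_HISTORY has no cache-hit flag; a reused
-- result typically shows zero bytes scanned and near-zero execution time.
SELECT
  query_id,
  query_text,
  bytes_scanned,
  execution_time AS execution_time_ms
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY(
  END_TIME_RANGE_START => DATEADD('hour', -1, CURRENT_TIMESTAMP())
))
WHERE query_text ILIKE '%GROUP BY%';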

Cache Invalidation Rules

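Condition           | Cache Behavior
Same query text     | Cache hit (the text must match exactly)
Same session        | Cache hit (not required; other sessions can reuse the result)
Same warehouse      | Cache hit (not required; the result cache sits outside the warehouse)
Data modification   | Cache invalidated
Time travel query   | Separate cache entry
Different warehouse | Cache hit, provided the other conditions hold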
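
To see these rules in action, the following sketch runs the aggregation from the Result Cache section several times. It assumes that `orders` has `order_date`, `region`, and `amount` columns that can be inserted on their own (an assumption; adjust the INSERT to your schema) and that USE_CACHED_RESULT is left at its default. Whether a given run reused a result can be confirmed in the query profile, which shows a single result-reuse step for a cache hit.

```sql
-- Run 1: executes on the warehouse and persists its result.
SELECT order_date, COUNT(*) AS order_count
FROM orders
GROUP BY order_date;

-- Run 2: identical text and unchanged data, so the persisted result is eligible for reuse.
SELECT order_date, COUNT(*) AS order_count
FROM orders
GROUP BY order_date;

-- Different text (a renamed alias): treated as a new query and executed again.
SELECT order_date, COUNT(*) AS n_orders
FROM orders
GROUP BY order_date;

-- Changing the table invalidates persisted results that depend on it.
-- Hypothetical row; assumes the remaining columns are nullable or have defaults.
INSERT INTO orders (order_date, region, amount)
VALUES ('2024-01-16', 'EMEA', 10.00);

-- Run 3: same text as run 1, but the data changed, so it executes again.
SELECT order_date, COUNT(*) AS order_count
FROM orders
GROUP BY order_date;
```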

Micro-Partition Cache

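Micro-partitions read from storage are cached automatically on the local disk of the warehouse that read them. This cache is dropped when the warehouse suspends: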

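-- Check partition and cache statistics per query.
-- These columns live in the ACCOUNT_USAGE view, which can lag by up to 45 minutes.
SELECT
  query_id,
  partitions_total,
  partitions_scanned,
  percentage_scanned_from_cache
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -1, CURRENT_TIMESTAMP())
  AND query_text ILIKE '%orders%'
LIMIT 5;

-- Monitor cache hit ratio per warehouse (the cache is tracked per query, not per table)
SELECT
  warehouse_name,
  SUM(bytes_scanned) AS bytes_scanned,
  SUM(bytes_scanned * percentage_scanned_from_cache) AS bytes_from_cache,
  SUM(bytes_scanned * percentage_scanned_from_cache) / SUM(bytes_scanned) AS cache_hit_ratio
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -1, CURRENT_TIMESTAMP())
  AND bytes_scanned > 0
GROUP BY warehouse_name;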
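
Because the cached data lives on the warehouse, it is lost whenever the warehouse suspends, and the auto-suspend timeout is one common lever for keeping it warm. The sketch below assumes the `compute_wh` warehouse used elsewhere in this lesson. The 600-second value is illustrative rather than a recommendation, since a longer timeout trades idle credits for a warmer cache.

```sql
-- Inspect the current settings; the auto_suspend column is in seconds.
SHOW WAREHOUSES LIKE 'compute_wh';

-- Keep the warehouse, and its cache, alive for 10 idle minutes before suspending.
ALTER WAREHOUSE compute_wh SET AUTO_SUSPEND = 600;

-- Resume automatically when the next query arrives.
ALTER WAREHOUSE compute_wh SET AUTO_RESUME = TRUE;
```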

Micro-Partition Pruning

-- Pruning is automatic and always on; no session parameter is needed.

-- Check pruning effectiveness: compare partitionsAssigned with partitionsTotal
EXPLAIN SELECT * FROM orders WHERE order_date = '2024-01-15';

-- Monitor pruning statistics (ACCOUNT_USAGE can lag by up to 45 minutes)
SELECT
  query_id,
  partitions_total,
  partitions_scanned,
  1 - partitions_scanned / NULLIF(partitions_total, 0) AS pruning_ratio
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -1, CURRENT_TIMESTAMP())
  AND query_text ILIKE '%WHERE order_date%';
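
Pruning relies on per-partition metadata such as column minimum and maximum values, so it tends to work best when the filter compares the column directly. As an illustration (a sketch, not a benchmark), the two queries below ask for the same month of orders. Running EXPLAIN on each lets you compare partitionsAssigned against partitionsTotal; the actual numbers depend on how `orders` is clustered.

```sql
-- Filter wraps the column in a function, which may keep the optimizer
-- from mapping the predicate onto partition min/max metadata.
EXPLAIN
SELECT COUNT(*)
FROM orders
WHERE TO_CHAR(order_date, 'YYYY-MM') = '2024-01';

-- Filter compares the column directly against a range.
EXPLAIN
SELECT COUNT(*)
FROM orders
WHERE order_date >= '2024-01-01'
  AND order_date <  '2024-02-01';
```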

Remote Disk Cache

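Remote disk is Snowflake's persistent storage layer in cloud object storage rather than a cache in the strict sense. Data that is not in the warehouse cache is read from here, and it remains available whether or not any warehouse is running: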

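-- Check how much data was read from remote storage rather than the warehouse cache
SELECT
  query_id,
  bytes_scanned * (1 - percentage_scanned_from_cache) / 1024 / 1024 AS remote_scanned_mb,
  percentage_scanned_from_cache AS cache_hit_ratio
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -1, CURRENT_TIMESTAMP())
  AND query_text ILIKE '%orders%';

-- Start from a cold warehouse cache (for example, when benchmarking)
ALTER WAREHOUSE compute_wh SUSPEND;
ALTER WAREHOUSE compute_wh RESUME;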

Performance Optimization Strategies

Warehouse Sizing

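-- Check warehouse performance
SELECT
  warehouse_name,
  AVG(total_elapsed_time) AS avg_query_time_ms,
  COUNT(*) AS total_queries,
  SUM(bytes_scanned * percentage_scanned_from_cache)
    / NULLIF(SUM(bytes_scanned), 0) AS cache_hit_ratio
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
  AND warehouse_name = 'COMPUTE_WH'
GROUP BY warehouse_name;

-- Scale warehouse for better caching (larger warehouses have a larger local cache)
ALTER WAREHOUSE compute_wh SET WAREHOUSE_SIZE = 'LARGE';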
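
Before resizing, it helps to check whether queries are actually constrained by the warehouse. The following is a minimal sketch, assuming access to the SNOWFLAKE.ACCOUNT_USAGE views and the `COMPUTE_WH` warehouse from above. Queries that spill to storage or wait in the queue are the usual signs of an undersized or overloaded warehouse.

```sql
SELECT
  query_id,
  total_elapsed_time              AS elapsed_ms,
  queued_overload_time            AS queued_ms,
  bytes_spilled_to_local_storage  AS spilled_local_bytes,
  bytes_spilled_to_remote_storage AS spilled_remote_bytes
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
  AND warehouse_name = 'COMPUTE_WH'
  AND (bytes_spilled_to_local_storage > 0
       OR bytes_spilled_to_remote_storage > 0
       OR queued_overload_time > 0)
ORDER BY bytes_spilled_to_remote_storage DESC
LIMIT 20;
```

As a rough guide, heavy spilling suggests that individual queries need a larger size, while queueing with little spilling points to concurrency rather than size.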

Query Optimization

-- Use materialized views for complex aggregations (requires Enterprise Edition or higher)
CREATE MATERIALIZED VIEW mv_daily_summary AS
SELECT
  order_date,
  COUNT(*) AS order_count,
  SUM(amount) AS total_revenue
FROM orders
GROUP BY order_date;

-- Result caching needs no per-query hint: with USE_CACHED_RESULT = TRUE,
-- rerunning this exact text against unchanged data reuses the stored result.
SELECT
  order_date,
  COUNT(*) AS order_count,
  SUM(amount) AS total_revenue
FROM orders
GROUP BY order_date;
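
Once the materialized view exists, queries can read it like a table. A brief sketch, assuming the `mv_daily_summary` view defined above; the date filter is only an example.

```sql
-- Read the pre-aggregated rows instead of re-aggregating orders.
SELECT order_date, order_count, total_revenue
FROM mv_daily_summary
WHERE order_date >= '2024-01-01'
ORDER BY order_date;

-- Inspect the view's metadata, including how far it is behind its base table.
SHOW MATERIALIZED VIEWS LIKE 'mv_daily_summary';
```

Keeping a materialized view up to date consumes credits in the background, so it pays off mainly for aggregations that are queried far more often than the base table changes.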

Partition Optimization

-- Cluster tables for better pruning
ALTER TABLE orders CLUSTER BY (order_date, region);

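-- Check clustering depth and the full clustering report
SELECT SYSTEM$CLUSTERING_DEPTH('orders');
SELECT SYSTEM$CLUSTERING_INFORMATION('orders');

-- Re-clustering runs in the background through Automatic Clustering once a
-- clustering key is set; resume it if it was suspended for this table.
ALTER TABLE orders RESUME RECLUSTER;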
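
The clustering report is returned as a JSON string, so individual fields can be extracted for dashboards or alerts. The sketch below evaluates `order_date` alone as a candidate key, which is one way to compare key choices before changing the table. The column aliases are illustrative.

```sql
SELECT
  info:total_partition_count::NUMBER AS total_partitions,
  info:average_overlaps::FLOAT       AS average_overlaps,
  info:average_depth::FLOAT          AS average_depth
FROM (
  -- The second argument lists the columns to evaluate as a clustering key.
  SELECT PARSE_JSON(SYSTEM$CLUSTERING_INFORMATION('orders', '(order_date)')) AS info
);
```

A lower average depth generally means fewer overlapping micro-partitions and therefore better pruning on that key.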

Cache Monitoring

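-- Comprehensive cache metrics.
-- QUERY_HISTORY has no result-cache flag, so the first branch is an approximation:
-- it counts successful SELECTs that scanned no data, which also includes
-- metadata-only queries. The second branch measures bytes, not queries.
SELECT
  'Result Cache (approx.)' AS cache_type,
  COUNT_IF(bytes_scanned = 0) AS hits,
  COUNT_IF(bytes_scanned > 0) AS misses,
  ROUND(COUNT_IF(bytes_scanned = 0) / NULLIF(COUNT(*), 0) * 100, 2) AS hit_ratio
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -1, CURRENT_TIMESTAMP())
  AND query_type = 'SELECT'
  AND execution_status = 'SUCCESS'

UNION ALL

SELECT
  'Warehouse Cache (bytes)' AS cache_type,
  SUM(bytes_scanned * percentage_scanned_from_cache) AS hits,
  SUM(bytes_scanned * (1 - percentage_scanned_from_cache)) AS misses,
  ROUND(SUM(bytes_scanned * percentage_scanned_from_cache)
        / NULLIF(SUM(bytes_scanned), 0) * 100, 2) AS hit_ratio
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -1, CURRENT_TIMESTAMP())
  AND bytes_scanned > 0;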

For optimal caching performance, keep query text identical between runs (generated SQL that varies aliases, letter case, or literals defeats the result cache), ensure sufficient warehouse size for parallel processing, and use clustering to improve partition pruning.

Performance Best Practices

Strategy           | Implementation                      | Expected Improvement
Result Caching     | USE_CACHED_RESULT session parameter | 10-100x for repeated queries
Partition Pruning  | Filter on clustered columns         | 5-50x for filtered queries
Materialized Views | Pre-aggregate common queries        | 2-20x for aggregations
Warehouse Sizing   | Match workload to size              | 10-50% improvement
Query Optimization | Simplify SQL, reduce JOINs          | 2-10x for complex queries

-- Monitor overall performance
SELECT
  query_id,
  query_text,
  execution_time AS execution_time_ms,
  bytes_scanned,
  ROUND(bytes_scanned / 1024 / 1024, 2) AS scanned_mb
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY(
  END_TIME_RANGE_START => DATEADD('hour', -1, CURRENT_TIMESTAMP())
))
ORDER BY execution_time DESC
LIMIT 10;

Key Takeaways:

  • Result cache returns results for identical queries without running them on a warehouse
  • Micro-partition cache avoids repeated reads from remote storage, while pruning avoids reading partitions at all
  • Remote disk is the persistent storage layer, so data remains available even after a warehouse cache is dropped
  • Cache hit ratio is a key performance metric
  • Clustering improves pruning, which reduces the data that must be read and cached
  • Warehouse size determines how much data the warehouse cache can hold
