Snowflake Caching and Performance
Snowflake uses a multi-layered caching architecture that speeds up queries by reusing work at three levels: the result cache in the cloud services layer, the local disk cache on each virtual warehouse, and the remote storage layer that holds the table data.
Result Cache
The result cache stores the final result of each executed query for 24 hours, so that an identical later query can return it without using a warehouse:
-- Enable result cache (default: ON)
ALTER SESSION SET USE_CACHED_RESULT = TRUE;
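-- Snowflake has no optimizer hints; to bypass the result cache
-- (for example, when benchmarking), disable it for the session
ALTER SESSION SET USE_CACHED_RESULT = FALSE;
SELECT
  order_date,
  COUNT(*) AS order_count
FROM orders
GROUP BY order_date;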
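-- Monitor cache usage
-- QUERY_HISTORY exposes no result-cache-hit column; a reused result typically
-- shows bytes_scanned = 0 and a near-zero execution_time (milliseconds)
SELECT
  query_id,
  query_text,
  bytes_scanned,
  execution_time
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY(
  END_TIME_RANGE_START => DATEADD('hour', -1, CURRENT_TIMESTAMP())
))
WHERE query_text ILIKE '%GROUP BY%';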
Cache Invalidation Rules
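| Condition | Cache Behavior |
|---|---|
| Identical query text, underlying data unchanged | Cache hit |
| Different session or user (role has the required privileges) | Cache hit |
| Different warehouse | Cache hit (results are held in the cloud services layer, not in the warehouse) |
| Data modification in an underlying table | Cache invalidated |
| Time travel query | Treated as a different query (different text) |
| Result not reused for 24 hours | Cache entry expires |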
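The reuse rules can be sketched as a toy model in Python. This is an illustration only, not Snowflake's implementation: the class `ToyResultCache`, its per-table version counter, and the warehouse names are all hypothetical. It assumes the documented behaviour that results are matched on exact query text, invalidated when the underlying data changes, and held outside the warehouse.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResultCache:
    """Toy model of result reuse (hypothetical, not Snowflake's implementation)."""
    table_versions: dict = field(default_factory=dict)  # table name -> data version
    entries: dict = field(default_factory=dict)         # query text -> (snapshot, result)

    def modify(self, table):
        # Any data change bumps the table version, making older entries stale.
        self.table_versions[table] = self.table_versions.get(table, 0) + 1

    def run(self, query_text, tables, warehouse, compute):
        # The warehouse argument is deliberately not part of the lookup key.
        snapshot = tuple((t, self.table_versions.get(t, 0)) for t in tables)
        cached = self.entries.get(query_text)
        if cached is not None and cached[0] == snapshot:
            return cached[1], "hit"
        result = compute()
        self.entries[query_text] = (snapshot, result)
        return result, "miss"


cache = ToyResultCache()
sql = "SELECT order_date, COUNT(*) FROM orders GROUP BY order_date"

outcomes = [
    cache.run(sql, ["orders"], "WH_A", lambda: 42)[1],          # first run
    cache.run(sql, ["orders"], "WH_B", lambda: 42)[1],          # same text, other warehouse
    cache.run(sql.lower(), ["orders"], "WH_A", lambda: 42)[1],  # text differs in case
]
cache.modify("orders")                                           # underlying data changes
outcomes.append(cache.run(sql, ["orders"], "WH_A", lambda: 43)[1])
print(outcomes)  # ['miss', 'hit', 'miss', 'miss']
```

Keying on the raw text keeps the model simple and mirrors why generated SQL with varying formatting tends to miss the cache.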
Micro-Partition Cache
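Micro-partitions read from remote storage are cached automatically on the local disk of the running warehouse after first access; this cache is dropped when the warehouse is suspended: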
-- Check micro-partition scan statistics for recent queries on orders
-- (ACCOUNT_USAGE views can lag by up to 45 minutes)
SELECT
  query_id,
  partitions_total,
  partitions_scanned,
  percentage_scanned_from_cache
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -1, CURRENT_TIMESTAMP())
  AND query_text ILIKE '%orders%'
LIMIT 5;
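-- Monitor cache hit ratio per warehouse
-- (no per-table cache view exists; percentage_scanned_from_cache ranges from 0.0 to 1.0)
SELECT
  warehouse_name,
  SUM(bytes_scanned) AS bytes_scanned,
  SUM(bytes_scanned * percentage_scanned_from_cache) AS bytes_scanned_from_cache,
  ROUND(SUM(bytes_scanned * percentage_scanned_from_cache) / SUM(bytes_scanned) * 100, 2) AS cache_hit_pct
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -1, CURRENT_TIMESTAMP())
  AND bytes_scanned > 0
GROUP BY warehouse_name;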
Micro-Partition Pruning
-- Pruning is automatic; there is no session parameter to enable or disable it
-- Check pruning effectiveness: compare partitionsAssigned with partitionsTotal in the plan
EXPLAIN SELECT * FROM orders WHERE order_date = '2024-01-15';
-- Monitor pruning statistics
SELECT
  query_id,
  partitions_total,
  partitions_scanned,
  ROUND((1 - partitions_scanned / NULLIF(partitions_total, 0)) * 100, 2) AS pruned_pct
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -1, CURRENT_TIMESTAMP())
  AND query_text ILIKE '%WHERE order_date%';
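Pruning works from per-partition metadata: a partition can be skipped when the filter value lies outside the minimum and maximum stored for the column. A minimal Python sketch of that idea follows; the partition ranges and the helper `partitions_to_scan` are hypothetical and only model the min/max check, not Snowflake's actual metadata format.

```python
from datetime import date

# Hypothetical per-partition metadata: (min order_date, max order_date).
partitions = [
    (date(2024, 1, 1), date(2024, 1, 10)),
    (date(2024, 1, 11), date(2024, 1, 20)),
    (date(2024, 1, 21), date(2024, 1, 31)),
    (date(2024, 1, 14), date(2024, 2, 2)),   # overlaps its neighbours
]


def partitions_to_scan(metadata, value):
    """Return indexes of partitions whose [min, max] range can contain value."""
    return [i for i, (lo, hi) in enumerate(metadata) if lo <= value <= hi]


target = date(2024, 1, 15)
scanned = partitions_to_scan(partitions, target)
pruned_fraction = 1 - len(scanned) / len(partitions)
print(scanned, pruned_fraction)  # [1, 3] 0.5
```

Overlapping ranges, like the fourth partition here, are what reduce pruning effectiveness: the more partitions whose range covers the filter value, the more must be read.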
Remote Disk Cache
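Remote disk is Snowflake's long-term storage layer (cloud object storage). It is not a cache in the strict sense, but it is the durable copy that every read falls back to when the result cache and the warehouse cache both miss:
-- Check the remote storage footprint of a table
SELECT
  table_name,
  active_bytes / 1024 / 1024 AS active_mb,
  time_travel_bytes / 1024 / 1024 AS time_travel_mb
FROM INFORMATION_SCHEMA.TABLE_STORAGE_METRICS
WHERE table_name = 'ORDERS';
-- There is no command to refresh a cache; suspending and resuming the warehouse
-- drops its local cache, so the next queries read from remote storage
ALTER WAREHOUSE compute_wh SUSPEND;
ALTER WAREHOUSE compute_wh RESUME;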
Performance Optimization Strategies
Warehouse Sizing
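-- Check warehouse performance
SELECT
  warehouse_name,
  ROUND(AVG(execution_time), 0) AS avg_execution_time_ms,
  COUNT(*) AS total_queries,
  ROUND(SUM(bytes_scanned * percentage_scanned_from_cache) / NULLIF(SUM(bytes_scanned), 0) * 100, 2) AS cache_hit_pct
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
  AND warehouse_name = 'COMPUTE_WH'
GROUP BY warehouse_name;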
-- Scale up the warehouse for more compute and local cache capacity
-- (newly added compute resources start with a cold cache)
ALTER WAREHOUSE compute_wh SET WAREHOUSE_SIZE = 'LARGE';
Query Optimization
-- Use materialized views for complex aggregations (requires Enterprise Edition or higher)
CREATE MATERIALIZED VIEW mv_daily_summary AS
SELECT
order_date,
COUNT(*) as order_count,
SUM(amount) as total_revenue
FROM orders
GROUP BY order_date;
-- Result caching needs no hint: with USE_CACHED_RESULT = TRUE (the default),
-- rerunning this exact text on unchanged data returns the cached result
SELECT
order_date,
COUNT(*) as order_count,
SUM(amount) as total_revenue
FROM orders
GROUP BY order_date;
Partition Optimization
-- Cluster tables for better pruning
ALTER TABLE orders CLUSTER BY (order_date, region);
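-- Check clustering depth (lower is better)
SELECT SYSTEM$CLUSTERING_DEPTH('orders', '(order_date, region)');
-- Detailed clustering statistics, returned as JSON
SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(order_date, region)');
-- Manual reclustering is deprecated; Automatic Clustering maintains the table
-- and can be resumed if it was suspended
ALTER TABLE orders RESUME RECLUSTER;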
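Why clustering helps pruning can be shown with a small simulation. The sketch below is a toy model under stated assumptions: rows are plain integers standing in for order dates, partitions are fixed-size chunks, and `build_partitions` and `count_scanned` are hypothetical helpers. It compares how many partitions a single-day filter touches when rows arrive in random order and when they are sorted on the filter column.

```python
import random


def build_partitions(values, rows_per_partition):
    """Split values, in the given order, into fixed-size chunks and keep (min, max)."""
    chunks = [values[i:i + rows_per_partition]
              for i in range(0, len(values), rows_per_partition)]
    return [(min(c), max(c)) for c in chunks]


def count_scanned(metadata, value):
    """Count partitions whose [min, max] range can contain value."""
    return sum(1 for lo, hi in metadata if lo <= value <= hi)


random.seed(0)
days = [d for d in range(1, 101) for _ in range(10)]  # 100 days, 10 rows per day
shuffled = days[:]
random.shuffle(shuffled)

unclustered = build_partitions(shuffled, 50)      # rows in arrival order
clustered = build_partitions(sorted(days), 50)    # rows sorted on the filter column

print("unclustered:", count_scanned(unclustered, 42), "of", len(unclustered))
print("clustered:  ", count_scanned(clustered, 42), "of", len(clustered))
```

With sorted data each chunk covers a narrow, non-overlapping range of days, so the filter lands in a single partition; with shuffled data most chunks span nearly the whole range and cannot be skipped.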
Cache Monitoring
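-- Comprehensive cache metrics
-- (no result-cache-hit column is exposed, so this reports the warehouse cache and pruning)
-- Warehouse cache: hits and misses are bytes read from local cache and from remote storage
SELECT
  'Warehouse Cache' AS metric,
  ROUND(SUM(bytes_scanned * percentage_scanned_from_cache)) AS hits,
  ROUND(SUM(bytes_scanned * (1 - percentage_scanned_from_cache))) AS misses,
  ROUND(SUM(bytes_scanned * percentage_scanned_from_cache) / NULLIF(SUM(bytes_scanned), 0) * 100, 2) AS hit_ratio
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -1, CURRENT_TIMESTAMP())
UNION ALL
-- Partition pruning: hits are partitions skipped, misses are partitions scanned
SELECT
  'Partition Pruning' AS metric,
  SUM(partitions_total) - SUM(partitions_scanned) AS hits,
  SUM(partitions_scanned) AS misses,
  ROUND((1 - SUM(partitions_scanned) / NULLIF(SUM(partitions_total), 0)) * 100, 2) AS hit_ratio
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -1, CURRENT_TIMESTAMP());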
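When a cache hit ratio is aggregated over many queries, weighting by bytes scanned usually gives a more honest figure than averaging per-query percentages, because small queries served entirely from cache would otherwise dominate. The Python sketch below illustrates the arithmetic with made-up numbers; the sample figures and both helper functions are hypothetical.

```python
# Hypothetical per-query figures: (bytes_scanned, fraction_from_cache in 0.0..1.0).
queries = [
    (1_000_000_000, 0.10),  # large scan, mostly from remote storage
    (10_000_000, 1.00),     # small scan, fully cached
    (5_000_000, 0.90),      # small scan, mostly cached
]


def naive_ratio(rows):
    """Unweighted mean of the per-query cache fractions."""
    return sum(fraction for _, fraction in rows) / len(rows)


def weighted_ratio(rows):
    """Bytes served from cache divided by total bytes scanned."""
    total = sum(scanned for scanned, _ in rows)
    if total == 0:
        return None  # nothing scanned, so the ratio is undefined
    return sum(scanned * fraction for scanned, fraction in rows) / total


print(round(naive_ratio(queries), 3), round(weighted_ratio(queries), 3))  # 0.667 0.113
```

The two small, well-cached queries pull the naive mean up to about 67%, while only about 11% of the bytes actually came from cache.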
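To get the most from caching, keep query text identical across runs (the result cache matches on exact text, so dynamically generated SQL that varies in formatting or literals will miss it), size the warehouse adequately for parallel processing, and cluster large tables on commonly filtered columns to improve partition pruning.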
Performance Best Practices
| Strategy | Implementation | Expected Improvement |
|---|---|---|
| Result Caching | Keep USE_CACHED_RESULT = TRUE (default) and reuse identical query text | 10-100x for repeated queries |
| Partition Pruning | Filter on clustered columns | 5-50x for filtered queries |
| Materialized Views | Pre-aggregate common queries | 2-20x for aggregations |
| Warehouse Sizing | Match workload to size | 10-50% improvement |
| Query Optimization | Simplify SQL, reduce JOINs | 2-10x for complex queries |
-- Monitor overall performance
SELECT
  query_id,
  query_text,
  execution_time AS execution_time_ms,
  bytes_scanned,
  ROUND(bytes_scanned / 1024 / 1024, 2) AS scanned_mb
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY(
  END_TIME_RANGE_START => DATEADD('hour', -1, CURRENT_TIMESTAMP())
))
ORDER BY execution_time DESC
LIMIT 10;
Key Takeaways:
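- Result cache returns near-instant results for identical queries on unchanged data, from any warehouse
- The warehouse (micro-partition) cache avoids re-reading recently used data from remote storage; pruning separately reduces how many micro-partitions are read at all
- Remote disk is the durable storage layer that all cache misses fall back to
- Cache hit ratio is a key performance metric
- Clustering improves pruning, which reduces the data that must be read and cached
- Warehouse sizing and suspension policy directly affect warehouse cache effectiveness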