πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Snowflake: Clustering, Time Travel, Zero-Copy Cloning

Data EngineeringCloud Data Warehouse⭐ Premium

Advertisement

Snowflake & Databricks Interview

Snowflake: Clustering, Time Travel, Zero-Copy Cloning

Deep dive into Snowflake architecture and features

Interview Question

"You have a 5TB Snowflake table with 1 billion rows. Queries filtering by 'customer_id' and 'order_date' are slow. Design: (1) clustering strategy, (2) time travel configuration, (3) zero-copy cloning for testing, (4) cost optimization. Include specific Snowflake SQL commands."

Difficulty: Hard | Frequently asked at Snowflake, Databricks, Netflix, Uber


Theoretical Foundation

Snowflake Architecture

Architecture Diagram
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                Snowflake Architecture                       β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                             β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚              Cloud Services Layer                    β”‚   β”‚
β”‚  β”‚  - Authentication                                    β”‚   β”‚
β”‚  β”‚  - Infrastructure management                         β”‚   β”‚
β”‚  β”‚  - Query optimization                                β”‚   β”‚
β”‚  β”‚  - Metadata management                               β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                             β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚              Query Processing Layer                  β”‚   β”‚
β”‚  β”‚  - Virtual warehouses (compute)                      β”‚   β”‚
β”‚  β”‚  - Auto-scaling                                      β”‚   β”‚
β”‚  β”‚  - Query result cache                                β”‚   β”‚
β”‚  β”‚  - Multi-cluster warehouses                          β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                             β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚              Storage Layer                           β”‚   β”‚
β”‚  β”‚  - Micro-partitions (columnar)                       β”‚   β”‚
β”‚  β”‚  - Automatic clustering                              β”‚   β”‚
β”‚  β”‚  - Compression                                        β”‚   β”‚
β”‚  β”‚  - Metadata                                          β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Micro-Partitions

Snowflake stores data in micro-partitions (50-500MB compressed).

Architecture Diagram
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    Micro-Partitions                         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                             β”‚
β”‚  Table: orders                                              β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ Micro-Partition 1: rows 1-1,000,000                 β”‚   β”‚
β”‚  β”‚ - customer_id range: 1-1000                         β”‚   β”‚
β”‚  β”‚ - order_date range: 2024-01-01 to 2024-01-31        β”‚   β”‚
β”‚  β”‚ - Min/Max values stored in metadata                 β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ Micro-Partition 2: rows 1,000,001-2,000,000        β”‚   β”‚
β”‚  β”‚ - customer_id range: 1001-2000                      β”‚   β”‚
β”‚  β”‚ - order_date range: 2024-02-01 to 2024-02-29        β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚  ...                                                        β”‚
β”‚                                                             β”‚
β”‚  Pruning: Skip partitions where query filters don't match  β”‚
β”‚  Query: WHERE customer_id = 500 AND order_date = '2024-01-15'β”‚
β”‚  Only scans Micro-Partition 1 (metadata check)            β”‚
β”‚                                                             β”‚
└─────────────────────────────────────────────────────

Clustering

Clustering organizes micro-partitions by column values for better pruning.

Architecture Diagram
Without Clustering:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Micro-Partition 1: customer_id [1-1000], date [Jan]        β”‚
β”‚ Micro-Partition 2: customer_id [500-1500], date [Feb]      β”‚
β”‚ Micro-Partition 3: customer_id [1-500], date [Mar]         β”‚
β”‚ Micro-Partition 4: customer_id [1000-2000], date [Jan]     β”‚
β”‚                                                             β”‚
β”‚ Query: WHERE customer_id = 500                              β”‚
β”‚ Must scan ALL 4 partitions (no pruning)                    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

With Clustering on customer_id:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Micro-Partition 1: customer_id [1-500], date [Jan, Mar]    β”‚
β”‚ Micro-Partition 2: customer_id [500-1000], date [Jan, Feb] β”‚
β”‚ Micro-Partition 3: customer_id [1000-1500], date [Feb]     β”‚
β”‚ Micro-Partition 4: customer_id [1500-2000], date [Jan]     β”‚
β”‚                                                             β”‚
β”‚ Query: WHERE customer_id = 500                              β”‚
β”‚ Only scans Micro-Partition 2 (effective pruning)           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Clustering metrics:

Depth=NumberΒ ofΒ micro-partitionsNumberΒ ofΒ distinctΒ clusteringΒ keyΒ values\text{Depth} = \frac{\text{Number of micro-partitions}}{\text{Number of distinct clustering key values}}
Overlap=PartitionsΒ withΒ overlappingΒ rangesTotalΒ partitions\text{Overlap} = \frac{\text{Partitions with overlapping ranges}}{\text{Total partitions}}

Target: Depth close to 1, Overlap close to 0.

Time Travel

Snowflake retains historical data for a configurable period.

Architecture Diagram
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    Time Travel                              β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                             β”‚
β”‚  Current State:                                             β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ orders: {id: 1, amount: 100}, {id: 2, amount: 200}  β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                             β”‚
β”‚  After UPDATE:                                              β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ Current: {id: 1, amount: 150}, {id: 2, amount: 200} β”‚   β”‚
β”‚  β”‚ Historical (1 hour ago): {id: 1, amount: 100}, ...  β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                             β”‚
β”‚  Query Historical:                                          β”‚
β”‚  SELECT * FROM orders AT (OFFSET => -3600*24);  -- 24 hours agoβ”‚
β”‚  SELECT * FROM orders AT (TIMESTAMP => '2024-01-15 10:00:00');β”‚
β”‚  SELECT * FROM orders BEFORE (STATEMENT => 'abc123');       β”‚
β”‚                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Time Travel retention:

  • Standard: 1 day (default)
  • Enterprise: 90 days
  • Business Critical: 90 days
  • Custom: Up to 90 days

Zero-Copy Cloning

Create a clone without copying data (metadata operation only).

Architecture Diagram
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              Zero-Copy Cloning                              β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                             β”‚
β”‚  Original Table: orders (5TB)                               β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ Micro-partitions (shared)                            β”‚   β”‚
β”‚  β”‚ Metadata: {table: orders, rows: 1B}                 β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                             β”‚
β”‚  Clone: orders_clone (0 bytes additional storage)          β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ Micro-partitions (SHARED with original)              β”‚   β”‚
β”‚  β”‚ Metadata: {table: orders_clone, rows: 1B}           β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                             β”‚
β”‚  After DML on clone:                                        β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ Original: Unchanged                                 β”‚   β”‚
β”‚  β”‚ Clone: New micro-partitions for changed rows        β”‚   β”‚
β”‚  β”‚ (Copy-on-Write)                                     β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Snowflake Warehouse Sizes

SizevCPUsMemoryCost/Hour
X-Small12GB$0.00/credit
Small24GB1 credit
Medium48GB2 credits
Large816GB4 credits
X-Large1632GB8 credits
2X-Large3264GB16 credits
3X-Large64128GB32 credits
4X-Large128256GB64 credits

Code Implementation

Clustering Strategy

-- ============================================================
-- CLUSTERING STRATEGY
-- ============================================================

-- 1. Analyze current clustering
SELECT SYSTEM$CLUSTERING_DEPTH('orders');
SELECT SYSTEM$CLUSTERING_INFORMATION('orders');

-- 2. Create clustering key
ALTER TABLE orders CLUSTER BY (customer_id, order_date);

-- 3. Recluster table
ALTER TABLE orders RECLUSTER;

-- 4. Monitor clustering depth
SELECT 
    partition_id,
    SYSTEM$CLUSTERING_DEPTH('orders', partition_id) as depth,
    SYSTEM$CLUSTERING_INFORMATION('orders', partition_id) as info
FROM (
    SELECT partition_id 
    FROM TABLE(INFORMATION_SCHEMA.PARTITIONS('orders'))
);

-- 5. Automatic clustering (recommended)
-- Snowflake automatically maintains clustering
-- Monitor with:
SELECT * FROM TABLE(INFORMATION_SCHEMA.TABLE_STORAGE_METRICS('orders'));

-- 6. Clustering for specific query patterns
-- For queries filtering by customer_id and date range
ALTER TABLE orders CLUSTER BY (customer_id, DATE(order_date));

-- 7. Multi-level clustering
-- For queries that filter by customer_id first, then date
ALTER TABLE orders CLUSTER BY (customer_id, order_date, status);

Time Travel Configuration

-- ============================================================
-- TIME TRAVEL CONFIGURATION
-- ============================================================

-- 1. Set time travel retention
ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 7;

-- 2. Query historical data
-- By offset (seconds ago)
SELECT * FROM orders AT (OFFSET => -3600*24);  -- 24 hours ago

-- By timestamp
SELECT * FROM orders AT (TIMESTAMP => '2024-01-15 10:00:00');

-- By statement ID
SELECT * FROM orders BEFORE (STATEMENT => 'abc123');

-- 3. Create table at specific point in time
CREATE TABLE orders_backup 
    CLONE orders 
    AT (TIMESTAMP => '2024-01-15 10:00:00');

-- 4. Time travel for schema changes
-- See what table looked like before a column was added
SELECT * FROM orders AT (OFFSET => -3600) 
    COLUMNS(* EXCLUDE new_column);

-- 5. Restore from time travel
CREATE OR REPLACE TABLE orders 
    CLONE orders AT (TIMESTAMP => '2024-01-15 00:00:00');

-- 6. Monitor time travel storage
SELECT 
    table_name,
    time_travel_bytes,
    time_travel_bytes / 1024 / 1024 / 1024 AS time_travel_gb
FROM TABLE(INFORMATION_SCHEMA.TABLE_STORAGE_METRICS('orders'));

Zero-Copy Cloning

-- ============================================================
-- ZERO-COPY CLONING
-- ============================================================

-- 1. Create clone (instant, no storage cost)
CREATE TABLE orders_clone CLONE orders;

-- 2. Clone with data modification
CREATE TABLE orders_test AS 
SELECT * FROM orders 
WHERE order_date >= '2024-01-01';

-- 3. Clone for testing
CREATE TABLE orders_staging CLONE orders;

-- Perform tests on clone
UPDATE orders_staging SET amount = amount * 1.1 WHERE status = 'pending';

-- Verify results
SELECT * FROM orders_staging WHERE amount != orders.amount;

-- Drop clone when done (no cost)
DROP TABLE orders_staging;

-- 4. Clone for development
CREATE TABLE dev_orders CLONE production.orders;

-- 5. Clone for backup
CREATE TABLE orders_backup_20240115 
    CLONE orders 
    AT (TIMESTAMP => '2024-01-15 00:00:00');

-- 6. Monitor clone storage
SELECT 
    table_name,
    clone_group_id,
    is_clone,
    table_bytes,
    time_travel_bytes
FROM TABLE(INFORMATION_SCHEMA.TABLE_STORAGE_METRICS('orders'));

Performance Optimization

-- ============================================================
-- PERFORMANCE OPTIMIZATION
-- ============================================================

-- 1. Materialized views for frequent queries
CREATE MATERIALIZED VIEW mv_daily_revenue AS
SELECT 
    order_date,
    customer_segment,
    SUM(amount) AS total_revenue,
    COUNT(*) AS order_count
FROM orders
JOIN customers ON orders.customer_id = customers.customer_id
GROUP BY order_date, customer_segment;

-- Refresh materialized view
ALTER MATERIALIZED VIEW mv_daily_revenue REFRESH;

-- 2. Search optimization for point queries
ALTER TABLE orders SET SEARCH_OPTIMIZATION = ON;

-- 3. Result caching
-- Automatic, but can force:
SELECT * FROM orders WHERE customer_id = 123;

-- 4. Warehouse sizing
-- Use larger warehouse for complex queries
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'X-LARGE';

-- 5. Auto-suspend and auto-resume
ALTER WAREHOUSE analytics_wh 
    SET AUTO_SUSPEND = 60 
    AUTO_RESUME = TRUE;

-- 6. Multi-cluster warehouse for concurrency
ALTER WAREHOUSE analytics_wh 
    SET MIN_CLUSTER_COUNT = 1 
    MAX_CLUSTER_COUNT = 5;

-- 7. Query profiling
SELECT * FROM TABLE(GET_QUERY_OPERATOR_STATS('query_id'));

Cost Optimization

-- ============================================================
-- COST OPTIMIZATION
-- ============================================================

-- 1. Monitor credit usage
SELECT 
    warehouse_name,
    SUM(credits_used) AS total_credits,
    SUM(credits_used) * 3 AS total_cost_usd
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE start_date >= DATEADD(day, -30, CURRENT_DATE())
GROUP BY warehouse_name
ORDER BY total_credits DESC;

-- 2. Identify expensive queries
SELECT 
    query_id,
    query_text,
    execution_time_ms,
    partitions_scanned,
    bytes_scanned / 1024 / 1024 AS mb_scanned,
    credits_used
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_date >= DATEADD(day, -7, CURRENT_DATE())
ORDER BY credits_used DESC
LIMIT 10;

-- 3. Right-size warehouses
-- If average utilization < 50%, use smaller warehouse
SELECT 
    warehouse_name,
    AVG(used_clusters) AS avg_clusters,
    MAX(cluster_number) AS max_clusters
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_LOAD_HISTORY
WHERE start_date >= DATEADD(day, -7, CURRENT_DATE())
GROUP BY warehouse_name;

-- 4. Use serverless for ad-hoc queries
-- Serverless auto-scales and charges per second

-- 5. Compact micro-partitions
ALTER TABLE orders COMPACT;

-- 6. Drop unused tables
SELECT table_name, last_altered 
FROM INFORMATION_SCHEMA.TABLES 
WHERE last_altered < DATEADD(day, -90, CURRENT_DATE());

Data Sharing

-- ============================================================
-- DATA SHARING
-- ============================================================

-- 1. Create share
CREATE SHARE analytics_share;

-- 2. Grant usage on database
GRANT USAGE ON DATABASE analytics_db TO SHARE analytics_share;

-- 3. Grant usage on schema
GRANT USAGE ON SCHEMA analytics_db.public TO SHARE analytics_share;

-- 4. Grant SELECT on table
GRANT SELECT ON TABLE analytics_db.public.orders TO SHARE analytics_share;

-- 5. Add consumer account
ALTER SHARE analytics_share ADD ACCOUNTS = consumer_account_id;

-- 6. Consumer side: Create database from share
CREATE DATABASE analytics_shared FROM SHARE provider_account.analytics_share;

πŸ’‘

Production Tip: Snowflake's automatic clustering is usually sufficient. Only use manual clustering for very specific query patterns. Monitor clustering depth regularly and recluster if depth > 10.


Common Follow-Up Questions

Q1: How do you handle data skew in Snowflake?

-- Monitor skew
SELECT 
    partition_id,
    row_count,
    bytes / 1024 / 1024 AS size_mb
FROM TABLE(INFORMATION_SCHEMA.PARTITIONS('orders'))
ORDER BY row_count DESC;

-- Recluster to fix skew
ALTER TABLE orders RECLUSTER;

Q2: How do you migrate to Snowflake?

-- 1. Use Snowpipe for initial load
CREATE PIPE orders_pipe
    AUTO_INGEST = TRUE
    AS
    COPY INTO orders
    FROM @s3_stage/orders/
    FILE_FORMAT = (TYPE = PARQUET);

-- 2. Use CTAS for transformation
CREATE TABLE orders_snowflake AS
SELECT * FROM orders_external;

-- 3. Validate migration
SELECT COUNT(*) FROM orders_external;
SELECT COUNT(*) FROM orders_snowflake;

Q3: How do you handle schema evolution?

-- Snowflake handles schema evolution automatically
-- New columns are added with ALTER TABLE
ALTER TABLE orders ADD COLUMN discount DECIMAL(10,2);

-- Or use COPY with MATCH_BY_COLUMN_NAME
COPY INTO orders
FROM @s3_stage/orders/
FILE_FORMAT = (TYPE = PARQUET)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

Q4: How do you optimize for concurrent users?

-- Use multi-cluster warehouse
ALTER WAREHOUSE analytics_wh 
    SET MIN_CLUSTER_COUNT = 2 
    MAX_CLUSTER_COUNT = 10;

-- Use result cache
-- Queries with same logic return cached results

-- Use materialized views
-- Pre-compute frequent aggregations

⚠️

Critical Consideration: Snowflake costs can spiral with unlimited auto-scaling. Always set MAX_CLUSTER_COUNT and MONITOR credit usage. Use AUTO_SUSPEND to stop idle warehouses.


Company-Specific Tips

Snowflake Interview Tips

  • Discuss micro-partitions and pruning
  • Explain clustering vs partitioning
  • Mention time travel use cases
  • Talk about zero-copy cloning benefits

Databricks Interview Tips

  • Compare Snowflake vs Databricks
  • Discuss Delta Lake vs Snowflake time travel
  • Mention Photon vs Snowflake performance
  • Talk about lakehouse vs warehouse

Netflix Interview Tips

  • Discuss cost optimization with Snowflake
  • Explain multi-tenant warehouse strategies
  • Mention data sharing for partnerships
  • Talk about real-time analytics with Snowflake

ℹ️

Final Takeaway: Snowflake excels at separating storage and compute, enabling independent scaling. Use clustering for query performance, time travel for data recovery, and zero-copy cloning for testing. Always monitor costs and right-size warehouses.

Advertisement