πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Data Warehouse on AWS

AWS Data EngineeringRedshift Schema Design & Distribution⭐ Premium

Advertisement

πŸ“Š Data Warehouse on AWS

Master Redshift schema design, distribution strategies, and data warehouse best practices.

Module: AWS Data Engineering β€’ Topic 21 of 65 β€’ Premium Content

Star Schema Design

Architecture Diagram
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    STAR SCHEMA DESIGN                                         β”‚
β”‚                                                                             β”‚
β”‚                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                                      β”‚
β”‚                    β”‚  dim_date       β”‚                                      β”‚
β”‚                    │─────────────────│                                      β”‚
β”‚                    β”‚ date_key (PK)   β”‚                                      β”‚
β”‚                    β”‚ full_date       β”‚                                      β”‚
β”‚                    β”‚ year            β”‚                                      β”‚
β”‚                    β”‚ quarter         β”‚                                      β”‚
β”‚                    β”‚ month           β”‚                                      β”‚
β”‚                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜                                      β”‚
β”‚                             β”‚                                               β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”       β”‚       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                     β”‚
β”‚  β”‚  dim_customer   β”‚       β”‚       β”‚  dim_product    β”‚                     β”‚
β”‚  │─────────────────│       β”‚       │─────────────────│                     β”‚
β”‚  β”‚ customer_key(PK)β”‚       β”‚       β”‚ product_key(PK) β”‚                     β”‚
β”‚  β”‚ customer_id     β”‚       β”‚       β”‚ product_id      β”‚                     β”‚
β”‚  β”‚ name            β”œβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€ name            β”‚                     β”‚
β”‚  β”‚ segment         β”‚       β”‚       β”‚ category        β”‚                     β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜       β”‚       β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜                     β”‚
β”‚                            β”‚                β”‚                               β”‚
β”‚                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”                      β”‚
β”‚                    β”‚        fact_sales               β”‚                      β”‚
β”‚                    │─────────────────────────────────│                      β”‚
β”‚                    β”‚ sale_key (PK)                   β”‚                      β”‚
β”‚                    β”‚ date_key (FK)                   β”‚                      β”‚
β”‚                    β”‚ customer_key (FK)               β”‚                      β”‚
β”‚                    β”‚ product_key (FK)                β”‚                      β”‚
β”‚                    β”‚ quantity                        β”‚                      β”‚
β”‚                    β”‚ amount                          β”‚                      β”‚
β”‚                    β”‚ discount                        β”‚                      β”‚
β”‚                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Distribution Strategies

StrategyBest ForPerformanceExample
KEYLarge fact tables joined on columnBest join performancefact_sales DISTKEY(customer_id)
EVENEven distribution without joinsGood for stagingstaging_data DISTSTYLE EVEN
ALLSmall dimension tables (<2GB)Fastest scansdim_products DISTSTYLE ALL
AUTOUnknown patternsAdaptiveMost tables

Sort Keys

-- Compound sort for time-series queries
CREATE TABLE fact_sales (
    sale_id BIGINT IDENTITY(1,1),
    date_key INT DISTKEY,
    customer_key INT,
    product_key INT,
    quantity INT,
    amount DECIMAL(10,2)
)
DISTSTYLE KEY
COMPOUND SORTKEY(date_key, customer_key);

-- Interleaved for multi-column queries
CREATE TABLE dim_customer (
    customer_key INT IDENTITY(1,1),
    customer_id VARCHAR(50),
    name VARCHAR(100),
    segment VARCHAR(50),
    region VARCHAR(50)
)
DISTSTYLE ALL
INTERLEAVED SORTKEY(customer_id, segment, region);

Interview Q&A

Q1: Star vs Snowflake schema?

Answer: Star schema has denormalized dimensions (faster queries). Snowflake normalizes dimensions (less storage). Use star for data warehouses.

Q2: How to choose a distribution key?

Answer: Choose the column most frequently used in JOINs. Large tables benefit most from KEY distribution on join columns.

Q3: What is data skew and how to handle it?

Answer: Uneven data distribution across nodes. Handle by choosing better distribution keys, using EVEN for staging, or salting keys.

Summary

  • Star Schema: Denormalized dimensions, faster queries
  • Distribution: KEY for facts, ALL for small dimensions, AUTO for most
  • Sort Keys: Compound for range queries, Interleaved for multi-column
  • Performance: VACUUM, ANALYZE, and proper indexing

Advertisement