Data Warehouse: Synapse Dedicated Pools & CTAS

Enterprise data warehousing with Synapse dedicated pools, CTAS patterns, and star schema design

Data Warehouse Architecture

Architecture Diagram

┌─────────────────────────────────────────────────────────────────────┐
│                    SYNAPSE DEDICATED POOL ARCHITECTURE               │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │                    SYNAPSE WORKSPACE                          │   │
│  │                                                               │   │
│  │  DEDICATED SQL POOL (DW1000c)                                │   │
│  │  ┌─────────────────────────────────────────────────────┐    │   │
│  │  │                                                       │    │   │
│  │  │  CONTROL NODE         COMPUTE NODES (6)              │    │   │
│  │  │  ┌─────────────┐    ┌─────────────────────────────┐ │    │   │
│  │  │  │ Query       │    │ Node 1 │ Node 2 │ ... │ Node 6│ │    │   │
│  │  │  │ Processing  │───>│        │       │     │       │ │    │   │
│  │  │  │ & Merging   │    │ 2TB    │ 2TB    │     │ 2TB   │ │    │   │
│  │  │  └─────────────┘    └─────────────────────────────┘ │    │   │
│  │  │                                                       │    │   │
│  │  │  DISTRIBUTIONS (per compute node):                    │    │   │
│  │  │  ┌─────────────────────────────────────────────┐    │    │   │
│  │  │  │ Dist 1 │ Dist 2 │ Dist 3 │ ... │ Dist 60    │    │    │   │
│  │  │  └─────────────────────────────────────────────┘    │    │   │
│  │  └─────────────────────────────────────────────────────┘    │   │
│  │                                                               │   │
│  │  STAR SCHEMA DESIGN:                                          │   │
│  │  ┌─────────────────────────────────────────────────────┐    │   │
│  │  │                    dim_dates                          │    │   │
│  │  │  ┌─────────────────────────────────────────────┐    │    │   │
│  │  │  │ date_key │ full_date │ year │ month │ day    │    │    │   │
│  │  │  └─────────────────────────────────────────────┘    │    │   │
│  │  │         │                    │                         │    │   │
│  │  │         │                    │                         │    │   │
│  │  │  ┌──────▼───────┐    ┌──────▼───────┐               │    │   │
│  │  │  │ dim_customers│    │ dim_products │               │    │   │
│  │  │  │              │    │              │               │    │   │
│  │  │  │ customer_key │    │ product_key  │               │    │   │
│  │  │  │ name         │    │ name         │               │    │   │
│  │  │  │ segment      │    │ category     │               │    │   │
│  │  │  └──────┬───────┘    └──────┬───────┘               │    │   │
│  │  │         │                    │                         │    │   │
│  │  │         └────────┬───────────┘                         │    │   │
│  │  │                  │                                     │    │   │
│  │  │         ┌────────▼────────┐                           │    │   │
│  │  │         │   fact_sales    │                           │    │   │
│  │  │         │                 │                           │    │   │
│  │  │         │ sale_key        │                           │    │   │
│  │  │         │ date_key (FK)   │                           │    │   │
│  │  │         │ customer_key(FK)│                           │    │   │
│  │  │         │ product_key(FK) │                           │    │   │
│  │  │         │ quantity        │                           │    │   │
│  │  │         │ total_amount    │                           │    │   │
│  │  │         └─────────────────┘                           │    │   │
│  │  └─────────────────────────────────────────────────────┘    │   │
│  └─────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────┘

CTAS Pattern Examples

-- Create fact table with CTAS
CREATE TABLE [dbo].[FactSales]
WITH
(
    DISTRIBUTION = HASH(date_key),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION = (date_key RANGE RIGHT FOR VALUES
        (20240101, 20240201, 20240301, 20240401,
         20240501, 20240601, 20240701, 20240801,
         20240901, 20241001, 20241101, 20241201))
)
AS
SELECT
    s.sale_id,
    d.date_key,
    c.customer_key,
    p.product_key,
    s.quantity,
    s.unit_price,
    s.quantity * s.unit_price AS total_amount,
    s.region,
    GETDATE() AS load_date
FROM [staging].[Sales] s
INNER JOIN [dbo].[DimDates] d
    ON s.sale_date = d.full_date
INNER JOIN [dbo].[DimCustomers] c
    ON s.customer_id = c.customer_id
INNER JOIN [dbo].[DimProducts] p
    ON s.product_id = p.product_id;

-- Create dimension table with replicated distribution
CREATE TABLE [dbo].[DimCustomers]
WITH
(
    DISTRIBUTION = REPLICATE,
    CLUSTERED INDEX (customer_key)
)
AS
SELECT
    customer_id AS customer_key,
    customer_name,
    email,
    segment,
    region,
    GETDATE() AS load_date
FROM [staging].[Customers];

-- Create aggregate table
CREATE TABLE [dbo].[AggDailySalesByRegion]
WITH
(
    DISTRIBUTION = HASH(region),
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT
    date_key,
    region,
    SUM(total_amount) AS daily_revenue,
    COUNT(*) AS transaction_count,
    COUNT(DISTINCT customer_key) AS unique_customers
FROM [dbo].[FactSales]
GROUP BY date_key, region;

Distribution Strategy Guide

Table Type	Recommended Distribution	Reason
Large fact tables (>1GB)	HASH (join column)	Even distribution, efficient joins
Small dimension tables (<2GB)	REPLICATE	Avoid shuffling for joins
Staging tables	ROUND ROBIN	Fast load, no skew
Aggregate tables	HASH (group by column)	Even distribution

ℹ️

Pro Tip: Use DBCC PDW_SHOWSPACEUSED('table_name') to check distribution skew. If any distribution is significantly larger than others, consider changing the distribution key.

Interview Questions

Q1: How do you choose between Hash and Round Robin distribution? A: Hash for tables that are frequently joined (even distribution on join key). Round Robin for staging tables or when no clear join key exists (fastest load, even distribution).

Q2: What is the benefit of Clustered Columnstore Indexes in Synapse? A: CCI provides optimal compression (up to 10x), columnar storage for analytics, and batch-mode execution. It's the default and recommended index type for fact tables.

Q3: How do you handle slowly changing dimensions in Synapse? A: Use MERGE statement for SCD Type 1 (overwrite). For SCD Type 2, create history tables with effective dates and status flags. Use Synapse Serverless for staging changes before merging.