dbt SCD Type 2

Detailed Explanation

What is SCD Type 2?

SCD Type 2 (Slowly Changing Dimension Type 2) is the most widely used technique for keeping dimension history in dimensional modeling. Unlike Type 1 (overwrite) or Type 3 (limited history), Type 2 stores each version of a record as a separate row with its own validity period, so a query can reconstruct the dimension as it stood at any point in time.
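To make the contrast with Type 1 concrete, here is a minimal Python sketch. The row layout, the sample customer, and the helper names apply_type1 and apply_type2 are invented for illustration; they are not part of dbt.

```python
from datetime import date


def apply_type1(row, changes):
    """Type 1: overwrite in place; the previous values are lost."""
    return {**row, **changes}


def apply_type2(history, changes, changed_on):
    """Type 2: close the open version and append a new one.

    Assumes the last element of history is the open (current) version.
    """
    current = history[-1]
    closed = {**current, "valid_to": changed_on}
    new_version = {**current, **changes, "valid_from": changed_on, "valid_to": None}
    return history[:-1] + [closed, new_version]


original = {
    "customer_id": "CUST-001",
    "segment": "SMB",
    "valid_from": date(2023, 1, 1),
    "valid_to": None,
}

type1_result = apply_type1(original, {"segment": "Enterprise"})
type2_result = apply_type2([original], {"segment": "Enterprise"}, date(2024, 3, 15))
```

After the change, type1_result holds only the new segment, while type2_result holds two rows: the closed SMB version and the open Enterprise version.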

dbt Snapshot Approach

dbt provides built-in SCD Type 2 support through the snapshot feature. Snapshots use the check or timestamp strategy to detect changes and automatically manage the dbt_valid_from, dbt_valid_to, dbt_scd_id, and dbt_updated_at columns.

Snapshot Strategies

Strategy	How It Works	Best For
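Check	Compares all or specified columns between runs	Sources without a reliable updated_at column
Timestamp	Compares a source updated_at column with the last snapshot	Sources that maintain updated_at on every change

Key Takeaway: Use the check strategy (check_cols: 'all' or an explicit column list) when the source has no trustworthy updated_at column; when it does, the timestamp strategy is the one dbt's documentation recommends. Enable invalidate_hard_deletes: true to close out records deleted from the source.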
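The difference between the two strategies can be sketched in a few lines of Python. This is a simplified illustration of the idea, not how dbt implements snapshots (dbt does the comparison in SQL); the function names and sample rows are hypothetical.

```python
def changed_by_check(current, incoming, check_cols):
    """Check strategy: a change is any difference in the tracked columns."""
    return any(current[col] != incoming[col] for col in check_cols)


def changed_by_timestamp(current, incoming, updated_at="updated_at"):
    """Timestamp strategy: a change is a newer value in the updated_at column."""
    return incoming[updated_at] > current[updated_at]


current = {"customer_id": 1, "email": "a@example.com", "updated_at": "2024-01-01"}
# The email changed, but the source system did not bump updated_at.
incoming = {"customer_id": 1, "email": "b@example.com", "updated_at": "2024-01-01"}

check_result = changed_by_check(current, incoming, ["email"])
timestamp_result = changed_by_timestamp(current, incoming)
```

Here the check comparison flags the row and the timestamp comparison does not, which is why the timestamp strategy depends on the source maintaining updated_at consistently.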

Code Examples

Basic Snapshot Configuration

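# snapshots/customers.yml (YAML-defined snapshots require dbt 1.9 or later)
snapshots:
  - name: scd_customers
    # Assumes a source named erp with a customers table is declared in the project
    relation: source('erp', 'customers')
    config:
      target_schema: snapshots
      unique_key: customer_id
      strategy: check
      check_cols:
        - customer_name
        - email
        - segment
        - address
      tags: ['scd', 'dimensions']
      invalidate_hard_deletes: true  # dbt 1.9+ also accepts hard_deletes: invalidate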

Snapshot SQL File

-- snapshots/scd_customers.sql
-- Assumes a source named erp with a customers table is declared in the project
{% snapshot scd_customers %}

{{
    config(
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='check',
        check_cols=['customer_name', 'email', 'segment', 'address'],
        invalidate_hard_deletes=True
    )
}}

select * from {{ source('erp', 'customers') }}

{% endsnapshot %}

Custom SCD Type 2 with Incremental Model

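-- models/marts/dim_customers_scd2.sql
{{
    config(
        materialized='incremental',
        unique_key='surrogate_key',
        incremental_strategy='merge',
        tags=['dimensions', 'scd2']
    )
}}

with source_customers as (
    select
        customer_id,
        customer_name,
        email,
        segment,
        address,
        updated_at as source_updated_at
    from {{ source('erp', 'customers') }}
),

{% if is_incremental() %}

current_versions as (
    select
        surrogate_key,
        customer_id,
        customer_name,
        email,
        segment,
        address,
        valid_from,
        valid_to,
        is_current
    from {{ this }}
    where is_current = true
),

-- New customers, plus customers whose tracked attributes differ from the current version.
-- The hashed column lists are an assumption; the original expressions were lost.
changed_records as (
    select
        s.customer_id,
        s.customer_name,
        s.email,
        s.segment,
        s.address,
        s.source_updated_at
    from source_customers s
    left join current_versions c
        on s.customer_id = c.customer_id
    where c.customer_id is null  -- new records
       or {{ dbt_utils.generate_surrogate_key(['s.customer_name', 's.email', 's.segment', 's.address']) }}
          != {{ dbt_utils.generate_surrogate_key(['c.customer_name', 'c.email', 'c.segment', 'c.address']) }}
),

-- A dbt model is a single SELECT, so it cannot run UPDATE ... RETURNING.
-- Re-emit the superseded rows under their existing surrogate_key; the merge overwrites them.
expired_records as (
    select
        c.surrogate_key,
        c.customer_id,
        c.customer_name,
        c.email,
        c.segment,
        c.address,
        c.valid_from,
        ch.source_updated_at as valid_to,  -- closes exactly where the new version starts
        false as is_current
    from current_versions c
    inner join changed_records ch
        on c.customer_id = ch.customer_id
),

-- Assumes updated_at advances on every change, so the new key differs from the expired row's key.
new_versions as (
    select
        {{ dbt_utils.generate_surrogate_key(['customer_id', 'source_updated_at']) }} as surrogate_key,
        customer_id,
        customer_name,
        email,
        segment,
        address,
        source_updated_at as valid_from,
        cast(null as timestamp) as valid_to,
        true as is_current
    from changed_records
),

combined as (
    select * from expired_records
    union all
    select * from new_versions
)
select * from combined

{% else %}

initial_versions as (
    select
        {{ dbt_utils.generate_surrogate_key(['customer_id', 'source_updated_at']) }} as surrogate_key,
        customer_id,
        customer_name,
        email,
        segment,
        address,
        source_updated_at as valid_from,
        cast(null as timestamp) as valid_to,
        true as is_current
    from source_customers
)
select * from initial_versions

{% endif %}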
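The model above decides whether a customer changed by comparing the incoming row with its current version. One common way to do that comparison is to hash the tracked columns and compare the hashes. The Python sketch below shows the idea; row_hash is a hypothetical helper, loosely modeled on the hash-of-concatenated-columns approach rather than a reproduction of any dbt package's exact output.

```python
import hashlib


def row_hash(row, cols):
    """Hash the tracked columns in a fixed order.

    None is rendered as a placeholder so that a missing value hashes
    differently from an empty string.
    """
    parts = ["_null_" if row.get(col) is None else str(row[col]) for col in cols]
    return hashlib.md5("-".join(parts).encode("utf-8")).hexdigest()


tracked = ["customer_name", "email", "segment", "address"]
stored = {"customer_name": "Ada", "email": "ada@example.com", "segment": "SMB", "address": None}
incoming = {**stored, "email": "ada@new.example.com"}

has_changed = row_hash(stored, tracked) != row_hash(incoming, tracked)
```

Comparing one hash per row keeps the join condition short, at the cost of not showing which column changed.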

SCD Type 2 with Hard Delete Detection

-- models/marts/dim_products_scd2.sql
{{
    config(
        materialized='incremental',
        unique_key='surrogate_key',
        incremental_strategy='merge'
    )
}}

with source_products as (
    select
        product_id,
        product_name,
        category,
        price,
        is_active,
        current_timestamp() as loaded_at  -- the source has no updated_at, so use load time
    from {{ source('erp', 'products') }}
),

{% if is_incremental() %}

active_versions as (
    select *
    from {{ this }}
    where is_current = true
),

-- Detect inserts and updates.
-- The hashed column lists are an assumption; the original expressions were lost.
changed_products as (
    select
        s.product_id,
        s.product_name,
        s.category,
        s.price,
        s.is_active,
        s.loaded_at
    from source_products s
    left join active_versions a on s.product_id = a.product_id
    where a.product_id is null
       or {{ dbt_utils.generate_surrogate_key(['s.product_name', 's.category', 's.price', 's.is_active']) }}
          != {{ dbt_utils.generate_surrogate_key(['a.product_name', 'a.category', 'a.price', 'a.is_active']) }}
),

-- Detect hard deletes: active in the dimension but missing from the source
deleted_products as (
    select a.product_id
    from active_versions a
    left join source_products s on a.product_id = s.product_id
    where s.product_id is null
),

-- A dbt model is a single SELECT, so rows are expired by re-emitting them
-- under their existing surrogate_key; the merge overwrites the stored row.
expired_updates as (
    select
        a.surrogate_key,
        a.product_id,
        a.product_name,
        a.category,
        a.price,
        a.is_active,
        a.valid_from,
        ch.loaded_at as valid_to,
        false as is_current,
        false as is_deleted
    from active_versions a
    inner join changed_products ch on a.product_id = ch.product_id
),

expired_deletes as (
    select
        a.surrogate_key,
        a.product_id,
        a.product_name,
        a.category,
        a.price,
        a.is_active,
        a.valid_from,
        current_timestamp() as valid_to,
        false as is_current,
        true as is_deleted
    from active_versions a
    inner join deleted_products d on a.product_id = d.product_id
),

new_records as (
    select
        {{ dbt_utils.generate_surrogate_key(['product_id', 'loaded_at']) }} as surrogate_key,
        product_id,
        product_name,
        category,
        price,
        is_active,
        loaded_at as valid_from,
        cast(null as timestamp) as valid_to,
        true as is_current,
        false as is_deleted
    from changed_products
),

combined as (
    select * from expired_updates
    union all
    select * from expired_deletes
    union all
    select * from new_records
)
select * from combined

{% else %}

initial_records as (
    select
        {{ dbt_utils.generate_surrogate_key(['product_id', 'loaded_at']) }} as surrogate_key,
        product_id,
        product_name,
        category,
        price,
        is_active,
        loaded_at as valid_from,
        cast(null as timestamp) as valid_to,
        true as is_current,
        false as is_deleted
    from source_products
)
select * from initial_records

{% endif %}
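The two left joins in the model above amount to set differences between the keys in the source and the keys that are currently active in the dimension. A small Python sketch of that classification, with a hypothetical classify_keys helper and made-up product IDs:

```python
def classify_keys(source_keys, active_keys):
    """Split business keys into inserts, hard deletes, and rows to compare."""
    source, active = set(source_keys), set(active_keys)
    return {
        "inserted": sorted(source - active),    # in source, no active version yet
        "deleted": sorted(active - source),     # active version, gone from source
        "to_compare": sorted(source & active),  # present in both; check attributes
    }


result = classify_keys(
    source_keys=["P1", "P2", "P4"],
    active_keys=["P1", "P2", "P3"],
)
```

Only keys in the to_compare group need attribute-level change detection; inserted keys always produce a new version and deleted keys always close one.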

Querying SCD Type 2 Tables

-- Get current version of all records
select * from {{ ref('dim_customers_scd2') }}
where is_current = true

-- Point-in-time lookup (valid_from is inclusive, valid_to is exclusive)
select *
from {{ ref('dim_customers_scd2') }}
where valid_from <= '2024-06-01'
  and (valid_to > '2024-06-01' or valid_to is null)

-- Get all versions of a specific customer
select *
from {{ ref('dim_customers_scd2') }}
where customer_id = 'CUST-001'
order by valid_from

-- Find versions that became effective in a specific period
select *
from {{ ref('dim_customers_scd2') }}
where valid_from >= '2024-01-01'
  and valid_from < '2024-02-01'
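To try the point-in-time predicate outside a warehouse, the sketch below runs it against an in-memory SQLite table from Python. The table contents and the segment_as_of helper are invented for illustration, and dates are stored as ISO strings so that string comparison matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table dim_customers_scd2 ("
    "customer_id text, segment text, valid_from text, valid_to text, is_current integer)"
)
conn.executemany(
    "insert into dim_customers_scd2 values (?, ?, ?, ?, ?)",
    [
        ("CUST-001", "SMB", "2023-01-01", "2024-03-15", 0),
        ("CUST-001", "Enterprise", "2024-03-15", None, 1),
    ],
)


def segment_as_of(as_of):
    """Return the segment of CUST-001 on the given ISO date, or None."""
    row = conn.execute(
        "select segment from dim_customers_scd2 "
        "where customer_id = 'CUST-001' "
        "and valid_from <= ? and (valid_to > ? or valid_to is null)",
        (as_of, as_of),
    ).fetchone()
    return row[0] if row else None
```

Because valid_from is inclusive and valid_to is exclusive, a lookup on the changeover date itself matches only the newer version.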

Comparison: SCD Strategies

Strategy	History	Storage	Query Complexity	Use Case
Type 1	None (overwrite)	Minimal	Simple	Current state only
Type 2	Full (new rows)	High	Moderate	Audit and history
Type 3	Limited (one prev)	Low	Moderate	Previous value only
Type 4	Full (history table)	High	High	Separate history table
Type 6	Hybrid (Types 1 + 2 + 3)	High	High	Current + history

Performance Metrics

Metric	Description	Target
Snapshot Duration	Time to detect changes	< 10 minutes
Record Growth Rate	New versions per run	Monitor for anomalies
Query Performance	Point-in-time lookup	< 2 seconds
Storage Efficiency	Row count vs unique keys	< 10x growth
Change Detection Accuracy	Missed changes	0%
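The storage efficiency metric in the table is a ratio of total rows to distinct business keys. A minimal Python sketch of that calculation follows; the storage_growth_ratio name and the sample rows are hypothetical, and in practice the same figure would come from a count query against the dimension.

```python
def storage_growth_ratio(rows, key="customer_id"):
    """Total row count divided by the number of distinct business keys."""
    keys = {row[key] for row in rows}
    return len(rows) / len(keys) if keys else 0.0


rows = [
    {"customer_id": "CUST-001"},
    {"customer_id": "CUST-001"},
    {"customer_id": "CUST-001"},
    {"customer_id": "CUST-002"},
]
ratio = storage_growth_ratio(rows)  # 4 rows over 2 keys
```

A ratio that climbs sharply between runs usually means a volatile column is being tracked or change detection is firing on noise.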

Best Practices

Use check strategy - Detects changes even when the source's updated_at column is missing or unreliable
Track all important columns - Include all columns needed for analysis
Handle hard deletes - Enable invalidate_hard_deletes for source deletions
Index surrogate_key - Ensure efficient joins with fact tables
Partition by valid_from - Optimize historical queries
Create current views - Expose a view or materialized view filtered to is_current = true
Monitor row growth - Alert on unexpected version proliferation
Document grain - Clearly state the SCD strategy in model descriptions
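Several of these practices assume the validity windows in the table are well-formed. The sketch below is one possible integrity check in Python, assuming rows are dicts with ISO-string dates and that valid_to is None for the open version; the function name and column names are illustrative.

```python
from collections import defaultdict


def find_integrity_issues(rows, key="customer_id"):
    """Return (key, problem) tuples for a Type 2 table held as a list of dicts."""
    by_key = defaultdict(list)
    for row in rows:
        by_key[row[key]].append(row)

    issues = []
    for business_key, versions in by_key.items():
        open_versions = [v for v in versions if v["valid_to"] is None]
        if len(open_versions) > 1:
            issues.append((business_key, "multiple current rows"))
        ordered = sorted(versions, key=lambda v: v["valid_from"])
        for earlier, later in zip(ordered, ordered[1:]):
            # An open-ended or late-closing earlier version overlaps the next one.
            if earlier["valid_to"] is None or earlier["valid_to"] > later["valid_from"]:
                issues.append((business_key, "overlapping validity windows"))
                break
    return issues
```

Keys with no open version are not flagged, since a hard-deleted record legitimately has none.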
