Incremental Models
Incremental Architecture
Architecture Diagram
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β INCREMENTAL MODEL ARCHITECTURE β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β INCREMENTAL STRATEGIES β β
β β β β
β β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββββββββββββββ β β
β β β MERGE β β APPEND β β DELETE+INSERT β β β
β β β β β β β β β β
β β β β’ Upsert β β β’ Add rows β β β’ Replace partitions β β β
β β β β’ Update β β β’ No updates β β β’ Full refresh β β β
β β β β’ Insert β β β’ Append onlyβ β β’ Selective replace β β β
β β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββββββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β
β βΌ β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β EXECUTION FLOW β β
β β β β
β β βββββββββββ βββββββββββ βββββββββββ βββββββββββ β β
β β β CHECK βββββΆβ BUILD βββββΆβ MERGE βββββΆβ VERIFY β β β
β β β INCR. β β NEW β β DATA β β RESULTS β β β
β β β MODE β β DATA β β β β β β β
β β βββββββββββ βββββββββββ βββββββββββ βββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Merge Strategy Flow
Architecture Diagram
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β MERGE STRATEGY EXECUTION β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β EXISTING TABLE NEW DATA β
β βββββββββββββββββββ βββββββββββββββββββ β
β β order_id β statusβ β order_id β statusβ β
β ββββββββββββΌββββββββ ββββββββββββΌββββββββ β
β β 1 β activeβ β 1 β shippedβ β
β β 2 β pendingβ β 3 β activeβ β
β βββββββββββββββββββ βββββββββββββββββββ β
β β β β
β βββββββββββββββββ¬ββββββββββββββββββββ β
β β β
β βΌ β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β MERGE OPERATION β β
β β β β
β β 1. MATCH by order_id β β
β β 2. UPDATE existing rows (order_id = 1: status β shipped) β β
β β 3. INSERT new rows (order_id = 3: new record) β β
β β 4. DELETE soft-deleted rows (if configured) β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β
β βΌ β
β RESULT TABLE β
β βββββββββββββββββββ β
β β order_id β statusβ β
β ββββββββββββΌββββββββ β
β β 1 β shippedβ β UPDATED β
β β 2 β pendingβ β UNCHANGED β
β β 3 β activeβ β INSERTED β
β βββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Partition Strategy
Architecture Diagram
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β PARTITION MANAGEMENT β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β PARTITION STRATEGIES β β
β β β β
β β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββββββββββββββ β β
β β β DAY β β HOUR β β MONTH β β β
β β β β β β β β β β
β β β Granularity β β High β β Low β β β
β β β Medium β β Granularity β β Granularity β β β
β β β β β β β β β β
β β β 365 β β 8760 β β 12 β β β
β β β partitions/yrβ β partitions/yrβ β partitions/yr β β β
β β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββββββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β INCREMENTAL PARTITION PRUNING β β
β β β β
β β βββββββ¬ββββββ¬ββββββ¬ββββββ¬ββββββ¬ββββββ¬ββββββ¬ββββββ β β
β β β Jan β Feb β Mar β Apr β May β Jun β Jul β Aug β β β
β β β β β β β β β β β β β β β β β β β β β
β β βββββββ΄ββββββ΄ββββββ΄ββββββ΄ββββββ΄ββββββ΄ββββββ΄ββββββ β β
β β β β β β
β β βββββββββββββββ Process only new data βββββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Detailed Explanation
Incremental models are one of dbt's most powerful features for optimizing performance and reducing costs. They allow you to process only new or changed data rather than rebuilding entire tables.
How Incremental Models Work
- First Run: The model executes as a full table build
- Subsequent Runs: Only new/changed data is processed
- Merge Logic: New data is merged into existing table
- Partition Pruning: Queries are optimized with partition filters
Incremental Strategies
Merge Strategy
The most common strategy for upsert operations:
merge into target using source on target.id = source.id
when matched then update set ...
when not matched then insert ...
Append Strategy
Simple append of new rows without updates:
insert into target select * from source
where source.id not in (select id from target)
Delete+Insert Strategy
Replace specific partitions or segments:
delete from target where date = '2024-01-01'
insert into target select * from source where date = '2024-01-01'
Partition Management
Partitioning is crucial for incremental performance:
- Date-based partitioning: Most common, partition by date
- Timestamp-based partitioning: Higher granularity for large datasets
- Integer-based partitioning: For numeric keys
- Clustered partitioning: Combined partitioning and clustering
Watermark Detection
dbt uses watermark detection to identify new data:
{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
Performance Considerations
- Partition pruning: Always filter on partition columns
- Cluster alignment: Cluster by frequently filtered columns
- Batch processing: Process data in appropriate batch sizes
- Resource allocation: Use appropriate warehouse sizes
Code Examples
Basic Incremental Model
-- models/marts/fct_orders.sql
{{
config(
materialized='incremental',
unique_key='order_id',
incremental_strategy='merge'
)
}}
with orders as (
select * from {{ ref('stg_orders') }}
),
final as (
select
order_id,
customer_id,
order_date,
status,
amount,
current_timestamp() as updated_at
from orders
)
select * from final
{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
Partitioned Incremental Model (BigQuery)
-- models/marts/fct_events.sql
{{
config(
materialized='incremental',
unique_key='event_id',
partition_by={
"field": "event_date",
"data_type": "date",
"granularity": "day"
},
cluster_by=['user_id', 'event_type'],
incremental_strategy='merge'
)
}}
with events as (
select * from {{ ref('stg_events') }}
),
final as (
select
event_id,
user_id,
event_type,
event_timestamp,
cast(event_timestamp as date) as event_date,
event_properties,
current_timestamp() as updated_at
from events
)
select * from final
{% if is_incremental() %}
where event_date >= date_sub(
(select max(event_date) from {{ this }}),
interval 3 day
)
{% endif %}
Append-Only Incremental Model
-- models/marts/fct_events_append.sql
{{
config(
materialized='incremental',
incremental_strategy='append',
unique_key='event_id'
)
}}
with events as (
select
event_id,
user_id,
event_type,
event_timestamp,
event_properties
from {{ ref('stg_events') }}
)
select * from events
{% if is_incremental() %}
where event_id not in (select event_id from {{ this }})
{% endif %}
Delete+Insert Strategy
-- models/marts/fct_daily_metrics.sql
{{
config(
materialized='incremental',
incremental_strategy='delete+insert',
unique_key=['metric_date', 'metric_name']
)
}}
with daily_metrics as (
select
cast(metric_timestamp as date) as metric_date,
metric_name,
metric_value,
current_timestamp() as updated_at
from {{ ref('stg_metrics') }}
)
select * from daily_metrics
{% if is_incremental() %}
where metric_date >= date_sub(
(select max(metric_date) from {{ this }}),
interval 7 day
)
{% endif %}
Advanced Incremental with Macro
-- models/marts/fct_orders_advanced.sql
{{
config(
materialized='incremental',
unique_key='order_id',
incremental_strategy='merge',
partition_by={
"field": "order_date",
"data_type": "date",
"granularity": "day"
},
cluster_by=['customer_id', 'order_status']
)
}}
{% macro get_incremental_filter() %}
{% if is_incremental() %}
where order_date >= date_sub(
(select max(order_date) from {{ this }}),
interval {{ var('incremental_lookback_days', 7) }} day
)
{% endif %}
{% endmacro %}
with orders as (
select * from {{ ref('stg_orders') }}
),
customers as (
select * from {{ ref('dim_customers') }}
),
final as (
select
orders.order_id,
orders.customer_id,
customers.customer_name,
customers.segment,
orders.order_date,
orders.status as order_status,
orders.amount,
current_timestamp() as updated_at
from orders
left join customers on orders.customer_id = customers.customer_id
{{ get_incremental_filter() }}
)
select * from final
Performance Metrics
| Metric | Full Refresh | Incremental | Improvement |
|---|---|---|---|
| Processing Time | 100% | 10-30% | 70-90% |
| Cost | 100% | 20-40% | 60-80% |
| Data Scanned | 100% | 5-20% | 80-95% |
| Warehouse Size | Large | Medium | 50% reduction |
| Parallelism | Low | High | 2-3x improvement |
Best Practices
- Always use unique_key for merge strategies
- Partition large tables for efficient pruning
- Cluster by frequently filtered columns
- Use appropriate incremental lookback for late-arriving data
- Test incremental logic with both full and incremental runs
- Monitor partition counts to avoid excessive partitions
- Use delete+insert for simple date-based replacements
- Consider resource costs when choosing strategies