Materializations in dbt
Materialization Architecture
Architecture Diagram
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β DBT MATERIALIZATION TYPES β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β MATERIALIZATION STRATEGIES β β
β β β β
β β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββββββββββββββ β β
β β β TABLE β β VIEW β β INCREMENTAL β β β
β β β β β β β β β β
β β β β’ Full copy β β β’ Logical β β β’ Partial refresh β β β
β β β β’ Fast query β β β’ No storage β β β’ Merge/append β β β
β β β β’ Expensive β β β’ Slow query β β β’ Cost effective β β β
β β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββββββββββββββ β β
β β β β
β β ββββββββββββββββ ββββββββββββββββ β β
β β β EPHEMERAL β β MATERIALIZED β β β
β β β β β VIEW β β β
β β β β’ CTE only β β β’ Physical β β β
β β β β’ No persist β β β’ Precomputedβ β β
β β β β’ Inline β β β’ Refresh β β β
β β ββββββββββββββββ ββββββββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Table vs View Performance
Architecture Diagram
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β TABLE vs VIEW COMPARISON β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β TABLE MATERIALIZATION β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Query: SELECT * FROM fct_orders WHERE order_date = '2024-01-01' β β
β β β β
β β Execution Plan: β β
β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β 1. Full table scan: 100M rows β β β
β β β 2. Filter: 50K rows matched β β β
β β β 3. Return results β β β
β β β β β β
β β β Time: 2.3 seconds β β β
β β β Cost: $0.05 β β β
β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β VIEW MATERIALIZATION β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Query: SELECT * FROM v_fct_orders WHERE order_date = '2024-01-01' β β
β β β β
β β Execution Plan: β β
β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β 1. View expansion β β β
β β β 2. Full table scan: 100M rows β β β
β β β 3. Filter: 50K rows matched β β β
β β β 4. Return results β β β
β β β β β β
β β β Time: 3.1 seconds β β β
β β β Cost: $0.07 β β β
β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Ephemeral Materialization
Architecture Diagram
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β EPHEMERAL MATERIALIZATION FLOW β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β MODEL DEFINITION β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β -- models/intermediate/int_orders.sql β β
β β {{ config(materialized='ephemeral') }} β β
β β β β
β β with orders as ( β β
β β select * from {{ ref('stg_orders') }} β β
β β ) β β
β β select * from orders where status = 'active' β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β
β βΌ β
β DOWNSTREAM MODEL β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β -- models/marts/fct_active_orders.sql β β
β β select * from {{ ref('int_orders') }} β β
β β β β
β β COMPILED OUTPUT: β β
β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β select * from ( β β β
β β β with orders as ( β β β
β β β select * from schema.stg_orders β β β
β β β ) β β β
β β β select * from orders where status = 'active' β β β
β β β ) as int_orders β β β
β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Detailed Explanation
Materializations determine how dbt persists your models in the data warehouse. Each materialization strategy has different trade-offs for performance, cost, and maintenance.
Table Materialization
Tables are physical copies of your data stored in the warehouse:
Characteristics:
- Full data copy stored physically
- Fast query performance (pre-computed)
- Expensive to refresh (full rebuild)
- Independent of source data changes
Use Cases:
- Large analytical queries
- Complex aggregations
- Performance-critical dashboards
- Data that changes infrequently
View Materialization
Views are logical definitions that query underlying data:
Characteristics:
- No physical storage (logical only)
- Always reflects current source data
- Slower query performance (on-the-fly computation)
- Zero refresh cost
Use Cases:
- Simple transformations
- Real-time data access
- Small datasets
- Development/testing environments
Incremental Materialization
Incremental models process only new/changed data:
Characteristics:
- Partial data refresh
- Fast incremental updates
- Complex logic required
- Cost-effective for large datasets
Use Cases:
- Large fact tables
- High-volume event data
- Cost-sensitive workloads
- Real-time analytics
Ephemeral Materialization
Ephemeral models are compiled as CTEs into downstream models:
Characteristics:
- No physical storage
- Compiled into parent models
- Zero refresh cost
- Reduced query complexity
Use Cases:
- Reusable transformation logic
- Intermediate calculations
- Development/testing
- Small reference datasets
Materialized Views
Materialized views combine benefits of tables and views:
Characteristics:
- Physical storage with automatic refresh
- Query performance like tables
- Refresh cost like views
- Database-managed updates
Use Cases:
- Frequently accessed aggregations
- Real-time analytics
- Cost-sensitive performance
- Auto-refreshing datasets
Code Examples
Table Materialization
-- models/marts/dim_customers.sql
{{
config(
materialized='table',
schema='analytics',
tags=['dimension', 'core']
)
}}
with customers as (
select * from {{ ref('stg_customers') }}
),
orders as (
select * from {{ ref('stg_orders') }}
),
customer_metrics as (
select
customer_id,
count(*) as total_orders,
sum(amount) as total_revenue,
min(order_date) as first_order_date,
max(order_date) as last_order_date
from orders
group by 1
)
select
customers.customer_id,
customers.customer_name,
customers.email,
customers.segment,
coalesce(customer_metrics.total_orders, 0) as total_orders,
coalesce(customer_metrics.total_revenue, 0) as total_revenue,
customer_metrics.first_order_date,
customer_metrics.last_order_date,
current_timestamp() as updated_at
from customers
left join customer_metrics on customers.customer_id = customer_metrics.customer_id
View Materialization
-- models/staging/stg_orders.sql
{{
config(
materialized='view',
schema='staging'
)
}}
with source as (
select * from {{ source('raw', 'orders') }}
),
renamed as (
select
id as order_id,
customer_id,
status,
created_at as order_date,
updated_at as modified_date,
amount
from source
)
select * from renamed
Incremental Materialization
-- models/marts/fct_orders.sql
{{
config(
materialized='incremental',
unique_key='order_id',
incremental_strategy='merge',
partition_by={
"field": "order_date",
"data_type": "date"
},
cluster_by=['customer_id']
)
}}
with orders as (
select * from {{ ref('stg_orders') }}
),
final as (
select
order_id,
customer_id,
order_date,
status,
amount,
current_timestamp() as updated_at
from orders
)
select * from final
{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
Ephemeral Materialization
-- models/intermediate/int_order_metrics.sql
{{
config(
materialized='ephemeral'
)
}}
with orders as (
select * from {{ ref('stg_orders') }}
),
metrics as (
select
customer_id,
count(*) as order_count,
sum(amount) as total_amount,
avg(amount) as avg_amount
from orders
group by 1
)
select * from metrics
Materialized View (Snowflake)
-- models/marts/fct_daily_metrics.sql
{{
config(
materialized='materialized_view',
schema='analytics',
auto_refresh=True,
refresh_interval='1 hour'
)
}}
with daily_orders as (
select
cast(order_date as date) as metric_date,
count(*) as order_count,
sum(amount) as total_revenue,
avg(amount) as avg_order_value
from {{ ref('stg_orders') }}
group by 1
)
select * from daily_orders
Performance Metrics
| Materialization | Query Speed | Refresh Cost | Storage | Use Case |
|---|---|---|---|---|
| Table | Excellent | High | High | Analytics |
| View | Good | None | None | Real-time |
| Incremental | Excellent | Low | Medium | Large data |
| Ephemeral | Varies | None | None | Reusable |
| Materialized View | Excellent | Low | Medium | Aggregations |
Best Practices
- Use views for staging - Always use views for 1:1 source mappings
- Use tables for marts - Final analytical datasets should be tables
- Use incremental for large facts - Cost optimization for large tables
- Use ephemeral for reuse - Shared transformation logic
- Consider materialized views - For frequently accessed aggregations
- Test materializations - Verify logic works with each type
- Monitor performance - Track query times and refresh costs
- Document decisions - Explain why each materialization was chosen