CW

dbt Ephemeral Models

Free Lesson

Advertisement

dbt Ephemeral Models

Ephemeral Model Architecture

Materialization Comparison Pipeline

Formal Definitions

DfEphemeral Model

An ephemeral model is a dbt model that is not materialized as a database object. Instead, its SQL is collected and injected as a Common Table Expression (CTE) into any downstream model that references it via ref(). Ephemeral models exist only at compile time and have no persistent representation in the database. They are ideal for intermediate transformations that are used by exactly one downstream model.

DfCTE Injection

CTE injection is the process where dbt's compiler replaces ref('ephemeral_model') calls with the full SQL definition of the ephemeral model, wrapped as a CTE. For a downstream model D that references ephemeral model E, the compiled SQL becomes: WITH cte_name AS (SELECT ... FROM ...) SELECT ... FROM cte_name. This eliminates the need for a separate database object.

DfCTE (Common Table Expression)

A Common Table Expression (CTE) is a named temporary result set defined within a SQL query using the WITH clause. CTEs improve readability, enable recursive queries, and allow referencing the same subquery multiple times. In dbt, ephemeral models leverage CTEs to compose complex transformations without creating database objects.

Detailed Explanation

Ephemeral models solve a specific problem: intermediate transformations that need to be reused but shouldn't persist as database objects. They combine the reusability of views with the performance of inline SQL.

When to Use Ephemeral Models

  1. Single consumer - Model is referenced by exactly one downstream model
  2. Complex logic - Transformation is too complex to inline
  3. Shared transformations - Same logic used in multiple downstream models
  4. Performance - Avoid view overhead for simple transformations
  5. Security - Don't expose intermediate tables to end users

When NOT to Use Ephemeral Models

  1. Multiple consumers - Model is referenced by many downstream models (use view)
  2. Large datasets - CTE may cause query optimizer issues (use table)
  3. Long-running queries - CTE re-execution is wasteful (use incremental)
  4. Debugging needed - Cannot inspect ephemeral model data directly
  5. Testing required - Cannot run data tests on ephemeral models

Ephemeral models are not materialized as database objects. This means you cannot run dbt run --select ephemeral_model to create them, and you cannot query them directly. They only exist as CTEs within downstream models. If you need to inspect an ephemeral model's output, temporarily change it to a view.

Use ephemeral models for intermediate transformations that encapsulate complex business logic. This keeps downstream models clean and readable while avoiding the overhead of separate database objects. However, if an ephemeral model is referenced by more than 3 downstream models, consider converting it to a view for better performance.

Code Examples

Basic Ephemeral Model

-- models/intermediate/int_order_items_cleaned.sql
{{
    config(
        materialized='ephemeral'
    )
}}

with source as (
    select * from {{ ref('stg_order_items') }}
),

cleaned as (
    select
        order_id,
        product_id,
        quantity,
        unit_price,
        quantity * unit_price as line_total,
        case
            when quantity < 0 then 0
            else quantity
        end as clean_quantity
    from source
    where order_id is not null
      and product_id is not null
)

select * from cleaned

Downstream Model Using Ephemeral

-- models/marts/fct_orders.sql
{{
    config(
        materialized='incremental',
        unique_key='order_id'
    )
}}

with orders as (
    select * from {{ ref('stg_orders') }}
),

customers as (
    select * from {{ ref('dim_customers') }}
),

-- This references the ephemeral model
-- The CTE is injected here at compile time
order_items as (
    select * from {{ ref('int_order_items_cleaned') }}
),

order_summary as (
    select
        o.order_id,
        o.customer_id,
        c.customer_name,
        c.segment,
        o.order_date,
        sum(oi.line_total) as order_total,
        count(oi.product_id) as item_count
    from orders o
    left join customers c on o.customer_id = c.customer_id
    left join order_items oi on o.order_id = oi.order_id
    group by 1, 2, 3, 4, 5
)

select * from order_summary

Compiled Output (CTE Injection)

-- The above downstream model compiles to:
WITH
order_items AS (
    -- CTE injected from ephemeral model
    SELECT
        order_id,
        product_id,
        quantity,
        unit_price,
        quantity * unit_price AS line_total,
        CASE
            WHEN quantity < 0 THEN 0
            ELSE quantity
        END AS clean_quantity
    FROM stg_order_items
    WHERE order_id IS NOT NULL
      AND product_id IS NOT NULL
)
SELECT
    o.order_id,
    o.customer_id,
    c.customer_name,
    c.segment,
    o.order_date,
    SUM(oi.line_total) AS order_total,
    COUNT(oi.product_id) AS item_count
FROM stg_orders o
LEFT JOIN dim_customers c ON o.customer_id = c.customer_id
LEFT JOIN order_items oi ON o.order_id = oi.order_id
GROUP BY 1, 2, 3, 4, 5

Multiple Ephemeral Models

-- models/intermediate/int_dates_filtered.sql
{{
    config(materialized='ephemeral')
}}

select *
from {{ ref('dim_date') }}
where date_value >= '{{ var("start_date") }}'
  and date_value <= '{{ var("end_date") }}'
-- models/intermediate/int_revenue_calc.sql
{{
    config(materialized='ephemeral')
}}

with orders as (
    select * from {{ ref('stg_orders') }}
),

order_items as (
    select * from {{ ref('int_order_items_cleaned') }}
),

revenue as (
    select
        o.order_id,
        o.order_date,
        sum(oi.line_total) as gross_revenue,
        sum(oi.line_total * 0.1) as estimated_tax,
        sum(oi.line_total * 0.9) as net_revenue
    from orders o
    left join order_items oi on o.order_id = oi.order_id
    group by 1, 2
)

select * from revenue
-- models/marts/fct_daily_revenue.sql
{{
    config(materialized='table')
}}

with dates as (
    select * from {{ ref('int_dates_filtered') }}
),

revenue as (
    select * from {{ ref('int_revenue_calc') }}
),

daily_summary as (
    select
        d.date_value,
        coalesce(sum(r.gross_revenue), 0) as total_revenue,
        coalesce(sum(r.estimated_tax), 0) as total_tax,
        coalesce(sum(r.net_revenue), 0) as total_net_revenue,
        count(distinct r.order_id) as order_count
    from dates d
    left join revenue r on d.date_value = r.order_date
    group by 1
)

select * from daily_summary

Comparison: Materializations

MaterializationDatabase ObjectQueryableTestableUse Case
ViewViewYesYesShared transformations
EphemeralNone (CTE)NoNoSingle-consumer intermediates
TableTableYesYesFinal marts
IncrementalTableYesYesLarge, append-only datasets

Performance Characteristics

ScenarioViewEphemeralTableIncremental
Simple transformationGoodBestGoodGood
Complex transformationGoodModerateBestBest
Multiple consumersBestPoorGoodGood
Single consumerGoodBestModerateModerate
Large datasetGoodPoorGoodBest
DebuggingBestPoorBestGood

Ephemeral models can cause performance issues when the CTE is very large or complex. Some query optimizers may not optimize CTEs well, leading to repeated execution. If you notice performance degradation, consider materializing the ephemeral model as a view or table instead.

Ephemeral models cannot be tested with dbt's data tests. If you need to test an ephemeral model, create a temporary view that selects from the ephemeral model and apply tests to that view. Alternatively, test the downstream model that uses the ephemeral model.

Debugging Ephemeral Models

Option 1: Temporarily Change to View

-- Change materialization to inspect output
{{
    config(
        materialized='view'  -- Changed from 'ephemeral'
    )
}}

-- Run dbt run --select int_order_items_cleaned
-- Then query the view in your database
-- Remember to change back to ephemeral when done

Option 2: Use dbt Compile

# See the compiled SQL with CTE injection
dbt compile --select fct_orders

# Check the target/compiled directory for output
cat target/compiled/project_name/models/marts/fct_orders.sql

Option 3: Add Debug CTE

-- Add a debug CTE to see ephemeral model output
{{
    config(materialized='ephemeral')
}}

with debug as (
    -- Add this to see the output
    select count(*) as row_count from (
        -- Original ephemeral model logic
        select * from {{ ref('stg_orders') }}
    ) subq
)

select * from cleaned
-- Note: This won't work with ephemeral, use view instead

Best Practices

  1. Single consumer - Only use ephemeral when referenced by one downstream model
  2. Keep it simple - Ephemeral models should be small, focused transformations
  3. Document purpose - Clearly describe why a model is ephemeral
  4. Test downstream - Test the downstream model that uses the ephemeral model
  5. Monitor performance - Watch for CTE-related query performance issues
  6. Use for security - Hide intermediate business logic from end users
  7. Avoid for large data - Large ephemeral CTEs can cause optimizer issues
  8. Consider alternatives - Views for shared transformations, tables for large data

See Also

Advertisement

Need Expert dbt Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement