CW

dbt Unit Testing

Free Lesson

Advertisement

dbt Unit Testing

Unit Testing Architecture

Unit Testing Pipeline

Formal Definitions

DfUnit Test in dbt

A unit test validates model logic in isolation by replacing upstream dependencies with mock data fixtures. It asserts that given specific input rows, the model produces exactly the expected output rows. Formally, for a model M with transformation function f, unit test verifies: for all input fixtures I, f(I) = E where E is the expected output. This isolates business logic from data volume and infrastructure concerns.

DfTest Fixture

A test fixture is a predefined dataset used as input or expected output for a unit test. Fixtures are defined as YAML or CSV files that specify exact column values and row counts. They replace the results of ref() and source() calls during test execution, enabling deterministic testing of transformation logic.

DfMock Reference

A mock reference is a substitution for a ref('model_name') or source('name', 'table') call during unit testing. Mocks replace upstream dependencies with fixture data, allowing the model under test to execute against controlled inputs rather than actual database tables. This isolation prevents upstream data changes from causing test failures.

Detailed Explanation

dbt unit testing validates that individual models transform data correctly by isolating them from upstream dependencies. Unlike integration tests that run against real data, unit tests use predefined fixtures to ensure deterministic, repeatable results. The dbt-utils package provides the dbt_utils-unit-tests macro that enables this capability.

Why Unit Testing Matters

  1. Isolation: Tests business logic independently of data quality
  2. Speed: Runs in seconds without querying large tables
  3. Determinism: Same fixtures always produce same results
  4. Debugging: Pinpoint exactly which transformation logic fails
  5. Regression: Catch unintended changes to model behavior

Test Coverage Strategy

Code Examples

Basic Unit Test Configuration

# models/marts/fct_orders.yml
version: 2

models:
  - name: fct_orders
    description: "Fact table for orders"
    data_tests:
      - unique:
          column_name: "order_id"
      - not_null:
          column_name: "order_id"

unit_tests:
  - name: test_fct_orders_basic
    model: fct_orders
    description: "Test basic order transformation logic"
    
    # Mock input for stg_orders
    inputs:
      - name: stg_orders
        rows:
          - { order_id: 1, customer_id: 100, order_date: "2024-01-15", status: "completed", amount: 50.00 }
          - { order_id: 2, customer_id: 101, order_date: "2024-01-16", status: "pending", amount: 75.50 }
      
      - name: stg_customers
        rows:
          - { customer_id: 100, customer_name: "Alice Johnson", segment: "premium" }
          - { customer_id: 101, customer_name: "Bob Smith", segment: "standard" }
    
    # Expected output
    expected_output:
      rows:
        - { order_id: 1, customer_id: 100, customer_name: "Alice Johnson", segment: "premium", amount: 50.00, order_date: "2024-01-15" }
        - { order_id: 2, customer_id: 101, customer_name: "Bob Smith", segment: "standard", amount: 75.50, order_date: "2024-01-16" }

Unit Test for Calculated Fields

# models/marts/fct_revenue.yml
unit_tests:
  - name: test_revenue_calculation
    model: fct_revenue
    description: "Test revenue calculation with discounts and tax"
    
    inputs:
      - name: stg_orders
        rows:
          - { order_id: 1, subtotal: 100.00, discount_pct: 10, tax_rate: 8 }
          - { order_id: 2, subtotal: 200.00, discount_pct: 0, tax_rate: 8 }
          - { order_id: 3, subtotal: 50.00, discount_pct: 25, tax_rate: 0 }
    
    expected_output:
      rows:
        - { order_id: 1, subtotal: 100.00, discount_amount: 10.00, tax_amount: 7.20, total: 97.20 }
        - { order_id: 2, subtotal: 200.00, discount_amount: 0.00, tax_amount: 16.00, total: 216.00 }
        - { order_id: 3, subtotal: 50.00, discount_amount: 12.50, tax_amount: 0.00, total: 37.50 }

Unit Test for Aggregation Logic

# models/marts/fct_daily_summary.yml
unit_tests:
  - name: test_daily_aggregation
    model: fct_daily_summary
    description: "Test daily aggregation with multiple orders per day"
    
    inputs:
      - name: stg_orders
        rows:
          - { order_id: 1, order_date: "2024-01-15", amount: 50.00 }
          - { order_id: 2, order_date: "2024-01-15", amount: 30.00 }
          - { order_id: 3, order_date: "2024-01-16", amount: 75.00 }
    
    expected_output:
      rows:
        - { order_date: "2024-01-15", total_amount: 80.00, order_count: 2 }
        - { order_date: "2024-01-16", total_amount: 75.00, order_count: 1 }

Unit Test for Null Handling

# models/marts/dim_customers.yml
unit_tests:
  - name: test_null_handling
    model: dim_customers
    description: "Test null value handling in customer dimension"
    
    inputs:
      - name: stg_customers
        rows:
          - { customer_id: 1, first_name: "Alice", last_name: "Johnson", email: "alice@example.com" }
          - { customer_id: 2, first_name: null, last_name: "Smith", email: null }
          - { customer_id: 3, first_name: "Charlie", last_name: null, email: "charlie@example.com" }
    
    expected_output:
      rows:
        - { customer_id: 1, full_name: "Alice Johnson", email_domain: "example.com" }
        - { customer_id: 2, full_name: "Smith", email_domain: null }
        - { customer_id: 3, full_name: "Charlie", email_domain: "example.com" }

Model Under Test

-- models/marts/fct_orders.sql
{{
    config(
        materialized='incremental',
        unique_key='order_id',
        tags=['finance', 'unit-tested']
    )
}}

with orders as (
    select * from {{ ref('stg_orders') }}
),

customers as (
    select * from {{ ref('stg_customers') }}
),

final as (
    select
        o.order_id,
        o.customer_id,
        c.customer_name,
        c.segment,
        o.amount,
        o.order_date,
        case
            when o.amount >= 100 then 'high_value'
            when o.amount >= 50 then 'medium_value'
            else 'low_value'
        end as order_tier
    from orders o
    left join customers c on o.customer_id = c.customer_id
)

select * from final

Running Unit Tests

# Run all unit tests
dbt test --select unit_tests

# Run specific unit test
dbt test --select test_fct_orders_basic

# Run with verbose output
dbt test --select unit_tests --vars '{"debug": true}'

# Run unit tests for a specific model
dbt test --select fct_orders+ --resource-type unit_test

Comparison: Unit Tests vs Data Tests

AspectUnit TestsData Tests
ScopeSingle model logicData quality rules
InputMock fixturesReal database tables
SpeedMillisecondsSeconds to minutes
DeterminismAlways deterministicMay vary with data
CoverageBusiness logicSchema constraints
MaintenanceFixtures need updatingTests need updating
Use CaseTransformation validationData integrity checks

Unit tests and data tests are complementary, not competing. Use unit tests to validate transformation logic in isolation, and data tests to validate data quality constraints against real data. A comprehensive testing strategy includes both.

Define unit test fixtures as YAML inline within model YAML files for maintainability. Keep fixtures small (5-20 rows) to ensure tests run quickly. Focus fixtures on edge cases: nulls, boundary values, and complex join scenarios.

The dbt-utils package provides the unit_tests configuration block. Ensure you're running dbt-core 1.5+ and have dbt-utils installed. Unit tests execute in a special context where ref() and source() calls are replaced with fixture data.

Test Fixture Best Practices

PracticeDescriptionExample
Minimal RowsUse 3-10 rows per fixtureOne row per edge case
Cover Edge CasesNulls, zeros, boundary valuesInclude null customer_id
Descriptive NamesClear test case descriptions"test_high_value_order"
Isolated TestsEach test validates one behaviorSeparate join vs calculation tests
Version ControlTrack fixture changes in GitReview fixture updates in PRs
Consistent FormatYAML for fixtures, not CSVInline YAML is more readable

Common Test Patterns

Pattern: Testing Joins

unit_tests:
  - name: test_order_customer_join
    model: fct_orders
    inputs:
      - name: stg_orders
        rows:
          - { order_id: 1, customer_id: 100 }
          - { order_id: 2, customer_id: 999 }  # Non-existent customer
      - name: stg_customers
        rows:
          - { customer_id: 100, name: "Alice" }
    expected_output:
      rows:
        - { order_id: 1, customer_id: 100, customer_name: "Alice" }
        - { order_id: 2, customer_id: 999, customer_name: null }  # Left join preserves row

Pattern: Testing Window Functions

unit_tests:
  - name: test_ranking_logic
    model: fct_ranked_orders
    inputs:
      - name: stg_orders
        rows:
          - { customer_id: 100, order_date: "2024-01-15", amount: 100 }
          - { customer_id: 100, order_date: "2024-01-16", amount: 200 }
          - { customer_id: 100, order_date: "2024-01-17", amount: 150 }
    expected_output:
      rows:
        - { customer_id: 100, order_date: "2024-01-15", amount: 100, rank: 3 }
        - { customer_id: 100, order_date: "2024-01-16", amount: 200, rank: 1 }
        - { customer_id: 100, order_date: "2024-01-17", amount: 150, rank: 2 }

Pattern: Testing Conditional Logic

unit_tests:
  - name: test_status_mapping
    model: fct_orders_enriched
    inputs:
      - name: stg_orders
        rows:
          - { order_id: 1, status: "completed" }
          - { order_id: 2, status: "cancelled" }
          - { order_id: 3, status: "returned" }
          - { order_id: 4, status: "unknown" }
    expected_output:
      rows:
        - { order_id: 1, status: "completed", category: "success" }
        - { order_id: 2, status: "cancelled", category: "failure" }
        - { order_id: 3, status: "returned", category: "failure" }
        - { order_id: 4, status: "unknown", category: "unknown" }

Performance Metrics

MetricDescriptionTarget
Test Execution TimeTime per unit test< 1 second
Fixture SizeNumber of rows per fixture3-20 rows
Test CoverageModels with unit tests> 80% of mart models
Pass RatePercentage of passing tests100%
Maintenance EffortTime to update fixtures< 5 min per test

Best Practices

  1. Start with critical models - Unit test mart models first
  2. Test edge cases - Nulls, boundary values, empty results
  3. Keep fixtures small - 3-10 rows is sufficient
  4. Descriptive test names - Clearly state what is being tested
  5. Separate concerns - One test per logical behavior
  6. Version control fixtures - Track changes alongside model code
  7. Run in CI/CD - Execute unit tests on every PR
  8. Complement with data tests - Use both unit and data tests

See Also

Advertisement

Need Expert dbt Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement