dbt Unit Testing
Unit Testing Architecture
Unit Testing Pipeline
Formal Definitions
DfUnit Test in dbt
A unit test validates model logic in isolation by replacing upstream dependencies with mock data fixtures. It asserts that given specific input rows, the model produces exactly the expected output rows. Formally, for a model M with transformation function f, unit test verifies: for all input fixtures I, f(I) = E where E is the expected output. This isolates business logic from data volume and infrastructure concerns.
DfTest Fixture
A test fixture is a predefined dataset used as input or expected output for a unit test. Fixtures are defined as YAML or CSV files that specify exact column values and row counts. They replace the results of ref() and source() calls during test execution, enabling deterministic testing of transformation logic.
DfMock Reference
A mock reference is a substitution for a ref('model_name') or source('name', 'table') call during unit testing. Mocks replace upstream dependencies with fixture data, allowing the model under test to execute against controlled inputs rather than actual database tables. This isolation prevents upstream data changes from causing test failures.
Detailed Explanation
dbt unit testing validates that individual models transform data correctly by isolating them from upstream dependencies. Unlike integration tests that run against real data, unit tests use predefined fixtures to ensure deterministic, repeatable results. The dbt-utils package provides the dbt_utils-unit-tests macro that enables this capability.
Why Unit Testing Matters
- Isolation: Tests business logic independently of data quality
- Speed: Runs in seconds without querying large tables
- Determinism: Same fixtures always produce same results
- Debugging: Pinpoint exactly which transformation logic fails
- Regression: Catch unintended changes to model behavior
Test Coverage Strategy
Code Examples
Basic Unit Test Configuration
# models/marts/fct_orders.yml
version: 2
models:
- name: fct_orders
description: "Fact table for orders"
data_tests:
- unique:
column_name: "order_id"
- not_null:
column_name: "order_id"
unit_tests:
- name: test_fct_orders_basic
model: fct_orders
description: "Test basic order transformation logic"
# Mock input for stg_orders
inputs:
- name: stg_orders
rows:
- { order_id: 1, customer_id: 100, order_date: "2024-01-15", status: "completed", amount: 50.00 }
- { order_id: 2, customer_id: 101, order_date: "2024-01-16", status: "pending", amount: 75.50 }
- name: stg_customers
rows:
- { customer_id: 100, customer_name: "Alice Johnson", segment: "premium" }
- { customer_id: 101, customer_name: "Bob Smith", segment: "standard" }
# Expected output
expected_output:
rows:
- { order_id: 1, customer_id: 100, customer_name: "Alice Johnson", segment: "premium", amount: 50.00, order_date: "2024-01-15" }
- { order_id: 2, customer_id: 101, customer_name: "Bob Smith", segment: "standard", amount: 75.50, order_date: "2024-01-16" }
Unit Test for Calculated Fields
# models/marts/fct_revenue.yml
unit_tests:
- name: test_revenue_calculation
model: fct_revenue
description: "Test revenue calculation with discounts and tax"
inputs:
- name: stg_orders
rows:
- { order_id: 1, subtotal: 100.00, discount_pct: 10, tax_rate: 8 }
- { order_id: 2, subtotal: 200.00, discount_pct: 0, tax_rate: 8 }
- { order_id: 3, subtotal: 50.00, discount_pct: 25, tax_rate: 0 }
expected_output:
rows:
- { order_id: 1, subtotal: 100.00, discount_amount: 10.00, tax_amount: 7.20, total: 97.20 }
- { order_id: 2, subtotal: 200.00, discount_amount: 0.00, tax_amount: 16.00, total: 216.00 }
- { order_id: 3, subtotal: 50.00, discount_amount: 12.50, tax_amount: 0.00, total: 37.50 }
Unit Test for Aggregation Logic
# models/marts/fct_daily_summary.yml
unit_tests:
- name: test_daily_aggregation
model: fct_daily_summary
description: "Test daily aggregation with multiple orders per day"
inputs:
- name: stg_orders
rows:
- { order_id: 1, order_date: "2024-01-15", amount: 50.00 }
- { order_id: 2, order_date: "2024-01-15", amount: 30.00 }
- { order_id: 3, order_date: "2024-01-16", amount: 75.00 }
expected_output:
rows:
- { order_date: "2024-01-15", total_amount: 80.00, order_count: 2 }
- { order_date: "2024-01-16", total_amount: 75.00, order_count: 1 }
Unit Test for Null Handling
# models/marts/dim_customers.yml
unit_tests:
- name: test_null_handling
model: dim_customers
description: "Test null value handling in customer dimension"
inputs:
- name: stg_customers
rows:
- { customer_id: 1, first_name: "Alice", last_name: "Johnson", email: "alice@example.com" }
- { customer_id: 2, first_name: null, last_name: "Smith", email: null }
- { customer_id: 3, first_name: "Charlie", last_name: null, email: "charlie@example.com" }
expected_output:
rows:
- { customer_id: 1, full_name: "Alice Johnson", email_domain: "example.com" }
- { customer_id: 2, full_name: "Smith", email_domain: null }
- { customer_id: 3, full_name: "Charlie", email_domain: "example.com" }
Model Under Test
-- models/marts/fct_orders.sql
{{
config(
materialized='incremental',
unique_key='order_id',
tags=['finance', 'unit-tested']
)
}}
with orders as (
select * from {{ ref('stg_orders') }}
),
customers as (
select * from {{ ref('stg_customers') }}
),
final as (
select
o.order_id,
o.customer_id,
c.customer_name,
c.segment,
o.amount,
o.order_date,
case
when o.amount >= 100 then 'high_value'
when o.amount >= 50 then 'medium_value'
else 'low_value'
end as order_tier
from orders o
left join customers c on o.customer_id = c.customer_id
)
select * from final
Running Unit Tests
# Run all unit tests
dbt test --select unit_tests
# Run specific unit test
dbt test --select test_fct_orders_basic
# Run with verbose output
dbt test --select unit_tests --vars '{"debug": true}'
# Run unit tests for a specific model
dbt test --select fct_orders+ --resource-type unit_test
Comparison: Unit Tests vs Data Tests
| Aspect | Unit Tests | Data Tests |
|---|---|---|
| Scope | Single model logic | Data quality rules |
| Input | Mock fixtures | Real database tables |
| Speed | Milliseconds | Seconds to minutes |
| Determinism | Always deterministic | May vary with data |
| Coverage | Business logic | Schema constraints |
| Maintenance | Fixtures need updating | Tests need updating |
| Use Case | Transformation validation | Data integrity checks |
Unit tests and data tests are complementary, not competing. Use unit tests to validate transformation logic in isolation, and data tests to validate data quality constraints against real data. A comprehensive testing strategy includes both.
Define unit test fixtures as YAML inline within model YAML files for maintainability. Keep fixtures small (5-20 rows) to ensure tests run quickly. Focus fixtures on edge cases: nulls, boundary values, and complex join scenarios.
The dbt-utils package provides the unit_tests configuration block. Ensure you're running dbt-core 1.5+ and have dbt-utils installed. Unit tests execute in a special context where ref() and source() calls are replaced with fixture data.
Test Fixture Best Practices
| Practice | Description | Example |
|---|---|---|
| Minimal Rows | Use 3-10 rows per fixture | One row per edge case |
| Cover Edge Cases | Nulls, zeros, boundary values | Include null customer_id |
| Descriptive Names | Clear test case descriptions | "test_high_value_order" |
| Isolated Tests | Each test validates one behavior | Separate join vs calculation tests |
| Version Control | Track fixture changes in Git | Review fixture updates in PRs |
| Consistent Format | YAML for fixtures, not CSV | Inline YAML is more readable |
Common Test Patterns
Pattern: Testing Joins
unit_tests:
- name: test_order_customer_join
model: fct_orders
inputs:
- name: stg_orders
rows:
- { order_id: 1, customer_id: 100 }
- { order_id: 2, customer_id: 999 } # Non-existent customer
- name: stg_customers
rows:
- { customer_id: 100, name: "Alice" }
expected_output:
rows:
- { order_id: 1, customer_id: 100, customer_name: "Alice" }
- { order_id: 2, customer_id: 999, customer_name: null } # Left join preserves row
Pattern: Testing Window Functions
unit_tests:
- name: test_ranking_logic
model: fct_ranked_orders
inputs:
- name: stg_orders
rows:
- { customer_id: 100, order_date: "2024-01-15", amount: 100 }
- { customer_id: 100, order_date: "2024-01-16", amount: 200 }
- { customer_id: 100, order_date: "2024-01-17", amount: 150 }
expected_output:
rows:
- { customer_id: 100, order_date: "2024-01-15", amount: 100, rank: 3 }
- { customer_id: 100, order_date: "2024-01-16", amount: 200, rank: 1 }
- { customer_id: 100, order_date: "2024-01-17", amount: 150, rank: 2 }
Pattern: Testing Conditional Logic
unit_tests:
- name: test_status_mapping
model: fct_orders_enriched
inputs:
- name: stg_orders
rows:
- { order_id: 1, status: "completed" }
- { order_id: 2, status: "cancelled" }
- { order_id: 3, status: "returned" }
- { order_id: 4, status: "unknown" }
expected_output:
rows:
- { order_id: 1, status: "completed", category: "success" }
- { order_id: 2, status: "cancelled", category: "failure" }
- { order_id: 3, status: "returned", category: "failure" }
- { order_id: 4, status: "unknown", category: "unknown" }
Performance Metrics
| Metric | Description | Target |
|---|---|---|
| Test Execution Time | Time per unit test | < 1 second |
| Fixture Size | Number of rows per fixture | 3-20 rows |
| Test Coverage | Models with unit tests | > 80% of mart models |
| Pass Rate | Percentage of passing tests | 100% |
| Maintenance Effort | Time to update fixtures | < 5 min per test |
Best Practices
- Start with critical models - Unit test mart models first
- Test edge cases - Nulls, boundary values, empty results
- Keep fixtures small - 3-10 rows is sufficient
- Descriptive test names - Clearly state what is being tested
- Separate concerns - One test per logical behavior
- Version control fixtures - Track changes alongside model code
- Run in CI/CD - Execute unit tests on every PR
- Complement with data tests - Use both unit and data tests
See Also
- Data Quality Tests — Schema and data testing conventions
- dbt Projects — Project configuration and structure
- dbt Best Practices — Testing strategy and patterns
- Advanced Jinja — Custom test macro patterns