dbt Best Practices
Best Practices Architecture
Architecture Diagram
```text
┌──────────────────────────────────────────────────────────────────────────────┐
│                         DBT BEST PRACTICES FRAMEWORK                         │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│ ┌──────────────────────────────────────────────────────────────────────────┐ │
│ │                           PRACTICE DIMENSIONS                            │ │
│ │                                                                          │ │
│ │  ┌────────────────────┐  ┌────────────────────┐  ┌────────────────────┐  │ │
│ │  │ CODE STYLE         │  │ PROJECT STRUCTURE  │  │ OPERATIONS         │  │ │
│ │  │                    │  │                    │  │                    │  │ │
│ │  │ • Naming           │  │ • Layers           │  │ • CI/CD            │  │ │
│ │  │ • Formatting       │  │ • Modules          │  │ • Monitoring       │  │ │
│ │  │ • Comments         │  │ • Packages         │  │ • Alerting         │  │ │
│ │  └────────────────────┘  └────────────────────┘  └────────────────────┘  │ │
│ └──────────────────────────────────────────────────────────────────────────┘ │
│                                      │                                       │
│                                      ▼                                       │
│ ┌──────────────────────────────────────────────────────────────────────────┐ │
│ │                            QUALITY ASSURANCE                             │ │
│ │                                                                          │ │
│ │  ┌────────────────────┐  ┌────────────────────┐  ┌────────────────────┐  │ │
│ │  │ TESTING            │  │ DOCUMENTATION      │  │ CODE REVIEW        │  │ │
│ │  │                    │  │                    │  │                    │  │ │
│ │  │ • Unit             │  │ • Descriptions     │  │ • Peer review      │  │ │
│ │  │ • Integration      │  │ • Lineage          │  │ • Automated checks │  │ │
│ │  │ • Data             │  │ • Examples         │  │ • Style enforcement│  │ │
│ │  └────────────────────┘  └────────────────────┘  └────────────────────┘  │ │
│ └──────────────────────────────────────────────────────────────────────────┘ │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
```
Naming Conventions
Architecture Diagram
```text
┌──────────────────────────────────────────────────────────────────────────────┐
│                              NAMING CONVENTIONS                              │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│ ┌──────────────────────────────────────────────────────────────────────────┐ │
│ │                               MODEL NAMING                               │ │
│ │                                                                          │ │
│ │  Pattern: {layer}_{entity}                                               │ │
│ │                                                                          │ │
│ │  Staging:                                                                │ │
│ │  ├── stg_orders              (1:1 with source)                           │ │
│ │  ├── stg_customers           (1:1 with source)                           │ │
│ │  └── stg_products            (1:1 with source)                           │ │
│ │                                                                          │ │
│ │  Intermediate:                                                           │ │
│ │  ├── int_orders_joined       (joins staging models)                      │ │
│ │  ├── int_orders_aggregated   (aggregations)                              │ │
│ │  └── int_orders_cleaned      (data cleaning)                             │ │
│ │                                                                          │ │
│ │  Marts:                                                                  │ │
│ │  ├── fct_orders              (fact table)                                │ │
│ │  ├── dim_customers           (dimension table)                           │ │
│ │  └── agg_orders_daily        (aggregated)                                │ │
│ │                                                                          │ │
│ └──────────────────────────────────────────────────────────────────────────┘ │
│                                      │                                       │
│                                      ▼                                       │
│ ┌──────────────────────────────────────────────────────────────────────────┐ │
│ │                               OTHER NAMING                               │ │
│ │                                                                          │ │
│ │  Sources:                                                                │ │
│ │  └── {source}_{table}        (e.g., shopify_orders)                      │ │
│ │                                                                          │ │
│ │  Tests:                                                                  │ │
│ │  └── test_{model}_{column}   (e.g., test_orders_amount)                  │ │
│ │                                                                          │ │
│ │  Macros:                                                                 │ │
│ │  └── {verb}_{entity}         (e.g., create_table, join_tables)           │ │
│ │                                                                          │ │
│ │  Variables:                                                              │ │
│ │  └── {entity}_{setting}      (e.g., start_date, enable_audit)            │ │
│ │                                                                          │ │
│ └──────────────────────────────────────────────────────────────────────────┘ │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
```
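As a small illustration of the `{entity}_{setting}` variable pattern, the fragment below declares the two example variables in `dbt_project.yml`. The names come from the examples above; the default values are hypothetical placeholders.

```yaml
# dbt_project.yml (fragment): project variables named {entity}_{setting}
vars:
  start_date: '2020-01-01'   # hypothetical default; read in SQL via var('start_date')
  enable_audit: false        # hypothetical flag; override per run with --vars '{"enable_audit": true}'
```

A model can then guard optional logic with `{% if var('enable_audit') %}`, keeping the setting in one place instead of hard-coding it in several models.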
Project Structure
Architecture Diagram
```text
┌──────────────────────────────────────────────────────────────────────────────┐
│                       PROJECT STRUCTURE BEST PRACTICES                       │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│ ┌──────────────────────────────────────────────────────────────────────────┐ │
│ │                          RECOMMENDED STRUCTURE                           │ │
│ │                                                                          │ │
│ │  my_project/                                                             │ │
│ │  ├── dbt_project.yml                                                     │ │
│ │  ├── packages.yml                                                        │ │
│ │  ├── profiles.yml                                                        │ │
│ │  │                                                                       │ │
│ │  ├── models/                                                             │ │
│ │  │   ├── staging/                                                        │ │
│ │  │   │   ├── _sources.yml  ◄── Source definitions                        │ │
│ │  │   │   ├── stg_customers.sql                                           │ │
│ │  │   │   ├── stg_orders.sql                                              │ │
│ │  │   │   └── stg_products.sql                                            │ │
│ │  │   │                                                                   │ │
│ │  │   ├── intermediate/                                                   │ │
│ │  │   │   ├── int_orders_joined.sql                                       │ │
│ │  │   │   ├── int_orders_aggregated.sql                                   │ │
│ │  │   │   └── int_orders_cleaned.sql                                      │ │
│ │  │   │                                                                   │ │
│ │  │   └── marts/                                                          │ │
│ │  │       ├── finance/                                                    │ │
│ │  │       │   ├── fct_orders.sql                                          │ │
│ │  │       │   ├── dim_customers.sql                                       │ │
│ │  │       │   └── fct_revenue.sql                                         │ │
│ │  │       │                                                               │ │
│ │  │       ├── marketing/                                                  │ │
│ │  │       │   ├── fct_campaigns.sql                                       │ │
│ │  │       │   └── dim_campaigns.sql                                       │ │
│ │  │       │                                                               │ │
│ │  │       └── product/                                                    │ │
│ │  │           ├── fct_events.sql                                          │ │
│ │  │           └── dim_users.sql                                           │ │
│ │  │                                                                       │ │
│ │  ├── seeds/      ◄── CSV seed data                                       │ │
│ │  ├── snapshots/  ◄── SCD snapshots                                       │ │
│ │  ├── tests/      ◄── Custom data tests                                   │ │
│ │  ├── macros/     ◄── Reusable macros                                     │ │
│ │  ├── analyses/   ◄── Ad-hoc analysis                                     │ │
│ │  └── docs/       ◄── Documentation                                       │ │
│ │                                                                          │ │
│ └──────────────────────────────────────────────────────────────────────────┘ │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
```
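The `packages.yml` file in the tree above declares package dependencies. The test examples later on this page call `dbt_utils` tests, so a minimal sketch pins that package to a major-version range; the range shown is an assumption, so pin whatever version your project has validated:

```yaml
# packages.yml: install or update with `dbt deps`
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]   # assumed range: any 1.x release
```

Running `dbt deps` installs the package into `dbt_packages/`, a directory that is usually listed in `.gitignore` rather than committed.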
Testing Strategy
Architecture Diagram
```text
┌──────────────────────────────────────────────────────────────────────────────┐
│                               TESTING STRATEGY                               │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│ ┌──────────────────────────────────────────────────────────────────────────┐ │
│ │                               TEST PYRAMID                               │ │
│ │                                                                          │ │
│ │              ╱╲                                                          │ │
│ │             ╱  ╲                                                         │ │
│ │            ╱    ╲                                                        │ │
│ │           ╱ E2E  ╲                     End-to-end tests                  │ │
│ │          ╱────────╲                    (few, slow, expensive)            │ │
│ │         ╱          ╲                                                     │ │
│ │        ╱Integration ╲                  Integration tests                 │ │
│ │       ╱──────────────╲                 (moderate, medium speed)          │ │
│ │      ╱                ╲                                                  │ │
│ │     ╱       Unit       ╲               Unit tests                        │ │
│ │    ╱────────────────────╲              (many, fast, cheap)               │ │
│ │                                                                          │ │
│ └──────────────────────────────────────────────────────────────────────────┘ │
│                                      │                                       │
│                                      ▼                                       │
│ ┌──────────────────────────────────────────────────────────────────────────┐ │
│ │                          TEST COVERAGE TARGETS                           │ │
│ │                                                                          │ │
│ │  Model Tests:                                                            │ │
│ │  ├── Every model has tests                                               │ │
│ │  ├── Every primary key is tested for uniqueness                          │ │
│ │  ├── Every foreign key is tested for relationships                       │ │
│ │  ├── Critical columns are tested for not_null                            │ │
│ │  └── Business rules are tested with custom tests                         │ │
│ │                                                                          │ │
│ │  Source Tests:                                                           │ │
│ │  ├── Source freshness is monitored                                       │ │
│ │  └── Critical source columns are tested                                  │ │
│ │                                                                          │ │
│ │  Coverage Goals:                                                         │ │
│ │  ├── Model coverage: 100%                                                │ │
│ │  ├── Column coverage: 80%                                                │ │
│ │  └── Business rule coverage: 90%                                         │ │
│ │                                                                          │ │
│ └──────────────────────────────────────────────────────────────────────────┘ │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
```
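Source freshness, listed under Source Tests above, is configured on the source itself and checked with `dbt source freshness`. A minimal sketch for the Shopify source that `stg_orders` reads, assuming Fivetran's `_fivetran_synced` column as the load timestamp; the schema name and the warn/error thresholds are illustrative assumptions based on the hourly refresh stated in the model documentation:

```yaml
# models/staging/_sources.yml
version: 2

sources:
  - name: shopify
    schema: shopify                     # assumed raw schema written by Fivetran
    loaded_at_field: _fivetran_synced   # load timestamp used for freshness checks
    freshness:
      warn_after: {count: 2, period: hour}
      error_after: {count: 6, period: hour}
    tables:
      - name: orders
        columns:
          - name: id                    # critical source column: the order key
            data_tests:
              - unique
              - not_null
```

Testing the key at the source catches duplicates from the loader before they propagate into `stg_orders` and every model downstream of it.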
Detailed Explanation
Following these practices helps keep dbt transformations maintainable, scalable, and reliable.
Code Style
- Consistent formatting - Use consistent indentation and spacing
- Descriptive naming - Clear, meaningful names for all objects
- Modular code - Small, focused models and macros
- Documentation - Comprehensive descriptions and examples
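Consistent formatting is easiest to keep when it is enforced automatically. One common option, shown here as a sketch rather than a requirement, is running SQLFluff through pre-commit. The adapter (`dbt-bigquery`) and the pinned version are assumptions; SQLFluff also needs a `.sqlfluff` file that sets `templater = dbt` so it can render the Jinja in models.

```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/sqlfluff/sqlfluff
    rev: 3.0.7                  # assumed version; pin the release you use
    hooks:
      - id: sqlfluff-lint       # report style violations
        additional_dependencies: ['dbt-bigquery', 'sqlfluff-templater-dbt==3.0.7']
      - id: sqlfluff-fix        # apply automatic fixes where possible
        additional_dependencies: ['dbt-bigquery', 'sqlfluff-templater-dbt==3.0.7']
```

Running the same hooks in CI (`pre-commit run --all-files`) keeps the style check from depending on every developer installing them locally.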
Project Structure
- Layered architecture - staging → intermediate → marts
- Domain organization - Group by business domain
- Consistent patterns - Apply patterns consistently
- Separation of concerns - Clear boundaries between layers
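Layered and domain-based organization can be encoded directly in `dbt_project.yml`, so every model inherits its layer's defaults from the folder it lives in. A minimal sketch, assuming the project name `my_project` from the tree above; the materialization chosen for each layer is one reasonable default, not a rule:

```yaml
# dbt_project.yml (fragment)
name: my_project
version: '1.0.0'
profile: my_project

models:
  my_project:
    staging:
      +materialized: view        # thin 1:1 renames of sources
      +schema: staging
    intermediate:
      +materialized: ephemeral   # compiled as CTEs into downstream models
    marts:
      +materialized: table
      finance:
        +schema: finance
        +tags: ['finance']
      marketing:
        +schema: marketing
        +tags: ['marketing']
      product:
        +schema: product
        +tags: ['product']
```

A `config()` block inside a model overrides these folder defaults, which is how a large fact table such as `fct_orders` can switch to `incremental`. With dbt's default `generate_schema_name` macro, `+schema: finance` builds into `<target_schema>_finance` rather than a schema named exactly `finance`, and `dbt build --select tag:finance` runs one domain at a time.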
Testing Strategy
- Test everything - Models, sources, macros
- Automate tests - Run tests in CI/CD
- Monitor results - Track test pass/fail rates
- Alert on failures - Notify on test failures
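The unit tier of the pyramid maps onto dbt unit tests (available from dbt 1.8), which run a model's SQL against small, fixed inputs. A sketch for the item count in `fct_orders`; the fixture values are made up, `is_incremental` is overridden so the full-refresh logic is tested, and dbt needs the upstream relations to exist (for example after `dbt run --select +fct_orders --empty`) so it can infer column types:

```yaml
# models/marts/finance/_fct_orders__unit_tests.yml
unit_tests:
  - name: test_fct_orders_item_count
    model: fct_orders
    overrides:
      macros:
        is_incremental: false          # exercise the full-refresh logic
    given:
      - input: ref('stg_orders')
        rows:
          - {order_id: 1, customer_id: 10, order_created_at: '2024-01-01 10:00:00'}
          - {order_id: 2, customer_id: 10, order_created_at: '2024-01-02 09:30:00'}
      - input: ref('dim_customers')
        rows:
          - {customer_id: 10, customer_name: 'Test Customer'}
      - input: ref('stg_order_items')
        rows:
          - {order_id: 1, item_id: 100}
          - {order_id: 1, item_id: 101}
    expect:
      rows:
        - {order_id: 1, item_count: 2}
        - {order_id: 2, item_count: 0}   # an order without items counts as 0
```

Columns left out of the `given` rows are filled with nulls, and only the columns listed under `expect` are compared, which keeps the fixture focused on the rule being tested.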
Performance Optimization
- Use incremental models - For large fact tables
- Partition and cluster - Optimize for query patterns
- Monitor performance - Track query times
- Optimize continuously - Improve based on metrics
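Tracking query times is easier when each warehouse query can be traced back to the model that issued it. By default dbt adds a query comment that includes the node's unique ID; the fragment below sketches adjusting it in `dbt_project.yml`. The `job-label` option applies only to BigQuery, matching the `partition_by` syntax used elsewhere on this page; treat both settings as assumptions to check against your adapter's documentation.

```yaml
# dbt_project.yml (fragment)
query-comment:
  append: true      # put the comment at the end of each query instead of the start
  job-label: true   # BigQuery only: also expose the comment fields as job labels
```

Per-model timings from each invocation are also written to `target/run_results.json` (the `execution_time` of each result), which can be loaded into the warehouse to track build times against the targets in the metrics table below.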
Documentation
- Document all models - Clear descriptions
- Track lineage - End-to-end data flow
- Provide examples - Sample queries and outputs
- Keep documentation current - Update with changes
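For end-to-end lineage, dbt exposures declare downstream consumers such as dashboards, so they appear in the docs lineage graph next to the models they read. A sketch; the dashboard name, URL, and owner email are hypothetical:

```yaml
# models/marts/finance/_exposures.yml
version: 2

exposures:
  - name: revenue_dashboard               # hypothetical consumer
    label: Revenue Dashboard
    type: dashboard
    maturity: high
    url: https://bi.example.com/revenue   # hypothetical URL
    description: Revenue KPIs built on the order fact table.
    depends_on:
      - ref('fct_orders')
    owner:
      name: Data Engineering Team
      email: data-eng@example.com         # hypothetical contact
```

With the exposure declared, `dbt build --select +exposure:revenue_dashboard` rebuilds and tests everything the dashboard depends on.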
Code Examples
Model Documentation Template
```yaml
# models/marts/finance/fct_orders.yml
version: 2

models:
  - name: fct_orders
    # Literal block (|) keeps line breaks so the Markdown renders in dbt docs
    description: |
      Fact table containing all order transactions. This is the central
      fact table for the order analytics domain.

      **Grain**: One row per order

      **Key Business Questions**:
      - What is our total revenue?
      - How many orders do we process?
      - What is the average order value?

      **Data Source**: Shopify via Fivetran

      **Refresh Frequency**: Hourly

      **Owner**: Data Engineering Team
    config:
      tags: ['finance', 'core', 'production']
      meta:
        owner: data-engineering
        team: analytics
        cost_center: finance
    columns:
      - name: order_id
        description: "Unique identifier for each order"
        data_tests:
          - unique
          - not_null
        meta:
          system: shopify
          pii: false
      - name: customer_id
        description: "Foreign key to dim_customers"
        data_tests:
          - not_null
          - relationships:
              to: ref('dim_customers')
              field: customer_id
        meta:
          pii: false
      - name: order_date
        description: "Date when the order was placed"
        data_tests:
          - not_null
        meta:
          format: YYYY-MM-DD
```
Standardized Staging Model
```sql
-- models/staging/stg_orders.sql
{{
    config(
        materialized='view',
        schema='staging'
    )
}}

with source as (
    select * from {{ source('shopify', 'orders') }}
),

renamed as (
    select
        -- Primary keys
        id as order_id,
        customer_id,
        -- Dimensions
        status as order_status,
        financial_status,
        fulfillment_status,
        -- Measures: cast money columns to a fixed-precision numeric type
        cast(amount as numeric(18, 2)) as order_amount,
        cast(total_discounts as numeric(18, 2)) as discount_amount,
        cast(total_tax as numeric(18, 2)) as tax_amount,
        -- Timestamps
        created_at as order_created_at,
        updated_at as order_updated_at,
        -- Metadata
        _fivetran_synced as synced_at
    from source
)

select * from renamed
```
Standardized Fact Model
```sql
-- models/marts/finance/fct_orders.sql
{# partition_by and cluster_by below use BigQuery syntax #}
{{
    config(
        materialized='incremental',
        unique_key='order_id',
        incremental_strategy='merge',
        partition_by={
            "field": "order_date",
            "data_type": "date"
        },
        cluster_by=['customer_id', 'order_status'],
        tags=['finance', 'core', 'production']
    )
}}

with orders as (
    select * from {{ ref('stg_orders') }}
    {% if is_incremental() %}
    -- Reprocess only orders that changed since the last run; filtering on the
    -- source timestamp (not a run timestamp) keeps the incremental run small
    where order_updated_at > (select max(order_updated_at) from {{ this }})
    {% endif %}
),

customers as (
    select * from {{ ref('dim_customers') }}
),

-- Pre-aggregate items to one row per order so the join keeps the order grain
order_items as (
    select
        order_id,
        count(item_id) as item_count
    from {{ ref('stg_order_items') }}
    group by order_id
),

final as (
    select
        -- Primary key
        orders.order_id,
        -- Foreign keys
        orders.customer_id,
        -- Dimensions
        customers.customer_name,
        customers.segment as customer_segment,
        orders.order_status,
        cast(orders.order_created_at as date) as order_date,
        -- Measures
        orders.order_amount,
        orders.discount_amount,
        orders.tax_amount,
        coalesce(order_items.item_count, 0) as item_count,
        -- Timestamps
        orders.order_created_at,
        orders.order_updated_at,
        {{ dbt.current_timestamp() }} as dbt_updated_at
    from orders
    left join customers on orders.customer_id = customers.customer_id
    left join order_items on orders.order_id = order_items.order_id
)

select * from final
```
Testing Template
```yaml
# models/marts/finance/fct_orders.yml (tests)
# dbt accepts only one properties entry per model, so merge these tests into the
# same fct_orders entry as the documentation template instead of a second file.
version: 2

models:
  - name: fct_orders
    data_tests:
      # Model-level test: most useful when the grain spans several columns
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - order_id
    columns:
      - name: order_id
        data_tests:
          - unique
          - not_null
      - name: customer_id
        data_tests:
          - not_null
          - relationships:
              to: ref('dim_customers')
              field: customer_id
      - name: order_amount
        data_tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 0
              max_value: 1000000
      - name: order_date
        data_tests:
          - not_null
          # At column level, dbt_utils prepends the column name to the expression
          - dbt_utils.expression_is_true:
              expression: "<= current_date"
```
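To act on failures rather than only count them, individual tests accept a `config` block. A sketch that adds configs to the existing `accepted_range` test on `order_amount`; the thresholds are illustrative assumptions:

```yaml
# fct_orders properties (fragment): test-level configs
models:
  - name: fct_orders
    columns:
      - name: order_amount
        data_tests:
          - dbt_utils.accepted_range:
              min_value: 0
              max_value: 1000000
              config:
                severity: error
                warn_if: ">0"          # any out-of-range row produces a warning
                error_if: ">100"       # more than 100 such rows fails the run
                store_failures: true   # keep failing rows in an audit table for review
```

The same keys can be set project-wide under `data_tests:` in `dbt_project.yml`, so individual tests only need a `config` block when they deviate from the default.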
Macro Template
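```sql
-- macros/generate_generic_test.sql
-- Counts null values in one column and returns the query result (an agate table).
-- `model` should be a relation (e.g. ref('fct_orders')) or a fully qualified table name.
{% macro generate_generic_test(test_name, model, column_name) %}
    {% set test_sql %}
        select
            '{{ test_name }}' as test_name,
            '{{ model }}' as model_name,
            '{{ column_name }}' as column_name,
            count(*) as failures
        from {{ model }}
        where {{ column_name }} is null
    {% endset %}

    {# run_query only returns results during execution, not while dbt parses the project #}
    {% if execute %}
        {% set result = run_query(test_sql) %}
        {{ return(result) }}
    {% endif %}
{% endmacro %}
```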
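Macros can be documented in a properties file too, which puts their arguments into the generated docs site. A sketch for the macro above; the `type` values are free-text descriptions chosen here, and the argument list assumes the macro takes `test_name`, `model`, and `column_name`:

```yaml
# macros/_macros.yml
version: 2

macros:
  - name: generate_generic_test
    description: Counts null values in one column of a relation and returns the query result.
    arguments:
      - name: test_name
        type: string
        description: Label written to the test_name column of the result.
      - name: model
        type: relation
        description: Relation to check, for example ref('fct_orders').
      - name: column_name
        type: string
        description: Column checked for nulls.
```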
Performance Metrics
| Metric | Description | Target |
|---|---|---|
| Code Quality | Linting and style checks | 100% pass |
| Test Coverage | Percentage of models tested | >90% |
| Documentation Coverage | Percentage documented | >95% |
| Build Success Rate | Percentage of successful builds | >99% |
| Average Build Time | Time to build all models | < 30 min |
Best Practices
- Follow naming conventions - Consistent, descriptive names
- Use layered architecture - staging → intermediate → marts
- Test everything - Models, sources, macros
- Document comprehensively - Descriptions, lineage, examples
- Use version control - All code in Git
- Implement CI/CD - Automated testing and deployment
- Monitor performance - Track metrics continuously
- Review code regularly - Peer reviews and audits
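A minimal CI sketch for the automated testing and deployment points above, assuming GitHub Actions, BigQuery, a `profiles.yml` committed at the repository root with a `ci` target that reads credentials through `env_var()`, and a hypothetical repository secret named `DBT_CI_KEYFILE_JSON`. `dbt build` runs seeds, models, snapshots, and tests in DAG order, so a failing test stops the pipeline and blocks the pull request.

```yaml
# .github/workflows/dbt_ci.yml
name: dbt CI

on:
  pull_request:
    branches: [main]

jobs:
  dbt-build:
    runs-on: ubuntu-latest
    env:
      DBT_PROFILES_DIR: .   # look for profiles.yml in the repository root
      DBT_CI_KEYFILE_JSON: ${{ secrets.DBT_CI_KEYFILE_JSON }}   # hypothetical secret
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dbt
        run: pip install dbt-bigquery
      - name: Install packages
        run: dbt deps
      - name: Build and test
        run: dbt build --target ci
```

Building into a dedicated `ci` target keeps pull-request runs isolated from the production schemas that peer reviewers and dashboards rely on.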