Analytics Engineering with dbt
dbt (data build tool) transforms data in warehouses using SQL, version control, and software engineering best practices. Analytics engineers bridge data engineering and business intelligence.
dbt Architecture
1. Project Structure
Architecture Diagram
my_project/
├── dbt_project.yml
├── packages.yml
├── profiles.yml
├── models/
│ ├── staging/
│ │ ├── _staging__sources.yml
│ │ ├── stg_orders.sql
│ │ └── stg_customers.sql
│ ├── marts/
│ │ ├── finance/
│ │ │ ├── fct_revenue.sql
│ │ │ └── dim_customers.sql
│ │ └── marketing/
│ │ └── campaign_performance.sql
│ └── intermediate/
│ └── int_joined.sql
├── macros/
│ └── generate_schema_name.sql
├── tests/
│ └── assert_positive_revenue.sql
├── snapshots/
│ └── scd_customers.sql
└── seeds/
└── country_codes.csv
2. Models and Materializations
Materialization Types
Model Example
-- models/staging/stg_orders.sql
{{
config(
materialized='view',
tags=['staging', 'daily']
)
}}
with source as (
{{ ref('raw_orders') }}
),
renamed as (
select
id as order_id,
user_id,
status,
created_at,
_fivetran_deleted as is_deleted
from source
where not is_deleted
)
select * from renamed
3. Jinja Templating and Macros
-- macros/generate_schema_name.sql
{% macro generate_schema_name(custom_schema_name, node) -%}
{%- set default_schema = target.schema -%}
{%- if custom_schema_name is none -%}
{{ default_schema }}
{%- else -%}
{{ default_schema }}_{{ custom_schema_name }}
{%- endif -%}
{%- endmacro %}
-- macros/union_tables.sql
{% macro union_tables(table_list) %}
{% for table in table_list %}
SELECT * FROM {{ ref(table) }}
{% if not loop.last %} UNION ALL {% endif %}
{% endfor %}
{% endmacro %}
4. Testing and Data Quality
Schema Tests
# models/staging/_staging__sources.yml
version: 2
models:
- name: stg_orders
columns:
- name: order_id
tests:
- unique
- not_null
- name: user_id
tests:
- not_null
- relationships:
to: ref('stg_customers')
field: customer_id
Custom Tests
-- tests/assert_positive_revenue.sql
select
order_id,
amount
from {{ ref('fct_revenue') }}
where amount < 0
5. Snapshots (SCD Type 2)
-- snapshots/scd_customers.sql
{% snapshot scd_customers %}
{{
config(
target_schema='snapshots',
unique_key='customer_id',
strategy='timestamp',
updated_at='updated_at'
)
}}
select * from {{ ref('stg_customers') }}
{% endsnapshot %}
6. dbt Commands
| Command | Purpose |
|---|---|
dbt run | Execute all models |
dbt run --select model_name | Run specific model + parents |
dbt test | Run all tests |
dbt build | Run + test combined |
dbt snapshot | Execute snapshots |
dbt seed | Load CSV files |
dbt docs generate | Build documentation site |
dbt source freshness | Check source data freshness |
Key Takeaways
- Staging → Intermediate → Marts: Clean layered architecture
- Materializations control build strategy; use incremental for large tables
- Tests are assertions — run them continuously
- DAG ensures models build in correct dependency order
- Version control everything — dbt projects are code