CW

Analytics Engineering with dbt

Module 15: Data Engineering & MLOpsFree Lesson

Advertisement

Analytics Engineering with dbt

dbt (data build tool) transforms data in warehouses using SQL, version control, and software engineering best practices. Analytics engineers bridge data engineering and business intelligence.

dbt Architecture

dbt Project ArchitectureSource DataRaw tables inwarehousedbtTransformsStagingstg_ modelsMartsfct_ / dim_dbt_project.ymlProject configmodels/SQL + YAMLtests/Schema + data testsmacros/Jinja functionsDAG (Directed Acyclic Graph)Models are executed in dependency order — dbt resolves the DAG automatically

1. Project Structure

Architecture Diagram
my_project/
├── dbt_project.yml
├── packages.yml
├── profiles.yml
├── models/
│   ├── staging/
│   │   ├── _staging__sources.yml
│   │   ├── stg_orders.sql
│   │   └── stg_customers.sql
│   ├── marts/
│   │   ├── finance/
│   │   │   ├── fct_revenue.sql
│   │   │   └── dim_customers.sql
│   │   └── marketing/
│   │       └── campaign_performance.sql
│   └── intermediate/
│       └── int_joined.sql
├── macros/
│   └── generate_schema_name.sql
├── tests/
│   └── assert_positive_revenue.sql
├── snapshots/
│   └── scd_customers.sql
└── seeds/
    └── country_codes.csv

2. Models and Materializations

Materialization Types

View- No storage- Query on read- Always currentBest for:staging models,testingTable- Physical storage- Full rebuild- Incremental possibleBest for:small marts,seedsIncremental- Merge/append- Only new rows- Requires configBest for:large fact tables,event dataEphemeral- CTE only- No DB object- Cost optimizationBest for:intermediate logic,reusable CTEs

Model Example

-- models/staging/stg_orders.sql
{{
  config(
    materialized='view',
    tags=['staging', 'daily']
  )
}}

with source as (
    {{ ref('raw_orders') }}
),

renamed as (
    select
        id as order_id,
        user_id,
        status,
        created_at,
        _fivetran_deleted as is_deleted
    from source
    where not is_deleted
)

select * from renamed

3. Jinja Templating and Macros

-- macros/generate_schema_name.sql
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}
    {%- if custom_schema_name is none -%}
        {{ default_schema }}
    {%- else -%}
        {{ default_schema }}_{{ custom_schema_name }}
    {%- endif -%}
{%- endmacro %}

-- macros/union_tables.sql
{% macro union_tables(table_list) %}
    {% for table in table_list %}
        SELECT * FROM {{ ref(table) }}
        {% if not loop.last %} UNION ALL {% endif %}
    {% endfor %}
{% endmacro %}

4. Testing and Data Quality

Schema Tests

# models/staging/_staging__sources.yml
version: 2

models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: user_id
        tests:
          - not_null
          - relationships:
              to: ref('stg_customers')
              field: customer_id

Custom Tests

-- tests/assert_positive_revenue.sql
select
    order_id,
    amount
from {{ ref('fct_revenue') }}
where amount < 0

5. Snapshots (SCD Type 2)

-- snapshots/scd_customers.sql
{% snapshot scd_customers %}

{{
    config(
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}

select * from {{ ref('stg_customers') }}

{% endsnapshot %}

6. dbt Commands

CommandPurpose
dbt runExecute all models
dbt run --select model_nameRun specific model + parents
dbt testRun all tests
dbt buildRun + test combined
dbt snapshotExecute snapshots
dbt seedLoad CSV files
dbt docs generateBuild documentation site
dbt source freshnessCheck source data freshness

Key Takeaways

  • Staging → Intermediate → Marts: Clean layered architecture
  • Materializations control build strategy; use incremental for large tables
  • Tests are assertions — run them continuously
  • DAG ensures models build in correct dependency order
  • Version control everything — dbt projects are code

Advertisement

Need Expert Data Science Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement