CW

dbt Custom Macros

Free Lesson

Advertisement

dbt Custom Macros

Custom Macro Architecture

Macro Type Pipeline

Formal Definitions

DfMacro

A macro in dbt is a reusable block of Jinja and SQL code defined in .sql files under the macros/ directory. Macros accept arguments, process them through Jinja templating, and return SQL strings. They are invoked in models using {{ macro_name(arg1, arg2) }} syntax. Macros enable DRY (Don't Repeat Yourself) principles by encapsulating common transformation patterns.

DfMacro Namespace

The macro namespace is the resolution order for macro names. dbt searches for macros in this order: project macros > package macros > dispatched macros. This allows packages to provide default implementations that projects can override. The namespace ensures that custom macros take precedence over package-provided macros.

DfDispatch Macro

A dispatch macro enables adapter-specific implementations of a generic macro. The dispatch() function looks for a macro with a specific suffix (e.g., _bigquery, _snowflake) before falling back to the default implementation. This allows the same macro to behave differently across database platforms while maintaining a consistent interface.

DfOperation Macro

An operation macro is a macro designed to be called with dbt run-operation macro_name. Unlike SQL macros that return SQL, operation macros execute arbitrary logic: querying metadata, creating objects, sending notifications, or running DDL. They are invoked from the CLI, not from within models.

Detailed Explanation

dbt macros are the primary mechanism for code reuse and abstraction. They range from simple SQL snippets to complex Jinja logic that generates platform-specific queries.

Macro Categories

  1. SQL Macros - Return SQL fragments for use in models
  2. Operation Macros - Execute side effects from the CLI
  3. Generic Test Macros - Define custom data test logic
  4. Materialization Macros - Override default materialization behavior
  5. Dispatch Macros - Provide adapter-specific implementations

Macros can access dbt's context variables: this, ref(), source(), var(), target, env_var(), and all adapter methods. This makes macros extremely powerful for generating dynamic SQL based on project configuration and environment.

When building custom macros, always use return() to output SQL. Without an explicit return, Jinja's default behavior may produce unexpected results. For operation macros, use run_query() to execute DDL/DML and log() for debugging output.

Code Examples

Basic SQL Macro

-- macros/generate_schema_name.sql
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}
    {%- if custom_schema_name is none -%}
        {{ default_schema }}
    {%- else -%}
        {{ default_schema }}_{{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}

Macro with Arguments

-- macros/cast_column.sql
{% macro cast_column(column_name, target_type) %}
    {%- if target.type == 'bigquery' %}
        cast({{ column_name }} as {{ target_type }})
    {%- elif target.type == 'snowflake' %}
        cast({{ column_name }} as {{ target_type }})
    {%- elif target.type == 'redshift' %}
        cast({{ column_name }} as {{ target_type }})
    {%- else %}
        cast({{ column_name }} as {{ target_type }})
    {%- endif %}
{% endmacro %}

-- Usage in model:
-- select {{ cast_column('amount', 'numeric(10,2)') }} as amount

Complex Transformation Macro

-- macros/generate_dimensions.sql
{% macro generate_dimensions(columns, prefix='dim') %}
    {%- for col in columns %}
        {{ prefix }}_{{ col }} as {{ col }}
        {%- if not loop.last %},{% endif %}
    {%- endfor %}
{% endmacro %}

-- macros/generate_measures.sql
{% macro generate_measures(measures, agg='sum') %}
    {%- for measure in measures %}
        {{ agg }}({{ measure }}) as {{ agg }}_{{ measure }}
        {%- if not loop.last %},{% endif %}
    {%- endfor %}
{% endmacro %}

-- Usage in model:
-- select
--     {{ generate_dimensions(['product_id', 'product_name', 'category']) }},
--     {{ generate_measures(['quantity', 'revenue']) }}
-- from stg_orders

Operation Macro

-- macros/refresh_all_snapshots.sql
{% macro refresh_all_snapshots() %}
    {%- set snapshot_models = graph.nodes.values() | selectattr('resource_type', 'equalto', 'snapshot') -%}
    
    {%- for snapshot in snapshot_models %}
        {%- set snapshot_sql %}
            snapshot {{ snapshot.unique_id }}
        {%- endset -%}
        
        {{ log("Refreshing snapshot: " ~ snapshot.name, info=True) }}
        {% do run_query(snapshot_sql) %}
    {%- endfor %}
    
    {{ log("All snapshots refreshed successfully", info=True) }}
{% endmacro %}

-- Run with: dbt run-operation refresh_all_snapshots

Custom Generic Test Macro

-- macros/test_accepted_range.sql
{% test accepted_range(model, column_name, min_value, max_value) %}
    select
        '{{ model }}' as model_name,
        '{{ column_name }}' as column_name,
        count(*) as failures
    from {{ model }}
    where {{ column_name }} < {{ min_value }}
       or {{ column_name }} > {{ max_value }}
{% endtest %}

-- Usage in YAML:
-- data_tests:
--   - accepted_range:
--       min_value: 0
--       max_value: 1000000

Dispatch Macro

-- macros/cross_db_utils/cast_as_bigint.sql
{% macro cast_as_bigint(column) %}
    {{ return(adapter.dispatch('cast_as_bigint')(column)) }}
{% endmacro %}

{% macro default__cast_as_bigint(column) %}
    cast({{ column }} as bigint)
{% endmacro %}

{% macro bigquery__cast_as_bigint(column) %}
    safe_cast({{ column }} as int64)
{% endmacro %}

{% macro snowflake__cast_as_bigint(column) %}
    cast({{ column }} as bigint)
{% endmacro %}

{% macro redshift__cast_as_bigint(column) %}
    cast({{ column }} as bigint)
{% endmacro %}

Macro for Data Migration

-- macros/backfill_column.sql
{% macro backfill_column(target_table, source_table, column_name, key_column) %}
    {%- set sql -%}
        update {{ target_table }} as t
        set {{ column_name }} = s.{{ column_name }}
        from {{ source_table }} as s
        where t.{{ key_column }} = s.{{ key_column }}
          and t.{{ column_name }} is null
    {%- endset -%}
    
    {{ log("Executing backfill for column: " ~ column_name, info=True) }}
    {% do run_query(sql) %}
    {{ log("Backfill complete for column: " ~ column_name, info=True) }}
{% endmacro %}

-- Run with: dbt run-operation backfill_column --args '{"target_table": "analytics.fct_orders", "source_table": "staging.stg_orders", "column_name": "customer_segment", "key_column": "order_id"}'

Macro for Documentation

-- macros/generate_column_description.sql
{% macro generate_column_description(column_name, description, data_type, nullable=true, primary_key=false) %}
    {%- set parts = [] -%}
    {%- do parts.append(description) -%}
    {%- if primary_key -%}
        {%- do parts.append("Primary key") -%}
    {%- endif -%}
    {%- if not nullable -%}
        {%- do parts.append("NOT NULL") -%}
    {%- endif -%}
    {%- do parts.append("Type: " ~ data_type) -%}
    
    {{ parts | join(' | ') }}
{% endmacro %}

Macro Reference Table

Macro TypeReturn TypeInvocationUse Case
SQLSQL string{{ macro() }} in modelsReusable SQL fragments
OperationNone (side effects)dbt run-operationCLI automation
Generic TestSQL (test query)YAML data_tests:Custom test logic
MaterializationSQL/DMLconfig(materialized=...)Custom materializations
DispatchSQL stringadapter.dispatch()Cross-database compatibility

Best Practices

  1. Naming conventions - Use snake_case for macro names
  2. Documentation - Add docstrings at the top of each macro
  3. Argument validation - Check required arguments before processing
  4. Error handling - Use exceptions.warn() for non-fatal errors
  5. Testing - Test macros with dbt run-operation before deployment
  6. Versioning - Include macros in package versioning
  7. Cross-database - Use dispatch for database-specific logic
  8. Performance - Avoid unnecessary Jinja loops in large datasets

See Also

Advertisement

Need Expert dbt Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement