dbt Custom Macros
Custom Macro Architecture
Macro Type Pipeline
Formal Definitions
DfMacro
A macro in dbt is a reusable block of Jinja and SQL code defined in .sql files under the macros/ directory. Macros accept arguments, process them through Jinja templating, and return SQL strings. They are invoked in models using {{ macro_name(arg1, arg2) }} syntax. Macros enable DRY (Don't Repeat Yourself) principles by encapsulating common transformation patterns.
DfMacro Namespace
The macro namespace is the resolution order for macro names. dbt searches for macros in this order: project macros > package macros > dispatched macros. This allows packages to provide default implementations that projects can override. The namespace ensures that custom macros take precedence over package-provided macros.
DfDispatch Macro
A dispatch macro enables adapter-specific implementations of a generic macro. The dispatch() function looks for a macro with a specific suffix (e.g., _bigquery, _snowflake) before falling back to the default implementation. This allows the same macro to behave differently across database platforms while maintaining a consistent interface.
DfOperation Macro
An operation macro is a macro designed to be called with dbt run-operation macro_name. Unlike SQL macros that return SQL, operation macros execute arbitrary logic: querying metadata, creating objects, sending notifications, or running DDL. They are invoked from the CLI, not from within models.
Detailed Explanation
dbt macros are the primary mechanism for code reuse and abstraction. They range from simple SQL snippets to complex Jinja logic that generates platform-specific queries.
Macro Categories
- SQL Macros - Return SQL fragments for use in models
- Operation Macros - Execute side effects from the CLI
- Generic Test Macros - Define custom data test logic
- Materialization Macros - Override default materialization behavior
- Dispatch Macros - Provide adapter-specific implementations
Macros can access dbt's context variables: this, ref(), source(), var(), target, env_var(), and all adapter methods. This makes macros extremely powerful for generating dynamic SQL based on project configuration and environment.
When building custom macros, always use return() to output SQL. Without an explicit return, Jinja's default behavior may produce unexpected results. For operation macros, use run_query() to execute DDL/DML and log() for debugging output.
Code Examples
Basic SQL Macro
-- macros/generate_schema_name.sql
{% macro generate_schema_name(custom_schema_name, node) -%}
{%- set default_schema = target.schema -%}
{%- if custom_schema_name is none -%}
{{ default_schema }}
{%- else -%}
{{ default_schema }}_{{ custom_schema_name | trim }}
{%- endif -%}
{%- endmacro %}
Macro with Arguments
-- macros/cast_column.sql
{% macro cast_column(column_name, target_type) %}
{%- if target.type == 'bigquery' %}
cast({{ column_name }} as {{ target_type }})
{%- elif target.type == 'snowflake' %}
cast({{ column_name }} as {{ target_type }})
{%- elif target.type == 'redshift' %}
cast({{ column_name }} as {{ target_type }})
{%- else %}
cast({{ column_name }} as {{ target_type }})
{%- endif %}
{% endmacro %}
-- Usage in model:
-- select {{ cast_column('amount', 'numeric(10,2)') }} as amount
Complex Transformation Macro
-- macros/generate_dimensions.sql
{% macro generate_dimensions(columns, prefix='dim') %}
{%- for col in columns %}
{{ prefix }}_{{ col }} as {{ col }}
{%- if not loop.last %},{% endif %}
{%- endfor %}
{% endmacro %}
-- macros/generate_measures.sql
{% macro generate_measures(measures, agg='sum') %}
{%- for measure in measures %}
{{ agg }}({{ measure }}) as {{ agg }}_{{ measure }}
{%- if not loop.last %},{% endif %}
{%- endfor %}
{% endmacro %}
-- Usage in model:
-- select
-- {{ generate_dimensions(['product_id', 'product_name', 'category']) }},
-- {{ generate_measures(['quantity', 'revenue']) }}
-- from stg_orders
Operation Macro
-- macros/refresh_all_snapshots.sql
{% macro refresh_all_snapshots() %}
{%- set snapshot_models = graph.nodes.values() | selectattr('resource_type', 'equalto', 'snapshot') -%}
{%- for snapshot in snapshot_models %}
{%- set snapshot_sql %}
snapshot {{ snapshot.unique_id }}
{%- endset -%}
{{ log("Refreshing snapshot: " ~ snapshot.name, info=True) }}
{% do run_query(snapshot_sql) %}
{%- endfor %}
{{ log("All snapshots refreshed successfully", info=True) }}
{% endmacro %}
-- Run with: dbt run-operation refresh_all_snapshots
Custom Generic Test Macro
-- macros/test_accepted_range.sql
{% test accepted_range(model, column_name, min_value, max_value) %}
select
'{{ model }}' as model_name,
'{{ column_name }}' as column_name,
count(*) as failures
from {{ model }}
where {{ column_name }} < {{ min_value }}
or {{ column_name }} > {{ max_value }}
{% endtest %}
-- Usage in YAML:
-- data_tests:
-- - accepted_range:
-- min_value: 0
-- max_value: 1000000
Dispatch Macro
-- macros/cross_db_utils/cast_as_bigint.sql
{% macro cast_as_bigint(column) %}
{{ return(adapter.dispatch('cast_as_bigint')(column)) }}
{% endmacro %}
{% macro default__cast_as_bigint(column) %}
cast({{ column }} as bigint)
{% endmacro %}
{% macro bigquery__cast_as_bigint(column) %}
safe_cast({{ column }} as int64)
{% endmacro %}
{% macro snowflake__cast_as_bigint(column) %}
cast({{ column }} as bigint)
{% endmacro %}
{% macro redshift__cast_as_bigint(column) %}
cast({{ column }} as bigint)
{% endmacro %}
Macro for Data Migration
-- macros/backfill_column.sql
{% macro backfill_column(target_table, source_table, column_name, key_column) %}
{%- set sql -%}
update {{ target_table }} as t
set {{ column_name }} = s.{{ column_name }}
from {{ source_table }} as s
where t.{{ key_column }} = s.{{ key_column }}
and t.{{ column_name }} is null
{%- endset -%}
{{ log("Executing backfill for column: " ~ column_name, info=True) }}
{% do run_query(sql) %}
{{ log("Backfill complete for column: " ~ column_name, info=True) }}
{% endmacro %}
-- Run with: dbt run-operation backfill_column --args '{"target_table": "analytics.fct_orders", "source_table": "staging.stg_orders", "column_name": "customer_segment", "key_column": "order_id"}'
Macro for Documentation
-- macros/generate_column_description.sql
{% macro generate_column_description(column_name, description, data_type, nullable=true, primary_key=false) %}
{%- set parts = [] -%}
{%- do parts.append(description) -%}
{%- if primary_key -%}
{%- do parts.append("Primary key") -%}
{%- endif -%}
{%- if not nullable -%}
{%- do parts.append("NOT NULL") -%}
{%- endif -%}
{%- do parts.append("Type: " ~ data_type) -%}
{{ parts | join(' | ') }}
{% endmacro %}
Macro Reference Table
| Macro Type | Return Type | Invocation | Use Case |
|---|---|---|---|
| SQL | SQL string | {{ macro() }} in models | Reusable SQL fragments |
| Operation | None (side effects) | dbt run-operation | CLI automation |
| Generic Test | SQL (test query) | YAML data_tests: | Custom test logic |
| Materialization | SQL/DML | config(materialized=...) | Custom materializations |
| Dispatch | SQL string | adapter.dispatch() | Cross-database compatibility |
Best Practices
- Naming conventions - Use snake_case for macro names
- Documentation - Add docstrings at the top of each macro
- Argument validation - Check required arguments before processing
- Error handling - Use
exceptions.warn()for non-fatal errors - Testing - Test macros with dbt run-operation before deployment
- Versioning - Include macros in package versioning
- Cross-database - Use dispatch for database-specific logic
- Performance - Avoid unnecessary Jinja loops in large datasets
See Also
- Advanced Jinja — Advanced Jinja patterns and techniques
- dbt Projects — Project configuration and structure
- dbt Best Practices — Code organization patterns
- dbt Package Development — Building and publishing packages