dbt with Snowflake
Snowflake Architecture
Architecture Diagram
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
ā SNOWFLAKE + DBT ARCHITECTURE ā
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā¤
ā ā
ā āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā ā
ā ā SNOWFLAKE ARCHITECTURE ā ā
ā ā ā ā
ā ā āāāāāāāāāāāāāāāā āāāāāāāāāāāāāāāā āāāāāāāāāāāāāāāāāāāāāāāāāāāā ā ā
ā ā ā DATABASES ā ā SCHEMAS ā ā TABLES ā ā ā
ā ā ā ā ā ā ā ā ā ā
ā ā ā ⢠Logical ā ā ⢠Namespace ā ā ⢠Permanent ā ā ā
ā ā ā containers ā ā ⢠Access ā ā ⢠Transient ā ā ā
ā ā ā ⢠Shared ā ā control ā ā ⢠Temporary ā ā ā
ā ā āāāāāāāāāāāāāāāā āāāāāāāāāāāāāāāā āāāāāāāāāāāāāāāāāāāāāāāāāāāā ā ā
ā āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā ā
ā ā ā
ā ā¼ ā
ā āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā ā
ā ā WAREHOUSE MANAGEMENT ā ā
ā ā ā ā
ā ā āāāāāāāāāāāāāāāā āāāāāāāāāāāāāāāā āāāāāāāāāāāāāāāāāāāāāāāāāāāā ā ā
ā ā ā WAREHOUSES ā ā RESOURCE ā ā AUTO-SCALING ā ā ā
ā ā ā ā ā MONITORS ā ā ā ā ā
ā ā ā ⢠Compute ā ā ⢠Credit ā ā ⢠Min/Max clusters ā ā ā
ā ā ā ⢠Scaling ā ā usage ā ā ⢠Scaling policies ā ā ā
ā ā ā ⢠Size ā ā ⢠Query ā ā ⢠Queue management ā ā ā
ā ā āāāāāāāāāāāāāāāā ā history ā āāāāāāāāāāāāāāāāāāāāāāāāāāāā ā ā
ā ā āāāāāāāāāāāāāāāā ā ā
ā āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā ā
ā ā
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
Time Travel Architecture
Architecture Diagram
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
ā SNOWFLAKE TIME TRAVEL ā
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā¤
ā ā
ā āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā ā
ā ā TIME TRAVEL CAPABILITIES ā ā
ā ā ā ā
ā ā āāāāāāāāāāāāāāāā āāāāāāāāāāāāāāāā āāāāāāāāāāāāāāāāāāāāāāāāāāāā ā ā
ā ā ā UNDROP ā ā QUERY ā ā CLONE ā ā ā
ā ā ā ā ā HISTORY ā ā ā ā ā
ā ā ā ⢠Restore ā ā ⢠Historical ā ā ⢠Zero-copy ā ā ā
ā ā ā tables ā ā queries ā ā ⢠Instant ā ā ā
ā ā ā ⢠Restore ā ā ⢠Point-in- ā ā ⢠Metadata ā ā ā
ā ā ā schemas ā ā time ā ā only ā ā ā
ā ā āāāāāāāāāāāāāāāā āāāāāāāāāāāāāāāā āāāāāāāāāāāāāāāāāāāāāāāāāāāā ā ā
ā āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā ā
ā ā ā
ā ā¼ ā
ā āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā ā
ā ā TIME TRAVEL USAGE ā ā
ā ā ā ā
ā ā -- Query historical data ā ā
ā ā SELECT * FROM table AT (TIMESTAMP => '2024-01-01 12:00:00') ā ā
ā ā ā ā
ā ā -- Query before specific change ā ā
ā ā SELECT * FROM table BEFORE (STATEMENT => 'abc123') ā ā
ā ā ā ā
ā ā -- Restore dropped table ā ā
ā ā UNDROP TABLE table_name ā ā
ā ā ā ā
ā ā -- Clone table with zero copy ā ā
ā ā CREATE TABLE table_clone CLONE table_name ā ā
ā āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā ā
ā ā
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
Warehouse Optimization
Architecture Diagram
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
ā WAREHOUSE OPTIMIZATION STRATEGY ā
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā¤
ā ā
ā āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā ā
ā ā WAREHOUSE SIZING ā ā
ā ā ā ā
ā ā āāāāāāāāāāāāāāā¬āāāāāāāāāāāāā¬āāāāāāāāāāāāā¬āāāāāāāāāāāāāāāāāāāāāā ā ā
ā ā ā Size ā Credits/hr ā Best For ā dbt Usage ā ā ā
ā ā āāāāāāāāāāāāāāā¼āāāāāāāāāāāāā¼āāāāāāāāāāāāā¼āāāāāāāāāāāāāāāāāāāāā⤠ā ā
ā ā ā X-Small ā 1 ā Dev/Test ā Development ā ā ā
ā ā ā Small ā 2 ā Light ETL ā Small models ā ā ā
ā ā ā Medium ā 4 ā Standard ā Most workloads ā ā ā
ā ā ā Large ā 8 ā Heavy ETL ā Large models ā ā ā
ā ā ā X-Large ā 16 ā Complex ā Complex transforms ā ā ā
ā ā ā 2X-Large ā 32 ā Enterprise ā Enterprise ETL ā ā ā
ā ā āāāāāāāāāāāāāāā“āāāāāāāāāāāāā“āāāāāāāāāāāāā“āāāāāāāāāāāāāāāāāāāāāā ā ā
ā āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā ā
ā ā ā
ā ā¼ ā
ā āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā ā
ā ā AUTO-SCALING ā ā
ā ā ā ā
ā ā āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā ā ā
ā ā ā MIN_CLUSTERS=1, MAX_CLUSTERS=4 ā ā ā
ā ā ā ā ā ā
ā ā ā Time āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā¶ ā ā ā
ā ā ā Clusters: ā ā ā
ā ā ā āāāāāāā ā ā ā
ā ā ā ā 1 ā (idle) ā ā ā
ā ā ā āāāāāāā ā ā ā
ā ā ā āāāāāāā ā ā ā
ā ā ā ā 2 ā (queue builds) ā ā ā
ā ā ā āāāāāāā ā ā ā
ā ā ā āāāāāāā ā ā ā
ā ā ā ā 3 ā (queue persists) ā ā ā
ā ā ā āāāāāāā ā ā ā
ā ā ā āāāāāāā ā ā ā
ā ā ā ā 4 ā (max clusters) ā ā ā
ā ā ā āāāāāāā ā ā ā
ā ā āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā ā ā
ā āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā ā
ā ā
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
Detailed Explanation
Snowflake is a cloud data platform that provides separation of storage and compute, enabling flexible scaling and cost management.
Snowflake + dbt Features
- Warehouse Management: Dynamic compute allocation
- Time Travel: Historical data access
- Zero-Copy Cloning: Instant table copies
- Result Caching: Automatic query result caching
- Auto-Scaling: Dynamic cluster scaling
Warehouse Configuration
Snowflake warehouses control compute resources:
- Size: X-Small to 6X-Large
- Scaling: Single to multi-cluster
- Auto-suspend: Automatic suspension after inactivity
- Auto-resume: Automatic resume on query submission
Time Travel
Snowflake retains historical data for:
- Standard: 1 day
- Enterprise: 90 days
- Business Critical: 90 days
Best Practices for Snowflake
- Right-size warehouses - Match size to workload
- Use auto-scaling - Handle queue spikes
- Set auto-suspend - Reduce idle costs
- Leverage time travel - For debugging and auditing
- Use zero-copy cloning - For testing and development
- Monitor credit usage - Track costs
- Use query history - Analyze query patterns
- Implement resource monitors - Control costs
Code Examples
Snowflake Profile Configuration
# profiles.yml
my_profile:
target: dev
outputs:
dev:
type: snowflake
account: my_account
user: my_user
password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
role: TRANSFORMER
warehouse: COMPUTE_WH
database: ANALYTICS_DEV
schema: "dbt_{{ env_var('DBT_USER') }}"
client_session_keep_alive: false
query_tag: "dbt_dev"
connect_retries: 3
connect_timeout: 10
retry_on_database_errors: true
retry_all: false
prod:
type: snowflake
account: my_account
user: service_account
password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
role: TRANSFORMER
warehouse: ANALYTICS_WH
database: ANALYTICS_PROD
schema: public
client_session_keep_alive: true
query_tag: "dbt_production"
Warehouse Configuration
-- macros/snowflake/create_warehouse.sql
{% macro create_warehouse(warehouse_name, size='X-Small', min_clusters=1, max_clusters=1) %}
create warehouse if not exists {{ warehouse_name }}
warehouse_size = '{{ size }}'
min_cluster_count = {{ min_clusters }}
max_cluster_count = {{ max_clusters }}
auto_suspend = 60
auto_resume = true
initially_suspended = true
comment = 'Warehouse created by dbt';
{% endmacro %}
Time Travel Query
-- models/marts/fct_orders_historical.sql
{{
config(
materialized='view'
)
}}
-- Query data from 7 days ago
with current_data as (
select * from {{ ref('fct_orders') }}
),
historical_data as (
select * from {{ ref('fct_orders') }}
at (offset => -60*60*24*7) -- 7 days ago
),
comparison as (
select
current_data.order_id,
current_data.amount as current_amount,
historical_data.amount as historical_amount,
current_data.amount - historical_data.amount as amount_change
from current_data
left join historical_data
on current_data.order_id = historical_data.order_id
)
select * from comparison
Zero-Copy Clone
-- macros/snowflake/clone_table.sql
{% macro clone_table(source_table, target_table) %}
create table {{ target_table }} clone {{ source_table }};
{% endmacro %}
-- Usage in model
{{
config(
materialized='table',
post_hook=[
"{{ clone_table(this, this ~ '_backup') }}"
]
)
}}
Resource Monitor
-- macros/snowflake/create_resource_monitor.sql
{% macro create_resource_monitor(monitor_name, credit_quota=100, notify_emails=['data-eng@company.com']) %}
create resource monitor if not exists {{ monitor_name }}
with credit_quota = {{ credit_quota }}
notify_users = ({{ notify_emails | join(', ') }})
notify_on = (80, 95, 100)
suspend = immediately
suspend_immediately = true;
{% endmacro %}
Optimized Snowflake Model
-- models/marts/fct_orders.sql
{{
config(
materialized='incremental',
unique_key='order_id',
incremental_strategy='merge',
cluster_by=['customer_id', 'order_status'],
post_hook=[
"alter table {{ this }} clustering key recluster"
]
)
}}
with orders as (
select * from {{ ref('stg_orders') }}
),
final as (
select
order_id,
customer_id,
order_date,
order_status,
amount,
current_timestamp() as updated_at
from orders
)
select * from final
{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
Performance Metrics
| Metric | Description | Target |
|---|---|---|
| Query Time | Average query execution | <10s |
| Credit Usage | Credits consumed per run | <50 |
| Warehouse Uptime | Percentage of time active | <50% |
| Cache Hit Rate | Query result cache hits | >80% |
| Auto-Suspend | Time to suspend idle | 60s |
Best Practices
- Right-size warehouses - Match size to workload
- Use auto-scaling - Handle queue spikes
- Set auto-suspend - Reduce idle costs
- Leverage time travel - For debugging and auditing
- Use zero-copy cloning - For testing and development
- Monitor credit usage - Track costs
- Use query history - Analyze query patterns
- Implement resource monitors - Control costs