dbt Project Configuration
Project Architecture
Architecture Diagram
```text
DBT PROJECT ARCHITECTURE

PROJECT COMPONENTS
  dbt_project.yml      packages.yml         profiles.yml
  • Project settings   • Package versions   • Connection details
  • Model configs      • Git refs           • Target environment
  • Variables          • Registries         • Schema
                            │
                            ▼
DIRECTORY STRUCTURE
  my_project/
  ├── dbt_project.yml   ── Project config
  ├── packages.yml      ── Package dependencies
  ├── profiles.yml      ── Connection profiles
  ├── models/           ── SQL/Python models
  ├── seeds/            ── CSV seed data
  ├── snapshots/        ── SCD snapshots
  ├── tests/            ── Custom data tests
  ├── macros/           ── Reusable macros
  ├── analysis/         ── Ad-hoc analysis
  └── dbt_packages/     ── Installed packages
```
Dependency Management
Architecture Diagram
```text
DEPENDENCY RESOLUTION FLOW

PACKAGES.YML
  packages:
    - package: dbt-labs/dbt_utils
      version: [">=1.0.0", "<2.0.0"]
    - package: calogica/dbt_expectations
      version: [">=0.10.0", "<1.0.0"]
    - git: "https://github.com/org/custom_package.git"
      revision: main
        │
        ▼
DBT DEPS RESOLUTION
  1. Parse packages.yml
  2. Resolve version constraints
  3. Download packages
  4. Install to dbt_packages/
  5. Write the resolved versions to package-lock.yml (dbt 1.7+)
        │
        ▼
INSTALLED PACKAGES
  dbt_packages/
  ├── dbt_utils/
  │   ├── macros/
  │   └── dbt_project.yml
  ├── dbt_expectations/
  │   ├── macros/
  │   └── dbt_project.yml
  └── custom_package/
      ├── macros/
      └── dbt_project.yml
```
Environment Configuration
Architecture Diagram
```text
ENVIRONMENT MANAGEMENT

PROFILE CONFIGURATION
  my_profile:
    target: dev
    outputs:
      dev:
        type: snowflake
        account: my_account
        user: my_user
        password: "{{ env_var('DBT_PASSWORD') }}"
        warehouse: compute_wh
        database: analytics_dev
        schema: "dbt_{{ env_var('DBT_USER') }}"

      prod:
        type: snowflake
        account: my_account
        user: service_account
        password: "{{ env_var('DBT_PROD_PASSWORD') }}"
        warehouse: analytics_wh
        database: analytics_prod
        schema: public
        │
        ▼
ENVIRONMENT VARIABLES
  Variable            Purpose
  DBT_USER            Current user, used to build the dev schema name
  DBT_PASSWORD        Dev database password
  DBT_PROD_PASSWORD   Production service-account password
  DBT_ENV             Current environment (dev/prod)
  DBT_HOST            Database host (host-based adapters; Snowflake uses account)
  DBT_WAREHOUSE       Compute warehouse
```
Detailed Explanation
A dbt project is a directory containing a dbt_project.yml file, and it is the fundamental organizational unit in dbt. It holds all of your models, tests, macros, and configuration.
Project Configuration
The dbt_project.yml file is the central configuration file for your dbt project:
- Project metadata: Name, version, profile
- Resource paths: Where dbt looks for models, seeds, tests, macros, and snapshots
- Model configurations: Default materializations, schemas, tags
- Variables: Custom variables for dynamic configuration
- Clean targets: Directories that dbt clean deletes
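Folder-level defaults in dbt_project.yml can be overridden for individual models, and dbt applies the most specific setting: a model's properties file overrides dbt_project.yml, and a config() block in the model's SQL overrides both. A minimal sketch of that precedence, assuming a hypothetical project named jaffle_shop with a staging model called stg_payments:

```yaml
# dbt_project.yml (fragment): default for every model under models/staging/
models:
  jaffle_shop:              # hypothetical project name
    staging:
      +materialized: view
```

```yaml
# models/staging/stg_payments.yml (fragment): override for a single model
version: 2

models:
  - name: stg_payments      # hypothetical model name
    config:
      materialized: table   # takes precedence over the folder-level view
```

With both files in place, stg_payments would be built as a table while the other staging models remain views.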
Package Management
Packages are reusable collections of macros, models, and tests:
- dbt-labs/dbt_utils: General-purpose SQL macros and generic tests
- calogica/dbt_expectations: Great Expectations-style data tests
- dbt-labs/codegen: Generates starter YAML and SQL for sources and models
- Custom packages: Organization-specific code
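Organization-specific packages are usually installed from a Git repository or a local path rather than from dbt Hub. A sketch showing the three source types side by side; the repository URL, tag, and local path are hypothetical placeholders:

```yaml
# packages.yml (fragment)
packages:
  # Hub package: resolved by name and version range
  - package: dbt-labs/codegen
    version: [">=0.12.0", "<1.0.0"]

  # Git package pinned to a tag so dbt deps installs the same code on every run
  - git: "https://github.com/my-org/dbt-shared.git"   # hypothetical repository
    revision: v1.2.0                                   # hypothetical tag

  # Local package: a path relative to the root project
  - local: ../shared_dbt_package                       # hypothetical path
```

Pinning the Git package to a tag or commit, rather than a branch, keeps builds reproducible across environments.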
Profile Configuration
Profiles define how dbt connects to your data warehouse:
- Target environments: Dev, staging, production
- Connection details: Account, user, password
- Warehouse settings: Compute warehouse or cluster, threads, timeouts
- Schema configuration: Dynamic schemas per user
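Warehouse settings vary by adapter. For Snowflake, a target references an existing warehouse by name (the warehouse size is configured in Snowflake, not in profiles.yml) and tunes concurrency and connection handling. A sketch of a dev target; the threads, retry, and timeout values are illustrative, not recommendations:

```yaml
# profiles.yml (fragment)
my_profile:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      database: ANALYTICS_DEV
      warehouse: COMPUTE_WH                     # existing warehouse, referenced by name
      schema: "dbt_{{ env_var('DBT_USER') }}"   # per-developer schema, e.g. dbt_alice
      threads: 4                                # models dbt may build in parallel
      connect_retries: 3                        # retries for transient connection failures
      connect_timeout: 30                       # login timeout per attempt, in seconds
```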
Environment Variables
Environment variables allow dynamic configuration:
- Secrets: Passwords, tokens (never commit to Git)
- Environment-specific: Different settings per target
- User-specific: Per-developer configurations
- CI/CD: Pipeline-specific settings
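env_var() accepts an optional default as its second argument, and variables whose names start with DBT_ENV_SECRET_ are scrubbed from dbt's logs, which makes that prefix a good fit for passwords and tokens. A sketch reusing DBT_ENV, DBT_WAREHOUSE, and DBT_USER from the environment-variable table above; DBT_ENV_SECRET_PASSWORD and DBT_THREADS are hypothetical names:

```yaml
# profiles.yml (fragment)
my_profile:
  # Falls back to the dev target when DBT_ENV is unset
  target: "{{ env_var('DBT_ENV', 'dev') }}"
  outputs:
    dev:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      # The DBT_ENV_SECRET_ prefix keeps the value out of dbt's logs
      password: "{{ env_var('DBT_ENV_SECRET_PASSWORD') }}"
      database: ANALYTICS_DEV
      warehouse: "{{ env_var('DBT_WAREHOUSE', 'COMPUTE_WH') }}"
      schema: "dbt_{{ env_var('DBT_USER') }}"
      # env_var() returns a string, so cast numeric settings
      threads: "{{ env_var('DBT_THREADS', '4') | int }}"
    # A matching prod output would sit alongside dev
```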
Project Best Practices
- Use version control for all configuration files
- Never commit secrets - use environment variables
- Separate environments - dev, staging, production
- Document configurations - add comments and descriptions
- Test configurations - validate with dbt debug and dbt parse before deployment
- Use packages - leverage community code
- Version constraints - specify version ranges for packages
- Clean regularly - run dbt clean to remove the directories listed in clean-targets, and drop unused packages
Code Examples
dbt_project.yml
```yaml
# dbt_project.yml
name: 'my_analytics_project'
version: '1.0.0'
config-version: 2

# Must match a top-level profile name in profiles.yml
profile: 'my_profile'

model-paths: ["models"]
analysis-paths: ["analysis"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
docs-paths: ["docs"]

# Directories removed by dbt clean
clean-targets:
  - "target"
  - "dbt_packages"
  - "dbt_modules"

models:
  my_analytics_project:
    staging:
      +materialized: view
      # The default generate_schema_name macro appends this to the
      # target schema, e.g. dbt_alice_staging
      +schema: staging
      +tags: ['staging']
    intermediate:
      +materialized: ephemeral
      +tags: ['intermediate']
    marts:
      +materialized: incremental
      +schema: analytics
      +tags: ['mart', 'production']
      finance:
        +cluster_by: ['date', 'account_id']
      marketing:
        # field/data_type partitioning is BigQuery syntax; dbt-snowflake does not use it
        +partition_by:
          field: event_date
          data_type: date

vars:
  start_date: '2020-01-01'
  enable_audit: true
  default_currency: 'USD'

query-comment:
  comment: "dbt: {{ node.unique_id }} | {{ node.description }}"
  append: true
```
packages.yml
```yaml
# packages.yml
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]
  - package: calogica/dbt_expectations
    version: [">=0.10.0", "<1.0.0"]
  - package: dbt-labs/codegen
    version: [">=0.12.0"]
  - package: elementary-data/elementary
    version: [">=0.14.0"]
  # A branch such as main can change between runs; pin a tag or commit SHA for reproducible builds
  - git: "https://github.com/my-org/custom_dbt_package.git"
    revision: main
```
profiles.yml
```yaml
# profiles.yml
my_profile:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: TRANSFORMER
      database: ANALYTICS_DEV
      warehouse: COMPUTE_WH
      schema: "dbt_{{ env_var('DBT_USER') }}"
      client_session_keep_alive: false
      query_tag: "dbt_dev"

    staging:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: TRANSFORMER
      database: ANALYTICS_STG
      warehouse: ANALYTICS_WH
      schema: public
      client_session_keep_alive: true
      query_tag: "dbt_staging"

    prod:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: TRANSFORMER
      database: ANALYTICS_PROD
      warehouse: ANALYTICS_WH
      schema: public
      client_session_keep_alive: true
      query_tag: "dbt_production"
```
Model Configuration
```yaml
# models/marts/fct_orders.yml
version: 2

models:
  - name: fct_orders
    description: "Fact table for orders"
    config:
      materialized: incremental
      unique_key: order_id
      incremental_strategy: merge
      # field/data_type partitioning is BigQuery syntax; Snowflake relies on cluster_by
      partition_by:
        field: order_date
        data_type: date
      cluster_by: ['customer_id', 'status']
      tags: ['finance', 'core', 'production']
      meta:
        owner: data-engineering
        team: analytics
        cost_center: finance
        pii: false
    columns:
      - name: order_id
        description: "Unique order identifier"
        # data_tests requires dbt 1.8+; earlier versions use tests
        data_tests:
          - unique
          - not_null
```
Configuration Components
| Component | Description | Relative impact |
|---|---|---|
| dbt_project.yml | Project configuration | High |
| packages.yml | Package dependencies | Medium |
| profiles.yml | Connection settings | High |
| Variables | Dynamic configuration | Low |
| Model configs | Model settings | High |
| Tags | Organization | Low |