dbt Semantic Layer and Metrics
Semantic Layer Architecture
Semantic Layer Pipeline
Formal Definitions
DfMetric
A metric is a named, reusable business calculation defined in YAML that quantifies a business process. Metrics are built on top of semantic models and consist of: a name, type (simple, derived, or cumulative), an expression (the calculation), and a list of input measures/dimensions. Formally, a metric M = {name, type, expression, inputs, filters, description}.
DfSemantic Model
A semantic model is a YAML definition that maps a dbt model to the semantic layer. It defines: the underlying table, entity columns (primary/foreign keys), dimension columns (attributes for grouping), and measure columns (numeric aggregations). Semantic models bridge dbt models to the metric system, enabling consistent metric definitions.
DfMetricFlow
MetricFlow is dbt's semantic query engine that compiles metric definitions into SQL queries. It handles metric resolution, join logic, aggregation, and time-spooling. When you query a metric, MetricFlow generates optimized SQL that joins the necessary semantic models and applies the correct aggregations and filters.
DfMeasure
A measure is a numeric column in a semantic model that can be aggregated. Measures are the building blocks of metrics. Common aggregations include: sum, count, count_distinct, average, min, max, and percentiles. A measure is defined with a name, aggregation type, and the underlying column.
Detailed Explanation
The dbt Semantic Layer provides a centralized system for defining business metrics, ensuring consistent definitions across all analytics consumers. It eliminates metric inconsistencies by defining calculations once and serving them to multiple downstream tools.
Metric Types
| Type | Description | Example |
|---|---|---|
| Simple | Direct aggregation of a measure | Total Revenue |
| Derived | Calculation from other metrics | Profit Margin |
| Cumulative | Running total over time | Cumulative Revenue |
| Conversion | Funnel-based metrics | Conversion Rate |
The Semantic Layer requires dbt Cloud with the Semantic Layer feature enabled, or dbt Core with MetricFlow. Metrics are defined in YAML files under the metrics/ directory or within model YAML files using the metrics key.
Start with simple metrics (sum, count) and gradually build derived metrics. Always define semantic models first, then build metrics on top. Use consistent naming conventions: snake_case for metric names, prefixes for metric types (e.g., total_, avg_, count_).
Code Examples
Semantic Model Definition
# models/marts/fct_orders.yml
version: 2
semantic_models:
- name: orders
defaults:
agg_time_dimension: order_date
description: "Order transactions fact table"
model: ref('fct_orders')
entities:
- name: order_id
type: primary
- name: customer_id
type: foreign
- name: product_id
type: foreign
dimensions:
- name: order_date
type: time
type_params:
time_granularity: day
- name: order_status
type: categorical
- name: customer_segment
type: categorical
- name: product_category
type: categorical
measures:
- name: order_count
agg: count
expr: order_id
description: "Count of orders"
- name: revenue
agg: sum
expr: amount
description: "Total order amount"
- name: unique_customers
agg: count_distinct
expr: customer_id
description: "Distinct customers who placed orders"
Simple Metric
# metrics/order_metrics.yml
version: 2
metrics:
- name: total_revenue
type: simple
type_params:
measure: revenue
description: "Total revenue from all orders"
filter: |
{{ Dimension('order__order_status') }} != 'cancelled'
label: "Total Revenue"
round: 2
- name: total_orders
type: simple
type_params:
measure: order_count
description: "Total number of orders"
label: "Total Orders"
- name: unique_customers
type: simple
type_params:
measure: unique_customers
description: "Number of unique customers"
label: "Unique Customers"
Derived Metric
# metrics/order_metrics.yml
metrics:
- name: average_order_value
type: derived
type_params:
expr: total_revenue / total_orders
metrics:
- name: total_revenue
filter: |
{{ Dimension('order__order_status') }} != 'cancelled'
- name: total_orders
filter: |
{{ Dimension('order__order_status') }} != 'cancelled'
description: "Average revenue per order"
label: "Average Order Value"
round: 2
- name: revenue_per_customer
type: derived
type_params:
expr: total_revenue / unique_customers
metrics:
- name: total_revenue
- name: unique_customers
description: "Revenue per unique customer"
label: "Revenue per Customer"
round: 2
Cumulative Metric
# metrics/cumulative_metrics.yml
metrics:
- name: cumulative_revenue
type: cumulative
type_params:
measure: revenue
window: 7 day
grain_to_date: month
description: "7-day rolling cumulative revenue"
label: "Cumulative Revenue (7-day)"
round: 2
- name: cumulative_orders
type: cumulative
type_params:
measure: order_count
window: 30 day
description: "30-day rolling cumulative orders"
label: "Cumulative Orders (30-day)"
Conversion Metric
# metrics/conversion_metrics.yml
metrics:
- name: conversion_rate
type: conversion
type_params:
numerator: completed_orders
denominator: total_orders
window: 7 day
description: "Order completion rate within 7 days"
label: "Conversion Rate"
round: 4
- name: completed_orders
type: simple
type_params:
measure: order_count
filter: |
{{ Dimension('order__order_status') }} = 'completed'
- name: total_orders
type: simple
type_params:
measure: order_count
Querying Metrics
-- Query via MetricFlow SQL
SELECT
order_date,
customer_segment,
{{ metric('total_revenue') }},
{{ metric('total_orders') }},
{{ metric('average_order_value') }}
FROM {{ metric('total_revenue') }}
GROUP BY 1, 2
ORDER BY 1, 2
Using the Semantic Layer API
# Query via Python API
from dbt_semantic_interfaces.protocols.semantic_manifest import SemanticManifest
# Connect to the Semantic Layer
semantic_layer = SemanticLayerClient(
host="https://semantic-layer.getdbt.com",
auth_token="your_token",
environment_id="123"
)
# Query metrics
result = semantic_layer.query(
metrics=["total_revenue", "total_orders"],
group_by=["order_date", "customer_segment"],
where="order_status != 'cancelled'",
order_by=["order_date"]
)
# Process results
for row in result.data:
print(f"Date: {row['order_date']}, Segment: {row['customer_segment']}, "
f"Revenue: ${row['total_revenue']:,.2f}, Orders: {row['total_orders']}")
Semantic Model with Multiple Measures
# models/marts/fct_events.yml
semantic_models:
- name: events
model: ref('fct_events')
entities:
- name: event_id
type: primary
- name: user_id
type: foreign
- name: session_id
type: foreign
dimensions:
- name: event_date
type: time
type_params:
time_granularity: day
- name: event_type
type: categorical
- name: platform
type: categorical
measures:
- name: event_count
agg: count
expr: event_id
- name: unique_users
agg: count_distinct
expr: user_id
- name: session_count
agg: count_distinct
expr: session_id
- name: avg_session_duration
agg: average
expr: session_duration_seconds
- name: total_page_views
agg: sum
expr: page_views
Comparison: Metrics Approaches
| Aspect | dbt Semantic Layer | Custom SQL Metrics | BI Tool Metrics |
|---|---|---|---|
| Centralization | Centralized | Distributed | Tool-specific |
| Consistency | Guaranteed | Manual | Limited |
| Reusability | High | Low | Medium |
| Version Control | Yes | Yes | Varies |
| Cross-Tool | Yes | No | No |
| Performance | Optimized | Manual | Tool-dependent |
| Governance | Built-in | Manual | Limited |
Performance Metrics
| Metric | Description | Target |
|---|---|---|
| Query Latency | Time to return metric results | < 5 seconds |
| Metric Compilation | Time to generate SQL | < 1 second |
| Semantic Model Validation | Time to validate definitions | < 10 seconds |
| Metric Coverage | Percentage of metrics defined | > 80% |
| Consistency Score | Metrics matching across tools | 100% |
Best Practices
- Define semantic models first - Map all dbt models to semantic models
- Use consistent naming - snake_case for all metric names
- Document everything - Descriptions for all metrics and dimensions
- Start simple - Begin with simple metrics, build complexity gradually
- Validate definitions - Use
dbt ls --resource-type metricto verify - Version control - Track metric changes in Git
- Test metrics - Validate metric values against known baselines
- Monitor performance - Track query latency and optimize as needed
See Also
- Exposures and Semantic Layer — Defining downstream consumers
- dbt Projects — Project configuration
- dbt Best Practices — Documentation patterns
- dbt Cloud — Semantic Layer in dbt Cloud