CW

dbt Semantic Layer and Metrics

Free Lesson

Advertisement

dbt Semantic Layer and Metrics

Semantic Layer Architecture

Semantic Layer Pipeline

Formal Definitions

DfMetric

A metric is a named, reusable business calculation defined in YAML that quantifies a business process. Metrics are built on top of semantic models and consist of: a name, type (simple, derived, or cumulative), an expression (the calculation), and a list of input measures/dimensions. Formally, a metric M = {name, type, expression, inputs, filters, description}.

DfSemantic Model

A semantic model is a YAML definition that maps a dbt model to the semantic layer. It defines: the underlying table, entity columns (primary/foreign keys), dimension columns (attributes for grouping), and measure columns (numeric aggregations). Semantic models bridge dbt models to the metric system, enabling consistent metric definitions.

DfMetricFlow

MetricFlow is dbt's semantic query engine that compiles metric definitions into SQL queries. It handles metric resolution, join logic, aggregation, and time-spooling. When you query a metric, MetricFlow generates optimized SQL that joins the necessary semantic models and applies the correct aggregations and filters.

DfMeasure

A measure is a numeric column in a semantic model that can be aggregated. Measures are the building blocks of metrics. Common aggregations include: sum, count, count_distinct, average, min, max, and percentiles. A measure is defined with a name, aggregation type, and the underlying column.

Detailed Explanation

The dbt Semantic Layer provides a centralized system for defining business metrics, ensuring consistent definitions across all analytics consumers. It eliminates metric inconsistencies by defining calculations once and serving them to multiple downstream tools.

Metric Types

TypeDescriptionExample
SimpleDirect aggregation of a measureTotal Revenue
DerivedCalculation from other metricsProfit Margin
CumulativeRunning total over timeCumulative Revenue
ConversionFunnel-based metricsConversion Rate

The Semantic Layer requires dbt Cloud with the Semantic Layer feature enabled, or dbt Core with MetricFlow. Metrics are defined in YAML files under the metrics/ directory or within model YAML files using the metrics key.

Start with simple metrics (sum, count) and gradually build derived metrics. Always define semantic models first, then build metrics on top. Use consistent naming conventions: snake_case for metric names, prefixes for metric types (e.g., total_, avg_, count_).

Code Examples

Semantic Model Definition

# models/marts/fct_orders.yml
version: 2

semantic_models:
  - name: orders
    defaults:
      agg_time_dimension: order_date
    description: "Order transactions fact table"
    model: ref('fct_orders')
    
    entities:
      - name: order_id
        type: primary
      - name: customer_id
        type: foreign
      - name: product_id
        type: foreign
    
    dimensions:
      - name: order_date
        type: time
        type_params:
          time_granularity: day
      - name: order_status
        type: categorical
      - name: customer_segment
        type: categorical
      - name: product_category
        type: categorical
    
    measures:
      - name: order_count
        agg: count
        expr: order_id
        description: "Count of orders"
      
      - name: revenue
        agg: sum
        expr: amount
        description: "Total order amount"
      
      - name: unique_customers
        agg: count_distinct
        expr: customer_id
        description: "Distinct customers who placed orders"

Simple Metric

# metrics/order_metrics.yml
version: 2

metrics:
  - name: total_revenue
    type: simple
    type_params:
      measure: revenue
    description: "Total revenue from all orders"
    filter: |
      {{ Dimension('order__order_status') }} != 'cancelled'
    label: "Total Revenue"
    round: 2
  
  - name: total_orders
    type: simple
    type_params:
      measure: order_count
    description: "Total number of orders"
    label: "Total Orders"
  
  - name: unique_customers
    type: simple
    type_params:
      measure: unique_customers
    description: "Number of unique customers"
    label: "Unique Customers"

Derived Metric

# metrics/order_metrics.yml
metrics:
  - name: average_order_value
    type: derived
    type_params:
      expr: total_revenue / total_orders
      metrics:
        - name: total_revenue
          filter: |
            {{ Dimension('order__order_status') }} != 'cancelled'
        - name: total_orders
          filter: |
            {{ Dimension('order__order_status') }} != 'cancelled'
    description: "Average revenue per order"
    label: "Average Order Value"
    round: 2
  
  - name: revenue_per_customer
    type: derived
    type_params:
      expr: total_revenue / unique_customers
      metrics:
        - name: total_revenue
        - name: unique_customers
    description: "Revenue per unique customer"
    label: "Revenue per Customer"
    round: 2

Cumulative Metric

# metrics/cumulative_metrics.yml
metrics:
  - name: cumulative_revenue
    type: cumulative
    type_params:
      measure: revenue
      window: 7 day
      grain_to_date: month
    description: "7-day rolling cumulative revenue"
    label: "Cumulative Revenue (7-day)"
    round: 2
  
  - name: cumulative_orders
    type: cumulative
    type_params:
      measure: order_count
      window: 30 day
    description: "30-day rolling cumulative orders"
    label: "Cumulative Orders (30-day)"

Conversion Metric

# metrics/conversion_metrics.yml
metrics:
  - name: conversion_rate
    type: conversion
    type_params:
      numerator: completed_orders
      denominator: total_orders
      window: 7 day
    description: "Order completion rate within 7 days"
    label: "Conversion Rate"
    round: 4
  
  - name: completed_orders
    type: simple
    type_params:
      measure: order_count
    filter: |
      {{ Dimension('order__order_status') }} = 'completed'
  
  - name: total_orders
    type: simple
    type_params:
      measure: order_count

Querying Metrics

-- Query via MetricFlow SQL
SELECT
    order_date,
    customer_segment,
    {{ metric('total_revenue') }},
    {{ metric('total_orders') }},
    {{ metric('average_order_value') }}
FROM {{ metric('total_revenue') }}
GROUP BY 1, 2
ORDER BY 1, 2

Using the Semantic Layer API

# Query via Python API
from dbt_semantic_interfaces.protocols.semantic_manifest import SemanticManifest

# Connect to the Semantic Layer
semantic_layer = SemanticLayerClient(
    host="https://semantic-layer.getdbt.com",
    auth_token="your_token",
    environment_id="123"
)

# Query metrics
result = semantic_layer.query(
    metrics=["total_revenue", "total_orders"],
    group_by=["order_date", "customer_segment"],
    where="order_status != 'cancelled'",
    order_by=["order_date"]
)

# Process results
for row in result.data:
    print(f"Date: {row['order_date']}, Segment: {row['customer_segment']}, "
          f"Revenue: ${row['total_revenue']:,.2f}, Orders: {row['total_orders']}")

Semantic Model with Multiple Measures

# models/marts/fct_events.yml
semantic_models:
  - name: events
    model: ref('fct_events')
    
    entities:
      - name: event_id
        type: primary
      - name: user_id
        type: foreign
      - name: session_id
        type: foreign
    
    dimensions:
      - name: event_date
        type: time
        type_params:
          time_granularity: day
      - name: event_type
        type: categorical
      - name: platform
        type: categorical
    
    measures:
      - name: event_count
        agg: count
        expr: event_id
      
      - name: unique_users
        agg: count_distinct
        expr: user_id
      
      - name: session_count
        agg: count_distinct
        expr: session_id
      
      - name: avg_session_duration
        agg: average
        expr: session_duration_seconds
      
      - name: total_page_views
        agg: sum
        expr: page_views

Comparison: Metrics Approaches

Aspectdbt Semantic LayerCustom SQL MetricsBI Tool Metrics
CentralizationCentralizedDistributedTool-specific
ConsistencyGuaranteedManualLimited
ReusabilityHighLowMedium
Version ControlYesYesVaries
Cross-ToolYesNoNo
PerformanceOptimizedManualTool-dependent
GovernanceBuilt-inManualLimited

Performance Metrics

MetricDescriptionTarget
Query LatencyTime to return metric results< 5 seconds
Metric CompilationTime to generate SQL< 1 second
Semantic Model ValidationTime to validate definitions< 10 seconds
Metric CoveragePercentage of metrics defined> 80%
Consistency ScoreMetrics matching across tools100%

Best Practices

  1. Define semantic models first - Map all dbt models to semantic models
  2. Use consistent naming - snake_case for all metric names
  3. Document everything - Descriptions for all metrics and dimensions
  4. Start simple - Begin with simple metrics, build complexity gradually
  5. Validate definitions - Use dbt ls --resource-type metric to verify
  6. Version control - Track metric changes in Git
  7. Test metrics - Validate metric values against known baselines
  8. Monitor performance - Track query latency and optimize as needed

See Also

Advertisement

Need Expert dbt Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement