Snowflake Data Lineage Tracking

Data Lineage in Snowflake tracks the flow of data from source to destination, providing visibility into transformations, dependencies, and impact for governance and compliance.

Architecture Overview

<svg width="800" height="450" viewBox="0 0 800 450" xmlns="http://www.w3.org/2000/svg">
  <defs>
    <linearGradient id="lineGrad" x1="0%" y1="0%" x2="100%" y2="0%">
      <stop offset="0%" style="stop-color:#3498DB;stop-opacity:1" />
      <stop offset="100%" style="stop-color:#5DADE2;stop-opacity:1" />
    </linearGradient>
    <linearGradient id="metaGrad" x1="0%" y1="0%" x2="100%" y2="0%">
      <stop offset="0%" style="stop-color:#9B59B6;stop-opacity:1" />
      <stop offset="100%" style="stop-color:#AF7AC5;stop-opacity:1" />
    </linearGradient>
  </defs>

  <text x="400" y="30" text-anchor="middle" font-size="18" font-weight="bold" fill="#333">Snowflake Data Lineage Architecture</text>
  <rect x="30" y="60" width="120" height="150" rx="10" fill="#6C5CE7" opacity="0.9"/>
  <text x="90" y="85" text-anchor="middle" font-size="12" fill="white" font-weight="bold">Sources</text>
  <text x="90" y="110" text-anchor="middle" font-size="10" fill="white">Raw Tables</text>
  <text x="90" y="125" text-anchor="middle" font-size="10" fill="white">External Data</text>
  <text x="90" y="140" text-anchor="middle" font-size="10" fill="white">APIs</text>
  <text x="90" y="155" text-anchor="middle" font-size="10" fill="white">Files</text>
  <text x="90" y="175" text-anchor="middle" font-size="10" fill="white">Streams</text>
  <path d="M150 135 L190 135" stroke="#333" stroke-width="2" fill="none" marker-end="url(#arrowLine)"/>
  <rect x="190" y="60" width="150" height="150" rx="10" fill="url(#lineGrad)" opacity="0.9"/>
  <text x="265" y="85" text-anchor="middle" font-size="12" fill="white" font-weight="bold">Transform</text>
  <text x="265" y="110" text-anchor="middle" font-size="10" fill="white">ETL Jobs</text>
  <text x="265" y="125" text-anchor="middle" font-size="10" fill="white">Stored Procs</text>
  <text x="265" y="140" text-anchor="middle" font-size="10" fill="white">Tasks</text>
  <text x="265" y="155" text-anchor="middle" font-size="10" fill="white">Views</text>
  <text x="265" y="175" text-anchor="middle" font-size="10" fill="white">UDFs</text>
  <path d="M340 135 L380 135" stroke="#333" stroke-width="2" fill="none" marker-end="url(#arrowLine)"/>
  <rect x="380" y="60" width="130" height="150" rx="10" fill="#F39C12" opacity="0.9"/>
  <text x="445" y="85" text-anchor="middle" font-size="12" fill="white" font-weight="bold">Staging</text>
  <text x="445" y="110" text-anchor="middle" font-size="10" fill="white">Curated</text>
  <text x="445" y="125" text-anchor="middle" font-size="10" fill="white">Conformed</text>
  <text x="445" y="140" text-anchor="middle" font-size="10" fill="white">Enriched</text>
  <text x="445" y="155" text-anchor="middle" font-size="10" fill="white">Aggregated</text>
  <text x="445" y="175" text-anchor="middle" font-size="10" fill="white">Validated</text>
  <path d="M510 135 L550 135" stroke="#333" stroke-width="2" fill="none" marker-end="url(#arrowLine)"/>
  <rect x="550" y="60" width="120" height="150" rx="10" fill="#2ECC71" opacity="0.9"/>
  <text x="610" y="85" text-anchor="middle" font-size="12" fill="white" font-weight="bold">Serving</text>
  <text x="610" y="110" text-anchor="middle" font-size="10" fill="white">Data Marts</text>
  <text x="610" y="125" text-anchor="middle" font-size="10" fill="white">Analytics</text>
  <text x="610" y="140" text-anchor="middle" font-size="10" fill="white">ML Features</text>
  <text x="610" y="155" text-anchor="middle" font-size="10" fill="white">Reports</text>
  <text x="610" y="175" text-anchor="middle" font-size="10" fill="white">APIs</text>
  <path d="M670 135 L710 135" stroke="#333" stroke-width="2" fill="none" marker-end="url(#arrowLine)"/>

  <rect x="710" y="60" width="70" height="150" rx="10" fill="#E74C3C" opacity="0.9"/>
  <text x="745" y="85" text-anchor="middle" font-size="12" fill="white" font-weight="bold">Target</text>
  <text x="745" y="110" text-anchor="middle" font-size="10" fill="white">Users</text>
  <text x="745" y="125" text-anchor="middle" font-size="10" fill="white">Apps</text>
  <text x="745" y="140" text-anchor="middle" font-size="10" fill="white">ML</text>
  <text x="745" y="155" text-anchor="middle" font-size="10" fill="white">Exports</text>
  <rect x="30" y="230" width="740" height="100" rx="10" fill="url(#metaGrad)" opacity="0.9"/>
  <text x="400" y="255" text-anchor="middle" font-size="14" fill="white" font-weight="bold">Lineage Metadata</text>

  <rect x="50" y="270" width="140" height="45" rx="8" fill="white" opacity="0.9"/>
  <text x="120" y="297" text-anchor="middle" font-size="10" fill="#333">Object Dependencies</text>

  <rect x="210" y="270" width="140" height="45" rx="8" fill="white" opacity="0.9"/>
  <text x="280" y="297" text-anchor="middle" font-size="10" fill="#333">Column Lineage</text>

  <rect x="370" y="270" width="140" height="45" rx="8" fill="white" opacity="0.9"/>
  <text x="440" y="297" text-anchor="middle" font-size="10" fill="#333">Transformation Type</text>

  <rect x="530" y="270" width="140" height="45" rx="8" fill="white" opacity="0.9"/>
  <text x="600" y="297" text-anchor="middle" font-size="10" fill="#333">Query Text</text>

  <rect x="690" y="270" width="70" height="45" rx="8" fill="white" opacity="0.9"/>
  <text x="725" y="297" text-anchor="middle" font-size="10" fill="#333">Timestamp</text>
  <rect x="30" y="350" width="180" height="80" rx="10" fill="#27AE60" opacity="0.85"/>
  <text x="120" y="375" text-anchor="middle" font-size="12" fill="white" font-weight="bold">Impact Analysis</text>
  <text x="120" y="395" text-anchor="middle" font-size="10" fill="white">Find downstream effects</text>
  <text x="120" y="410" text-anchor="middle" font-size="10" fill="white">Assess change risk</text>

  <rect x="230" y="350" width="180" height="80" rx="10" fill="#3498DB" opacity="0.85"/>
  <text x="320" y="375" text-anchor="middle" font-size="12" fill="white" font-weight="bold">Root Cause</text>
  <text x="320" y="395" text-anchor="middle" font-size="10" fill="white">Trace upstream sources</text>
  <text x="320" y="410" text-anchor="middle" font-size="10" fill="white">Identify data issues</text>

  <rect x="430" y="350" width="180" height="80" rx="10" fill="#F39C12" opacity="0.85"/>
  <text x="520" y="375" text-anchor="middle" font-size="12" fill="white" font-weight="bold">Compliance</text>
  <text x="520" y="395" text-anchor="middle" font-size="10" fill="white">Regulatory audits</text>
  <text x="520" y="410" text-anchor="middle" font-size="10" fill="white">Data privacy reports</text>

  <rect x="630" y="350" width="140" height="80" rx="10" fill="#9B59B6" opacity="0.85"/>
  <text x="700" y="375" text-anchor="middle" font-size="12" fill="white" font-weight="bold">Discovery</text>
  <text x="700" y="395" text-anchor="middle" font-size="10" fill="white">Data catalog</text>
  <text x="700" y="410" text-anchor="middle" font-size="10" fill="white">Asset search</text>

  <defs>
    <marker id="arrowLine" markerWidth="10" markerHeight="10" refX="9" refY="3" orient="auto" markerUnits="strokeWidth">
      <path d="M0,0 L0,6 L9,3 z" fill="#333"/>
    </marker>
  </defs>
</svg>

Key Concepts

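Data Lineage: the record of how data flows from source objects, through transformations, to the tables, views, and consumers that use it.

Impact Analysis: the use of lineage to find every downstream object affected by a change to an upstream object, so the risk of the change can be assessed before it is made.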

Accessing Lineage

Object Lineage

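-- Upstream: queries that wrote to FACT_SALES and the base objects they read
-- (ACCESS_HISTORY requires Enterprise Edition; unquoted names are stored uppercase)
SELECT DISTINCT
  src.value:"objectName"::STRING AS source_object,
  tgt.value:"objectName"::STRING AS target_object
FROM snowflake.account_usage.access_history ah,
  LATERAL FLATTEN(input => ah.objects_modified) tgt,
  LATERAL FLATTEN(input => ah.base_objects_accessed) src
WHERE tgt.value:"objectName"::STRING = 'MY_DATABASE.ANALYTICS.FACT_SALES'
  AND src.value:"objectName"::STRING <> tgt.value:"objectName"::STRING;

-- Downstream (data movement): objects written by queries that read FACT_SALES
SELECT DISTINCT
  src.value:"objectName"::STRING AS source_object,
  tgt.value:"objectName"::STRING AS target_object
FROM snowflake.account_usage.access_history ah,
  LATERAL FLATTEN(input => ah.base_objects_accessed) src,
  LATERAL FLATTEN(input => ah.objects_modified) tgt
WHERE src.value:"objectName"::STRING = 'MY_DATABASE.ANALYTICS.FACT_SALES'
  AND tgt.value:"objectName"::STRING <> src.value:"objectName"::STRING;

-- Downstream (definitions): views and other objects that reference FACT_SALES
SELECT
  referencing_database,
  referencing_schema,
  referencing_object_name,
  referencing_object_domain,
  dependency_type
FROM snowflake.account_usage.object_dependencies
WHERE referenced_database = 'MY_DATABASE'
  AND referenced_schema = 'ANALYTICS'
  AND referenced_object_name = 'FACT_SALES';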

Column-Level Lineage

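-- Column lineage for FACT_SALES.REVENUE (ACCESS_HISTORY, Enterprise Edition).
-- directSources = columns the writing query referenced directly;
-- baseSources = the underlying base-table columns behind any views.
SELECT DISTINCT
  ds.value:"objectName"::STRING  AS source_table,
  ds.value:"columnName"::STRING  AS source_column,
  om.value:"objectName"::STRING  AS target_table,
  col.value:"columnName"::STRING AS target_column,
  ah.query_id
FROM snowflake.account_usage.access_history ah,
  LATERAL FLATTEN(input => ah.objects_modified) om,
  LATERAL FLATTEN(input => om.value:"columns") col,
  LATERAL FLATTEN(input => col.value:"directSources") ds
WHERE om.value:"objectName"::STRING = 'MY_DATABASE.ANALYTICS.FACT_SALES'
  AND col.value:"columnName"::STRING = 'REVENUE';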

Impact Analysis Queries

Change Impact Assessment

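-- Find all objects that depend, directly or transitively, on a table.
-- OBJECT_DEPENDENCIES covers objects whose definitions reference other
-- objects (e.g. views); table-to-table loads are recorded in ACCESS_HISTORY.
WITH RECURSIVE downstream (object_id, object_domain, object_name, depth) AS (
  -- Anchor: direct dependents of the changed table
  SELECT
    referencing_object_id,
    referencing_object_domain,
    referencing_database || '.' || referencing_schema || '.' || referencing_object_name,
    1
  FROM snowflake.account_usage.object_dependencies
  WHERE referenced_database = 'MY_DATABASE'
    AND referenced_object_name = 'SOURCE_TABLE'

  UNION ALL

  -- Recursive step: dependents of the objects found at the previous level
  SELECT
    od.referencing_object_id,
    od.referencing_object_domain,
    od.referencing_database || '.' || od.referencing_schema || '.' || od.referencing_object_name,
    d.depth + 1
  FROM downstream d
  JOIN snowflake.account_usage.object_dependencies od
    ON od.referenced_object_id = d.object_id
   AND od.referenced_object_domain = d.object_domain
  WHERE d.depth < 10  -- stop runaway recursion on deep or cyclic graphs
)
SELECT object_name, object_domain, MIN(depth) AS min_depth
FROM downstream
GROUP BY object_name, object_domain
ORDER BY min_depth;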

Data Quality Impact

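-- Count downstream tables and columns fed by sources with low quality scores.
-- ANALYTICS_DQ.TABLE_QUALITY_SCORES is a hypothetical table populated by your
-- own checks: one row per fully qualified table name (table_fqn) with a 0-1 score.
WITH low_quality AS (
  SELECT table_fqn
  FROM analytics_dq.table_quality_scores
  WHERE quality_score < 0.9
),
edges AS (
  SELECT DISTINCT
    ds.value:"objectName"::STRING  AS source_table,
    om.value:"objectName"::STRING  AS target_table,
    col.value:"columnName"::STRING AS target_column
  FROM snowflake.account_usage.access_history ah,
    LATERAL FLATTEN(input => ah.objects_modified) om,
    LATERAL FLATTEN(input => om.value:"columns") col,
    LATERAL FLATTEN(input => col.value:"baseSources") ds
)
SELECT
  e.source_table,
  COUNT(DISTINCT e.target_table) AS affected_objects,
  -- qualify column names so same-named columns in different tables count separately
  COUNT(DISTINCT e.target_table || '.' || e.target_column) AS affected_columns
FROM edges e
JOIN low_quality lq ON e.source_table = lq.table_fqn
GROUP BY 1
ORDER BY affected_objects DESC;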

Lineage Visualization

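-- Generate lineage graph data: one row per node, one row per edge
SELECT
  'node' AS type,
  table_catalog || '.' || table_schema || '.' || table_name AS id,
  table_name AS label,
  CASE
    WHEN table_type = 'BASE TABLE' THEN 'table'
    WHEN table_type = 'VIEW' THEN 'view'
    ELSE 'other'
  END AS shape
FROM my_database.information_schema.tables
WHERE table_schema <> 'INFORMATION_SCHEMA'

UNION ALL

-- Edges reuse the same columns: id = source object, label = target object
SELECT DISTINCT
  'edge' AS type,
  src.value:"objectName"::STRING AS id,
  tgt.value:"objectName"::STRING AS label,
  'write' AS shape
FROM snowflake.account_usage.access_history ah,
  LATERAL FLATTEN(input => ah.objects_modified) tgt,
  LATERAL FLATTEN(input => ah.base_objects_accessed) src
WHERE tgt.value:"objectName"::STRING LIKE 'MY_DATABASE.%'
  AND src.value:"objectName"::STRING <> tgt.value:"objectName"::STRING;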

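Snowflake captures lineage metadata automatically, but with limits. In Enterprise Edition and above, the SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY view records the objects and columns each query reads and writes, including column-level sources for write operations such as INSERT ... SELECT, MERGE, and CREATE TABLE ... AS SELECT, while OBJECT_DEPENDENCIES records which views and other objects reference which. Account usage views are populated with a delay of up to a few hours, so very recent activity may not appear yet. For complex pipelines, consider creating dedicated lineage views that flatten this metadata for visualization tools and impact analysis dashboards.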
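
One way to build such a view is sketched below, assuming Enterprise Edition and a role that can read SNOWFLAKE.ACCOUNT_USAGE. It flattens ACCESS_HISTORY into one row per source-column-to-target-column edge. The GOVERNANCE.LINEAGE schema, the view name, and the MY_DATABASE.RAW.CUSTOMERS table in the usage query are hypothetical placeholders.

```sql
-- Minimal sketch: one row per (source column -> target column) write edge.
-- GOVERNANCE.LINEAGE and COLUMN_LINEAGE_EDGES are hypothetical names.
CREATE OR REPLACE VIEW governance.lineage.column_lineage_edges AS
SELECT
  ah.query_id,
  ah.query_start_time,
  ah.user_name,
  ds.value:"objectName"::STRING  AS source_object,
  ds.value:"columnName"::STRING  AS source_column,
  om.value:"objectName"::STRING  AS target_object,
  col.value:"columnName"::STRING AS target_column
FROM snowflake.account_usage.access_history ah,
  LATERAL FLATTEN(input => ah.objects_modified) om,       -- each object written
  LATERAL FLATTEN(input => om.value:"columns") col,       -- each column written
  LATERAL FLATTEN(input => col.value:"baseSources") ds;   -- base columns it came from

-- Usage: everything written from a hypothetical RAW.CUSTOMERS table in the last 7 days
SELECT DISTINCT target_object, target_column
FROM governance.lineage.column_lineage_edges
WHERE source_object = 'MY_DATABASE.RAW.CUSTOMERS'
  AND query_start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP());
```

Using baseSources rather than directSources resolves reads through views to the underlying tables, which is usually what impact analysis needs; swap in directSources if you want edges that stop at the views a query named.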

Lineage for Compliance

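-- GDPR data flow report: where data read from customer/PII tables was written.
-- ILIKE is used because unquoted object names are stored in uppercase.
WITH flows AS (
  SELECT
    ah.query_id,
    ah.query_start_time,
    src.value:"objectName"::STRING AS source_table,
    tgt.value:"objectName"::STRING AS target_table,
    src.value:"columns"            AS columns_used
  FROM snowflake.account_usage.access_history ah,
    LATERAL FLATTEN(input => ah.base_objects_accessed) src,
    LATERAL FLATTEN(input => ah.objects_modified) tgt
  WHERE src.value:"objectName"::STRING ILIKE ANY ('%customer%', '%pii%', '%personal%')
)
SELECT f.source_table, f.target_table, f.columns_used, f.query_start_time, qh.query_text
FROM flows f
JOIN snowflake.account_usage.query_history qh ON qh.query_id = f.query_id
ORDER BY f.source_table, f.target_table;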

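-- Data retention impact: Time Travel retention (days) of each table's upstream sources
WITH edges AS (
  SELECT DISTINCT
    tgt.value:"objectName"::STRING AS target_table,
    src.value:"objectName"::STRING AS source_table
  FROM snowflake.account_usage.access_history ah,
    LATERAL FLATTEN(input => ah.objects_modified) tgt,
    LATERAL FLATTEN(input => ah.base_objects_accessed) src
  WHERE src.value:"objectName"::STRING <> tgt.value:"objectName"::STRING
)
SELECT
  e.target_table,
  COUNT(DISTINCT e.source_table) AS upstream_sources,  -- existing tables only
  MIN(t.retention_time) AS min_retention_days,
  MAX(t.retention_time) AS max_retention_days
FROM edges e
JOIN snowflake.account_usage.tables t
  ON e.source_table = t.table_catalog || '.' || t.table_schema || '.' || t.table_name
 AND t.deleted IS NULL
GROUP BY 1;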

Key Takeaways

  • Data lineage tracks transformations from source to target
  • Column-level lineage provides granular visibility
  • Impact analysis assesses downstream effects of changes
  • Read and write access, including column lineage for writes, is captured automatically in ACCESS_HISTORY (Enterprise Edition)
  • Lineage reports help support compliance requirements (GDPR, HIPAA, SOX)
