Snowflake Data Lineage Tracking
Data Lineage in Snowflake tracks the flow of data from source to destination, providing visibility into transformations, dependencies, and impact for governance and compliance.
Architecture Overview
<svg width="800" height="450" viewBox="0 0 800 450" xmlns="http://www.w3.org/2000/svg">
<defs>
<linearGradient id="lineGrad" x1="0%" y1="0%" x2="100%" y2="0%">
<stop offset="0%" style="stop-color:#3498DB;stop-opacity:1" />
<stop offset="100%" style="stop-color:#5DADE2;stop-opacity:1" />
</linearGradient>
<linearGradient id="metaGrad" x1="0%" y1="0%" x2="100%" y2="0%">
<stop offset="0%" style="stop-color:#9B59B6;stop-opacity:1" />
<stop offset="100%" style="stop-color:#AF7AC5;stop-opacity:1" />
</linearGradient>
</defs>
<text x="400" y="30" text-anchor="middle" font-size="18" font-weight="bold" fill="#333">Snowflake Data Lineage Architecture</text>
<rect x="30" y="60" width="120" height="150" rx="10" fill="#6C5CE7" opacity="0.9"/>
<text x="90" y="85" text-anchor="middle" font-size="12" fill="white" font-weight="bold">Sources</text>
<text x="90" y="110" text-anchor="middle" font-size="10" fill="white">Raw Tables</text>
<text x="90" y="125" text-anchor="middle" font-size="10" fill="white">External Data</text>
<text x="90" y="140" text-anchor="middle" font-size="10" fill="white">APIs</text>
<text x="90" y="155" text-anchor="middle" font-size="10" fill="white">Files</text>
<text x="90" y="175" text-anchor="middle" font-size="10" fill="white">Streams</text>
<path d="M150 135 L190 135" stroke="#333" stroke-width="2" fill="none" marker-end="url(#arrowLine)"/>
<rect x="190" y="60" width="150" height="150" rx="10" fill="url(#lineGrad)" opacity="0.9"/>
<text x="265" y="85" text-anchor="middle" font-size="12" fill="white" font-weight="bold">Transform</text>
<text x="265" y="110" text-anchor="middle" font-size="10" fill="white">ETL Jobs</text>
<text x="265" y="125" text-anchor="middle" font-size="10" fill="white">Stored Procs</text>
<text x="265" y="140" text-anchor="middle" font-size="10" fill="white">Tasks</text>
<text x="265" y="155" text-anchor="middle" font-size="10" fill="white">Views</text>
<text x="265" y="175" text-anchor="middle" font-size="10" fill="white">UDFs</text>
<path d="M340 135 L380 135" stroke="#333" stroke-width="2" fill="none" marker-end="url(#arrowLine)"/>
<rect x="380" y="60" width="130" height="150" rx="10" fill="#F39C12" opacity="0.9"/>
<text x="445" y="85" text-anchor="middle" font-size="12" fill="white" font-weight="bold">Staging</text>
<text x="445" y="110" text-anchor="middle" font-size="10" fill="white">Curated</text>
<text x="445" y="125" text-anchor="middle" font-size="10" fill="white">Conformed</text>
<text x="445" y="140" text-anchor="middle" font-size="10" fill="white">Enriched</text>
<text x="445" y="155" text-anchor="middle" font-size="10" fill="white">Aggregated</text>
<text x="445" y="175" text-anchor="middle" font-size="10" fill="white">Validated</text>
<path d="M510 135 L550 135" stroke="#333" stroke-width="2" fill="none" marker-end="url(#arrowLine)"/>
<rect x="550" y="60" width="120" height="150" rx="10" fill="#2ECC71" opacity="0.9"/>
<text x="610" y="85" text-anchor="middle" font-size="12" fill="white" font-weight="bold">Serving</text>
<text x="610" y="110" text-anchor="middle" font-size="10" fill="white">Data Marts</text>
<text x="610" y="125" text-anchor="middle" font-size="10" fill="white">Analytics</text>
<text x="610" y="140" text-anchor="middle" font-size="10" fill="white">ML Features</text>
<text x="610" y="155" text-anchor="middle" font-size="10" fill="white">Reports</text>
<text x="610" y="175" text-anchor="middle" font-size="10" fill="white">APIs</text>
<path d="M670 135 L710 135" stroke="#333" stroke-width="2" fill="none" marker-end="url(#arrowLine)"/>
<rect x="710" y="60" width="70" height="150" rx="10" fill="#E74C3C" opacity="0.9"/>
<text x="745" y="85" text-anchor="middle" font-size="12" fill="white" font-weight="bold">Target</text>
<text x="745" y="110" text-anchor="middle" font-size="10" fill="white">Users</text>
<text x="745" y="125" text-anchor="middle" font-size="10" fill="white">Apps</text>
<text x="745" y="140" text-anchor="middle" font-size="10" fill="white">ML</text>
<text x="745" y="155" text-anchor="middle" font-size="10" fill="white">Exports</text>
<rect x="30" y="230" width="740" height="100" rx="10" fill="url(#metaGrad)" opacity="0.9"/>
<text x="400" y="255" text-anchor="middle" font-size="14" fill="white" font-weight="bold">Lineage Metadata</text>
<rect x="50" y="270" width="140" height="45" rx="8" fill="white" opacity="0.9"/>
<text x="120" y="297" text-anchor="middle" font-size="10" fill="#333">Object Dependencies</text>
<rect x="210" y="270" width="140" height="45" rx="8" fill="white" opacity="0.9"/>
<text x="280" y="297" text-anchor="middle" font-size="10" fill="#333">Column Lineage</text>
<rect x="370" y="270" width="140" height="45" rx="8" fill="white" opacity="0.9"/>
<text x="440" y="297" text-anchor="middle" font-size="10" fill="#333">Transformation Type</text>
<rect x="530" y="270" width="140" height="45" rx="8" fill="white" opacity="0.9"/>
<text x="600" y="297" text-anchor="middle" font-size="10" fill="#333">Query Text</text>
<rect x="690" y="270" width="70" height="45" rx="8" fill="white" opacity="0.9"/>
<text x="725" y="297" text-anchor="middle" font-size="10" fill="#333">Timestamp</text>
<rect x="30" y="350" width="180" height="80" rx="10" fill="#27AE60" opacity="0.85"/>
<text x="120" y="375" text-anchor="middle" font-size="12" fill="white" font-weight="bold">Impact Analysis</text>
<text x="120" y="395" text-anchor="middle" font-size="10" fill="white">Find downstream effects</text>
<text x="120" y="410" text-anchor="middle" font-size="10" fill="white">Assess change risk</text>
<rect x="230" y="350" width="180" height="80" rx="10" fill="#3498DB" opacity="0.85"/>
<text x="320" y="375" text-anchor="middle" font-size="12" fill="white" font-weight="bold">Root Cause</text>
<text x="320" y="395" text-anchor="middle" font-size="10" fill="white">Trace upstream sources</text>
<text x="320" y="410" text-anchor="middle" font-size="10" fill="white">Identify data issues</text>
<rect x="430" y="350" width="180" height="80" rx="10" fill="#F39C12" opacity="0.85"/>
<text x="520" y="375" text-anchor="middle" font-size="12" fill="white" font-weight="bold">Compliance</text>
<text x="520" y="395" text-anchor="middle" font-size="10" fill="white">Regulatory audits</text>
<text x="520" y="410" text-anchor="middle" font-size="10" fill="white">Data privacy reports</text>
<rect x="630" y="350" width="140" height="80" rx="10" fill="#9B59B6" opacity="0.85"/>
<text x="700" y="375" text-anchor="middle" font-size="12" fill="white" font-weight="bold">Discovery</text>
<text x="700" y="395" text-anchor="middle" font-size="10" fill="white">Data catalog</text>
<text x="700" y="410" text-anchor="middle" font-size="10" fill="white">Asset search</text>
<defs>
<marker id="arrowLine" markerWidth="10" markerHeight="10" refX="9" refY="3" orient="auto" markerUnits="strokeWidth">
<path d="M0,0 L0,6 L9,3 z" fill="#333"/>
</marker>
</defs>
</svg>
Key Concepts
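Data Lineage: the record of how data flows from source to destination, including the transformations and dependencies along the way.
Impact Analysis: the use of lineage to find the downstream objects affected by a change and to assess the risk of making it.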
Accessing Lineage
Object Lineage
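-- Get lineage for a specific table
-- Verify the function name and arguments available in your account before running these
-- queries: Snowflake's documented lineage function is SNOWFLAKE.CORE.GET_LINEAGE, which
-- takes the object name, object domain, and direction as positional arguments.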
SELECT *
FROM TABLE(INFORMATION_SCHEMA.DATA_LINEAGE(
DATABASE_NAME => 'my_database',
SCHEMA_NAME => 'analytics',
TABLE_NAME => 'fact_sales'
));
-- Upstream dependencies
SELECT *
FROM TABLE(INFORMATION_SCHEMA.DATA_LINEAGE(
DATABASE_NAME => 'my_database',
SCHEMA_NAME => 'analytics',
TABLE_NAME => 'fact_sales',
DIRECTION => 'UPSTREAM'
));
-- Downstream dependencies
SELECT *
FROM TABLE(INFORMATION_SCHEMA.DATA_LINEAGE(
DATABASE_NAME => 'my_database',
SCHEMA_NAME => 'analytics',
TABLE_NAME => 'fact_sales',
DIRECTION => 'DOWNSTREAM'
));
Column-Level Lineage
-- Column lineage
SELECT
source_table,
source_column,
target_table,
target_column,
transformation_type,
transformation_expression
FROM TABLE(INFORMATION_SCHEMA.DATA_LINEAGE(
DATABASE_NAME => 'my_database',
SCHEMA_NAME => 'analytics',
TABLE_NAME => 'fact_sales',
COLUMN_NAME => 'revenue'
));
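Tracing a column back to its origins is a graph walk over column-level rows like the ones selected above. The following is a minimal sketch in Python, assuming the rows have been exported as (source_table, source_column, target_table, target_column) tuples. The sample rows and the function name trace_upstream are hypothetical and only serve to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical column-lineage rows:
# (source_table, source_column, target_table, target_column)
COLUMN_EDGES = [
    ("raw_orders", "amount", "stg_orders", "amount_usd"),
    ("raw_fx", "rate", "stg_orders", "amount_usd"),
    ("stg_orders", "amount_usd", "fact_sales", "revenue"),
    ("raw_orders", "order_id", "fact_sales", "order_id"),
]


def trace_upstream(edges, table, column):
    """Return the set of root (table, column) pairs that feed the given column."""
    parents = defaultdict(set)
    for s_tab, s_col, t_tab, t_col in edges:
        parents[(t_tab, t_col)].add((s_tab, s_col))

    start = (table, column)
    roots, seen, stack = set(), set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:  # skip columns already visited (handles cycles)
            continue
        seen.add(node)
        ups = parents.get(node)
        if not ups:
            # A column with no recorded parents is a root source,
            # unless it is the column we started from.
            if node != start:
                roots.add(node)
            continue
        stack.extend(ups)
    return roots


print(sorted(trace_upstream(COLUMN_EDGES, "fact_sales", "revenue")))
```

With the sample rows, revenue resolves to raw_orders.amount and raw_fx.rate, while the unrelated order_id edge is ignored.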
Impact Analysis Queries
Change Impact Assessment
-- Find all objects affected by a change to a table
WITH RECURSIVE downstream AS (
-- Anchor: the table being changed
SELECT
table_name,
0 AS depth,
table_name AS root_table
FROM information_schema.tables
WHERE table_name = 'SOURCE_TABLE' -- unquoted identifiers are stored in uppercase
UNION ALL
-- Recursive step: objects that read directly from the previous level
SELECT
l.target_table,
d.depth + 1,
d.root_table
FROM downstream d
JOIN TABLE(INFORMATION_SCHEMA.DATA_LINEAGE(
DIRECTION => 'DOWNSTREAM'
)) l ON d.table_name = l.source_table
WHERE d.depth < 10 -- stop after 10 levels so a cycle cannot recurse forever
)
SELECT * FROM downstream
ORDER BY depth;
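The same traversal can be sketched outside the warehouse as a breadth-first walk over lineage edges. This is an illustrative sketch only: it assumes the edges have been exported as (source_table, target_table) pairs, and the sample edges and the function name downstream_with_depth are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical table-level lineage edges: (source_table, target_table)
EDGES = [
    ("source_table", "stg_sales"),
    ("stg_sales", "fact_sales"),
    ("stg_sales", "dim_customer"),
    ("fact_sales", "sales_dashboard"),
    ("dim_customer", "sales_dashboard"),
]


def downstream_with_depth(edges, root):
    """Breadth-first walk returning {table: shortest depth from root}."""
    children = defaultdict(list)
    for source, target in edges:
        children[source].append(target)

    depth = {root: 0}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in depth:  # visit each object once, so cycles terminate
                depth[child] = depth[current] + 1
                queue.append(child)
    return depth


impact = downstream_with_depth(EDGES, "source_table")
for table, level in sorted(impact.items(), key=lambda kv: (kv[1], kv[0])):
    print(level, table)
```

Because each object is recorded once at its shortest distance, an object reachable by several paths (sales_dashboard here) appears a single time, which keeps the impact list free of duplicates.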
Data Quality Impact
-- Count the downstream objects and columns fed by low-quality source tables
SELECT
source_table,
COUNT(DISTINCT target_table) AS affected_objects,
COUNT(DISTINCT target_column) AS affected_columns
FROM TABLE(INFORMATION_SCHEMA.DATA_LINEAGE(
DIRECTION => 'DOWNSTREAM'
))
WHERE source_table IN (
-- data_quality_metrics is assumed to be a user-maintained table of per-table scores
SELECT table_name
FROM information_schema.data_quality_metrics
WHERE quality_score < 0.9
)
GROUP BY source_table
ORDER BY affected_objects DESC;
Lineage Visualization
-- Generate lineage graph data
SELECT
'node' as type,
table_name as id,
table_name as label,
CASE
WHEN table_type = 'BASE TABLE' THEN 'table'
WHEN table_type = 'VIEW' THEN 'view'
ELSE 'other'
END as shape
FROM information_schema.tables
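WHERE table_catalog = 'MY_DATABASE' -- INFORMATION_SCHEMA.TABLES names the database column TABLE_CATALOG
UNION ALL
-- Edge rows reuse the node columns: id holds the source, label the target, shape the transformation type
SELECT
'edge' as type,
source_table as id,
target_table as label,
transformation_type as shape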
FROM TABLE(INFORMATION_SCHEMA.DATA_LINEAGE(
DATABASE_NAME => 'my_database'
));
Data lineage is automatically captured by Snowflake for all DML operations. For complex pipelines, consider creating dedicated lineage views that aggregate metadata for visualization tools and impact analysis dashboards.
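To hand the node and edge rows from the graph query to a visualization tool, one option is to render them as Graphviz DOT text. The sketch below assumes the rows arrive as (type, id, label, shape) tuples in the same column order as that query. The function name rows_to_dot, the shape mapping, and the sample rows are hypothetical, and identifiers containing double quotes would need escaping that this sketch omits.

```python
def rows_to_dot(rows, graph_name="lineage"):
    """Render (type, id, label, shape) rows as Graphviz DOT text."""
    node_shapes = {"table": "box", "view": "ellipse"}  # hypothetical mapping
    lines = [f"digraph {graph_name} {{"]
    for row_type, row_id, label, shape in rows:
        if row_type == "node":
            dot_shape = node_shapes.get(shape, "diamond")
            lines.append(f'  "{row_id}" [label="{label}", shape={dot_shape}];')
        elif row_type == "edge":
            # Edge rows carry the source in id and the target in label.
            lines.append(f'  "{row_id}" -> "{label}" [label="{shape}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sample rows in the query's column order
ROWS = [
    ("node", "raw_orders", "raw_orders", "table"),
    ("node", "v_sales", "v_sales", "view"),
    ("edge", "raw_orders", "v_sales", "SELECT"),
]

print(rows_to_dot(ROWS))
```

The resulting text can be saved to a .dot file and rendered with the Graphviz dot command, or loaded by any tool that reads DOT.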
Lineage for Compliance
-- GDPR data flow report
SELECT
source_table,
target_table,
columns_used,
transformation_type,
query_text
FROM TABLE(INFORMATION_SCHEMA.DATA_LINEAGE(
DIRECTION => 'DOWNSTREAM'
))
WHERE source_table ILIKE '%customer%' -- ILIKE is case-insensitive; unquoted identifiers are stored in uppercase
OR source_table ILIKE '%pii%'
OR source_table ILIKE '%personal%'
ORDER BY source_table, target_table;
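A name filter like the one above only catches edges whose source table has a matching name. Personal data can also travel further, into tables whose names give no hint of it. The sketch below follows the edges transitively to list every table that each PII-named source can reach. It assumes table-level edges exported as (source_table, target_table) pairs; the sample edges, the pattern, and the function name pii_reach are hypothetical.

```python
import re
from collections import defaultdict

# Same name heuristics as the report query, matched case-insensitively
PII_PATTERN = re.compile(r"customer|pii|personal", re.IGNORECASE)

# Hypothetical table-level lineage edges: (source_table, target_table)
EDGES = [
    ("RAW_CUSTOMER", "STG_CUSTOMER"),
    ("STG_CUSTOMER", "DIM_ACCOUNT"),
    ("DIM_ACCOUNT", "REVENUE_REPORT"),
    ("RAW_PRODUCT", "DIM_PRODUCT"),
]


def pii_reach(edges, pattern=PII_PATTERN):
    """Map each PII-named source table to every table its data can reach."""
    children = defaultdict(set)
    for source, target in edges:
        children[source].add(target)

    report = {}
    for start in sorted({s for s, _ in edges if pattern.search(s)}):
        reached, stack = set(), [start]
        while stack:
            for child in children.get(stack.pop(), ()):
                if child not in reached and child != start:
                    reached.add(child)
                    stack.append(child)
        report[start] = sorted(reached)
    return report


for source, targets in pii_reach(EDGES).items():
    print(source, "->", ", ".join(targets))
```

In the sample data, REVENUE_REPORT is flagged even though neither it nor its direct parent DIM_ACCOUNT matches the name pattern.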
-- Data retention impact
SELECT
target_table,
COUNT(DISTINCT source_table) as upstream_sources,
MIN(retention_days) as min_retention,
MAX(retention_days) as max_retention
FROM TABLE(INFORMATION_SCHEMA.DATA_LINEAGE(
DIRECTION => 'UPSTREAM'
))
GROUP BY 1;
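The retention comparison can be illustrated with a small sketch: a target built from sources with different retention periods may hold data longer than the shortest upstream policy allows, so the spread between minimum and maximum is the figure to review. The retention values, edges, and the function name retention_by_target below are hypothetical.

```python
from collections import defaultdict

# Hypothetical inputs: retention policy per source table (days) and lineage edges
RETENTION_DAYS = {"raw_orders": 365, "raw_customers": 90, "raw_clicks": 30}
EDGES = [
    ("raw_orders", "fact_sales"),
    ("raw_customers", "fact_sales"),
    ("raw_clicks", "web_sessions"),
]


def retention_by_target(edges, retention_days):
    """Return {target: (upstream_sources, min_retention, max_retention)}."""
    sources = defaultdict(set)
    for source, target in edges:
        if source in retention_days:  # ignore sources without a known policy
            sources[target].add(source)
    return {
        target: (
            len(srcs),
            min(retention_days[s] for s in srcs),
            max(retention_days[s] for s in srcs),
        )
        for target, srcs in sources.items()
    }


for target, (count, low, high) in sorted(retention_by_target(EDGES, RETENTION_DAYS).items()):
    print(target, count, low, high)
```

Here fact_sales combines a 90-day and a 365-day source, which is the kind of mismatch a retention review would want to surface.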
- Data lineage tracks transformations from source to target
- Column-level lineage provides granular visibility
- Impact analysis assesses downstream effects of changes
- Lineage is captured automatically for all DML operations
- Supports compliance requirements (GDPR, HIPAA, SOX)