Purview Deep Dive: Classification, Lineage & Power BI
Master Purview with advanced classification, lineage tracking, and Power BI integration
Purview Capabilities
AUTOMATED DISCOVERY & SCANNING
• Connect to 100+ data sources
• Scheduled scanning (daily/weekly)
• Incremental scanning (changed data only)
• Custom scan rulesets

CLASSIFICATION
• 100+ built-in classifiers (PII, financial)
• Custom classifiers (regex, keyword list)
• Auto-labeling with sensitivity labels
• Column-level classification

LINEAGE
• End-to-end lineage across Azure services
• ADF pipeline lineage
• Databricks notebook lineage
• Synapse SQL lineage
• Power BI dataset lineage

DATA MAP & CATALOG
• Unified metadata repository
• Business glossary
• Search and discovery
• Impact analysis
Custom Classification
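from azure.identity import DefaultAzureCredential
from azure.purview.datamap import DataMapClient
from azure.purview.scanning import PurviewScanningClient

# Assumes the azure-purview-datamap and azure-purview-scanning packages and an
# account named "purview-prod"; verify call names against your installed SDK versions.
credential = DefaultAzureCredential()
client = DataMapClient(endpoint="https://purview-prod.purview.azure.com", credential=credential)
scan_client = PurviewScanningClient(endpoint="https://purview-prod.scan.purview.azure.com", credential=credential)

# Create a custom classification rule (regex on data values and on column names)
scan_client.classification_rules.create_or_update(
    "CustomerCodeClassifier",
    body={
        "kind": "Custom",
        "properties": {
            "description": "Detects internal customer codes",
            # Assumed to already exist as a custom classification in the account
            "classificationName": "CustomerCodeClassifier",
            "ruleStatus": "Enabled",
            "dataPatterns": [{"kind": "Regex", "pattern": r"CUST-\d{6}"}],
            "columnPatterns": [{"kind": "Regex", "pattern": "customer_id|cust_code"}],
            "minimumPercentageMatch": 80,
        },
    },
)

# Apply the classification to an asset by GUID (e.g. an azure_datalake_gen2_path entity)
client.entity.add_classifications(
    "asset-guid",
    body=[{"typeName": "CustomerCodeClassifier"}],
)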
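Before registering a regex classifier, it can help to try the pattern on a sample of real column values. The sketch below is a local check only and does not call Purview. It assumes the 0.8 threshold means "fraction of non-empty values that must match"; the helper `match_ratio` and the sample values are hypothetical, and Purview's own sampling during a scan may behave differently.

```python
import re

# Pattern from the classifier above; fullmatch rejects partial hits
# such as "XCUST-1234567".
CUSTOMER_CODE = re.compile(r"CUST-\d{6}")


def match_ratio(values, pattern=CUSTOMER_CODE):
    """Return the fraction of non-empty values that fully match the pattern."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return 0.0
    hits = sum(1 for v in non_empty if pattern.fullmatch(v))
    return hits / len(non_empty)


# Hypothetical sample of a customer_id column.
sample = ["CUST-000123", "CUST-987654", "CUST-12345", "", "CUST-555555", "cust-000001"]
ratio = match_ratio(sample)
print(f"match ratio: {ratio:.2f}, meets 0.8 threshold: {ratio >= 0.8}")
```

A column that falls below the threshold on representative data suggests either loosening the pattern (for example, allowing lowercase prefixes) or lowering the threshold before relying on the classifier.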
Lineage Tracking
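# Get upstream and downstream lineage for a data asset
# (`client` is the Data Map client created above; attribute names assume the azure-purview-datamap models)
lineage = client.lineage.get("asset-guid", direction="BOTH")

# Print each lineage edge, resolving entity GUIDs to display names
entities = lineage.guid_entity_map or {}
for edge in lineage.relations or []:
    source = entities[edge.from_entity_id].display_text
    target = entities[edge.to_entity_id].display_text
    print(f"{source} --> {target}")
    print(f"  Relationship: {edge.relationship_id}")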
Power BI Integration
{
  "scanName": "powerbi-workspace-scan",
  "dataSource": {
    "type": "PowerBI",
    "properties": {
      "tenantId": "tenant-id",
      "workspaceIds": ["workspace-id-1", "workspace-id-2"]
    }
  },
  "scanRuleset": {
    "type": "System",
    "name": "PowerBI"
  }
}
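When the workspace list changes often, generating this configuration from code is less error-prone than editing it by hand. The following is a minimal sketch that only builds and prints the document; the field names are copied from the JSON above as-is rather than from a verified Purview REST schema, and `build_powerbi_scan` is a hypothetical helper.

```python
import json


def build_powerbi_scan(scan_name, tenant_id, workspace_ids):
    """Build a scan definition in the same shape as the JSON above."""
    if not workspace_ids:
        raise ValueError("at least one workspace ID is required")
    return {
        "scanName": scan_name,
        "dataSource": {
            "type": "PowerBI",
            "properties": {
                "tenantId": tenant_id,
                # dict.fromkeys drops duplicate IDs while keeping their order
                "workspaceIds": list(dict.fromkeys(workspace_ids)),
            },
        },
        "scanRuleset": {"type": "System", "name": "PowerBI"},
    }


scan = build_powerbi_scan(
    "powerbi-workspace-scan",
    "tenant-id",
    ["workspace-id-1", "workspace-id-2", "workspace-id-1"],
)
print(json.dumps(scan, indent=2))
```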
Pro Tip: Use Purview's lineage to trace data from source to Power BI report. This enables impact analysis when source schemas change and helps with data trust assessment.
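Impact analysis of this kind amounts to walking the lineage graph downstream from the asset that changed. The sketch below assumes a lineage result shaped like an Apache Atlas response (a `guidEntityMap` plus a list of `relations` with `fromEntityId` and `toEntityId`); the GUIDs, asset names, and the `downstream` helper are hypothetical sample data, not output from a real account.

```python
from collections import defaultdict, deque

# Hypothetical lineage result: source file -> ADF copy -> warehouse table
# -> Power BI dataset -> Power BI report.
lineage = {
    "guidEntityMap": {
        "g1": {"displayText": "raw/customers.parquet"},
        "g2": {"displayText": "adf_copy_customers"},
        "g3": {"displayText": "dw.dim_customer"},
        "g4": {"displayText": "Sales dataset"},
        "g5": {"displayText": "Sales report"},
    },
    "relations": [
        {"fromEntityId": "g1", "toEntityId": "g2"},
        {"fromEntityId": "g2", "toEntityId": "g3"},
        {"fromEntityId": "g3", "toEntityId": "g4"},
        {"fromEntityId": "g4", "toEntityId": "g5"},
    ],
}


def downstream(lineage, start_guid):
    """Return the names of all assets reachable from start_guid, in BFS order."""
    children = defaultdict(list)
    for rel in lineage["relations"]:
        children[rel["fromEntityId"]].append(rel["toEntityId"])

    seen, order, queue = {start_guid}, [], deque([start_guid])
    while queue:
        for nxt in children[queue.popleft()]:
            if nxt not in seen:  # guard against cycles and repeated edges
                seen.add(nxt)
                order.append(lineage["guidEntityMap"][nxt]["displayText"])
                queue.append(nxt)
    return order


print(downstream(lineage, "g1"))
```

Everything returned for the source file is a candidate for review when its schema changes; an empty list means no downstream consumers are recorded in the lineage.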
Interview Questions
Q1: How do you implement end-to-end lineage in Purview?
A: Enable integrations with ADF, Databricks, Synapse, and Power BI. Configure scanning schedules. Use Purview SDK to register custom lineage. Link glossary terms to assets for business context.
Q2: What are the best practices for Purview scanning?
A: 1) Scan by domain/collection, 2) Use appropriate scan rulesets, 3) Schedule incremental scans, 4) Monitor scan status, 5) Review and approve classification results, 6) Use custom classifiers for domain-specific data.
Q3: How does Purview support data governance?
A: Provides automated discovery, classification, lineage tracking, business glossary, access policies, and compliance reporting, enabling organizations to understand, manage, and protect their data assets.
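For the custom lineage mentioned in Q1, the Purview Data Map follows the Apache Atlas model, in which lineage is recorded as a Process entity whose `inputs` and `outputs` reference other assets. The sketch below only builds such a payload; the process name, qualified name, GUIDs, and the `build_process_entity` helper are hypothetical, and submitting the payload through the entity create-or-update call of your SDK or REST client is left out.

```python
def build_process_entity(name, qualified_name, input_guids, output_guids):
    """Build an Atlas-style Process entity that links input assets to output assets."""
    return {
        "entity": {
            "typeName": "Process",
            "attributes": {
                "name": name,
                # qualifiedName must be unique within the catalog
                "qualifiedName": qualified_name,
                "inputs": [{"guid": g} for g in input_guids],
                "outputs": [{"guid": g} for g in output_guids],
            },
        }
    }


payload = build_process_entity(
    "nightly_customer_load",
    "custom://etl/nightly_customer_load",
    input_guids=["source-asset-guid"],
    output_guids=["target-asset-guid"],
)
print(payload["entity"]["attributes"]["inputs"])
```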