Data Catalog: Purview, Metadata Scanning & Glossary

Enterprise data cataloging with Purview for metadata discovery, classification, and business glossary management

Data Catalog Architecture

Architecture Diagram
┌─────────────────────────────────────────────────────────────────────┐
│                      DATA CATALOG ARCHITECTURE                      │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  DATA SOURCES           SCANNING            CATALOG                 │
│  ┌──────────┐        ┌──────────────┐     ┌──────────────┐          │
│  │ ADLS     │───────>│ Purview      │────>│ Data Map     │          │
│  │ Gen2     │        │ Scanner      │     │              │          │
│  └──────────┘        └──────────────┘     │ • Assets     │          │
│                                           │ • Lineage    │          │
│  ┌──────────┐        ┌──────────────┐     │ • Classif.   │          │
│  │ Synapse  │───────>│ Purview      │────>│              │          │
│  │ SQL      │        │ Scanner      │     │ Collections  │          │
│  └──────────┘        └──────────────┘     │              │          │
│                                           │ • Domain A   │          │
│  ┌──────────┐        ┌──────────────┐     │ • Domain B   │          │
│  │ Power BI │───────>│ Purview      │────>│ • Domain C   │          │
│  │          │        │ Integration  │     └──────────────┘          │
│  └──────────┘        └──────────────┘                               │
│                                                                     │
│  DISCOVERY & GOVERNANCE:                                            │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │                                                               │  │
│  │  BUSINESS GLOSSARY      DATA CLASSIFICATION                   │  │
│  │  ┌──────────────────┐   ┌──────────────────┐                  │  │
│  │  │ Sales Revenue    │   │ PII.Email        │                  │  │
│  │  │ Customer Segment │   │ PII.Phone        │                  │  │
│  │  │ Product Category │   │ Financial.Card   │                  │  │
│  │  │ Order Status     │   │ Custom.Code      │                  │  │
│  │  └──────────────────┘   └──────────────────┘                  │  │
│  │                                                               │  │
│  │  SEARCH & DISCOVERY     ACCESS POLICIES                       │  │
│  │  ┌──────────────────┐   ┌──────────────────┐                  │  │
│  │  │ Full-text search │   │ RBAC per asset   │                  │  │
│  │  │ Tag-based        │   │ Sensitivity      │                  │  │
│  │  │ Column-level     │   │ labels           │                  │  │
│  │  │ Impact analysis  │   │ Compliance       │                  │  │
│  │  └──────────────────┘   └──────────────────┘                  │  │
│  └───────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘

Purview Scanning Setup

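import uuid

from azure.identity import DefaultAzureCredential
from azure.purview.administration.account import PurviewAccountClient
from azure.purview.scanning import PurviewScanningClient

# The Purview Python SDKs are split by plane and still in preview: collections
# live in azure-purview-administration, scans in azure-purview-scanning.
# Verify method names and payload shapes against the versions you install.
credential = DefaultAzureCredential()
account_client = PurviewAccountClient(
    endpoint="https://purview-prod.purview.azure.com",
    credential=credential,
)
scanning_client = PurviewScanningClient(
    endpoint="https://purview-prod.scan.purview.azure.com",
    credential=credential,
)
collection_ref = {"referenceName": "SalesData", "type": "CollectionReference"}

# Create collection for data domain (the root collection shares the account name)
collection = account_client.collections.create_or_update_collection(
    "SalesData",
    {
        "friendlyName": "SalesData",
        "description": "Sales domain data assets",
        "parentCollection": {"referenceName": "purview-prod"},
    },
)

# Register the storage account as a data source; scans attach to a registered
# source. The source name "stdatalake001" is an assumed example.
scanning_client.data_sources.create_or_update(
    "stdatalake001",
    body={
        "kind": "AdlsGen2",
        "properties": {
            "endpoint": "https://stdatalake001.dfs.core.windows.net/",
            "collection": collection_ref,
        },
    },
)

# Create scan for ADLS Gen2 using the system scan rule set
scan = scanning_client.scans.create_or_update(
    "stdatalake001",
    "adls-sales-scan",
    body={
        "kind": "AdlsGen2Msi",
        "properties": {
            "scanRulesetName": "AdlsGen2",
            "scanRulesetType": "System",
            "collection": collection_ref,
        },
    },
)

# Schedule the scan daily at 02:00. Schedules are a separate trigger on the
# scan; confirm the recurrence values your API version accepts.
scanning_client.triggers.create_trigger(
    "stdatalake001",
    "adls-sales-scan",
    body={
        "properties": {
            "scanLevel": "Incremental",
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "schedule": {"hours": [2], "minutes": [0]},
            },
        }
    },
)

# Run scan now; the caller supplies a unique run ID
scanning_client.scan_result.run_scan(
    "stdatalake001",
    "adls-sales-scan",
    str(uuid.uuid4()),
)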

Business Glossary Management

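from azure.identity import DefaultAzureCredential
from azure.purview.datamap import DataMapClient

# Glossary and relationship operations live in the Data Map SDK
# (azure-purview-datamap, in preview; verify names against your version).
datamap_client = DataMapClient(
    endpoint="https://purview-prod.purview.azure.com",
    credential=DefaultAzureCredential(),
)

# Create glossary term. All GUIDs and IDs below are placeholders.
term = datamap_client.glossary.create_term(
    {
        "name": "Sales Revenue",
        "anchor": {"glossaryGuid": "glossary-guid"},  # the target glossary
        "longDescription": "Total revenue from product sales",
        "abbreviation": "Rev",
        "status": "Approved",
        # Stewards are referenced by directory object ID; the email is informational
        "contacts": {
            "Steward": [{"id": "steward-object-id", "info": "data-team@company.com"}]
        },
        # Related terms and synonyms point at existing terms by GUID
        "seeAlso": [
            {"termGuid": "net-revenue-term-guid"},
            {"termGuid": "gross-revenue-term-guid"},
        ],
        "synonyms": [{"termGuid": "sales-income-term-guid"}],
    }
)

# Link term to data asset. Term-to-entity links use the Atlas
# AtlasGlossarySemanticAssignment relationship (term on end1, asset on end2).
datamap_client.relationship.create(
    {
        "typeName": "AtlasGlossarySemanticAssignment",
        "end1": {"guid": "term-guid", "typeName": "AtlasGlossaryTerm"},
        "end2": {"guid": "asset-guid", "typeName": "azure_datalake_gen2_path"},
    }
)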

ℹ️

Pro Tip: Use Purview's automated classification to discover sensitive data. Create custom classifiers for domain-specific patterns (e.g., internal customer codes, product SKUs).
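
Before registering a custom classifier, it can help to prototype the pattern locally. The sketch below is only an illustration of the idea behind a regex-based classifier with a match threshold, not Purview's implementation. The "CUST-" code format, the 60% threshold, and the function names are all assumptions.

```python
import re

# Hypothetical internal customer code format: "CUST-" followed by six digits.
CUSTOMER_CODE = re.compile(r"CUST-\d{6}")


def match_ratio(values, pattern):
    """Return the fraction of non-empty values that fully match the pattern."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return 0.0
    hits = sum(1 for v in non_empty if pattern.fullmatch(v))
    return hits / len(non_empty)


def classify_column(values, pattern, threshold=0.6):
    """Flag the column when the match ratio reaches the (assumed) threshold."""
    return match_ratio(values, pattern) >= threshold


# Sample column values: three valid codes, one junk value, one empty cell.
sample = ["CUST-000123", "CUST-004567", "n/a", "CUST-998877", ""]
print(match_ratio(sample, CUSTOMER_CODE))      # 0.75
print(classify_column(sample, CUSTOMER_CODE))  # True
```

Testing against real column samples this way shows whether a pattern is too loose (matches unrelated columns) or too strict (misses valid codes) before it affects scan results.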

Interview Questions

Q1: How does Purview differ from traditional data catalogs? A: Purview is cloud-native, integrates with Azure services, provides automated scanning/classification, and supports hybrid environments. Traditional catalogs are often on-premises with manual metadata entry.

Q2: What is the benefit of linking glossary terms to data assets? A: It connects business terminology to technical assets, so non-technical users can find relevant data by the names they already use. It also supports impact analysis when a term's definition changes and gives data governance decisions business context.
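
To make the impact-analysis point concrete, here is a minimal in-memory sketch. It assumes term-to-asset links and lineage edges have already been exported from the catalog; the asset names and both dictionaries are hypothetical, and a real catalog would serve this from its lineage graph.

```python
from collections import deque

# Hypothetical glossary links: term name -> directly linked assets.
term_links = {"Sales Revenue": ["adls:/sales/orders.parquet"]}

# Hypothetical lineage: asset -> assets derived from it.
lineage = {
    "adls:/sales/orders.parquet": ["synapse:dbo.FactSales"],
    "synapse:dbo.FactSales": ["powerbi:Revenue Dashboard"],
}


def impacted_assets(term):
    """Walk lineage breadth-first from every asset linked to the term."""
    seen = []
    queue = deque(term_links.get(term, []))
    while queue:
        asset = queue.popleft()
        if asset in seen:
            continue  # already visited via another path
        seen.append(asset)
        queue.extend(lineage.get(asset, []))
    return seen


print(impacted_assets("Sales Revenue"))
# ['adls:/sales/orders.parquet', 'synapse:dbo.FactSales', 'powerbi:Revenue Dashboard']
```

If the definition of "Sales Revenue" changes, the returned list is the set of assets whose owners should be notified.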

Q3: How do you implement data catalog governance? A: 1) Define ownership per collection/domain, 2) Establish scanning schedules, 3) Configure classification rules, 4) Create business glossary, 5) Set up access policies, 6) Monitor catalog usage and quality.
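
One way to keep these steps honest is to track them per collection and report what is missing. The sketch below is a hypothetical checklist model covering steps 1 to 5; the class, field names, and sample values are assumptions, and step 6 (monitoring) would come from usage reports rather than static configuration.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionGovernance:
    """Hypothetical record of governance settings for one collection."""
    name: str
    owner: str = ""                                            # step 1
    scan_schedule: str = ""                                    # step 2
    classification_rules: list = field(default_factory=list)  # step 3
    glossary_terms: list = field(default_factory=list)        # step 4
    access_policies: list = field(default_factory=list)       # step 5


def governance_gaps(c):
    """Return the names of governance steps that are still unconfigured."""
    checks = {
        "owner": c.owner,
        "scan_schedule": c.scan_schedule,
        "classification_rules": c.classification_rules,
        "glossary_terms": c.glossary_terms,
        "access_policies": c.access_policies,
    }
    return [step for step, value in checks.items() if not value]


sales = CollectionGovernance(
    name="SalesData",
    owner="data-team@company.com",
    scan_schedule="daily 02:00",
    classification_rules=["PII.Email"],
    glossary_terms=["Sales Revenue"],
)
print(governance_gaps(sales))  # ['access_policies']
```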
