Data Governance: Purview, Sensitivity Labels & Policies
Enterprise data governance with Purview sensitivity labels, access policies, and compliance management
Data Governance Framework
Governance pillars:
1. Data stewardship: ownership per domain, accountability, decision rights
2. Data quality: validation rules, quality metrics, remediation process
3. Data security: sensitivity labels, access policies, encryption
4. Data compliance: GDPR, HIPAA, SOC2; retention policies; audit trails
5. Data lifecycle: creation → archive, retention → delete, tiering policies
6. Data catalog: discovery, metadata, lineage

Purview governance capabilities:
• Automated data discovery and classification
• Sensitivity labeling (auto + manual)
• Access policies (RBAC + ACLs)
• Data lineage tracking
• Business glossary
• Compliance reporting
Sensitivity Labels Configuration
from azure.identity import DefaultAzureCredential
from azure.purview.datamap import DataMapClient

# Authenticate with whatever DefaultAzureCredential resolves (Azure CLI, managed identity, ...)
credential = DefaultAzureCredential()

# The azure-purview-datamap client takes an endpoint URL rather than an account name.
# The URL below assumes a classic account endpoint for "purview-prod"; adjust for your tenant.
client = DataMapClient(
    endpoint="https://purview-prod.purview.azure.com",
    credential=credential,
)

# Apply classifications to an asset (here an azure_datalake_gen2_path entity), identified by GUID.
# The type names are illustrative; use classification types that exist in your account.
client.entity.add_classifications(
    "asset-guid",
    [
        {"typeName": "Microsoft.SensitivityLabel.Confidential"},
        {"typeName": "PII.Email"},
        {"typeName": "PII.PhoneNumber"},
    ],
)

# List the classifications attached to the asset
classifications = client.entity.get_classifications("asset-guid")
Access Policy Implementation
{
  "policies": [
    {
      "name": "DataAnalyst-ReadOnly",
      "description": "Read-only access for data analysts",
      "principal": "data-analysts@company.com",
      "scope": "/subscriptions/xxx/resourceGroups/rg/providers/Microsoft.Storage/storageAccounts/stdatalake001",
      "role": "Storage Blob Data Reader",
      "conditions": {
        "sensitivityLabels": ["Public", "Internal"],
        "timeRestriction": "BusinessHours"
      }
    },
    {
      "name": "DataEngineer-FullAccess",
      "description": "Full access for data engineers",
      "principal": "data-engineers@company.com",
      "scope": "/subscriptions/xxx/resourceGroups/rg/providers/Microsoft.Storage/storageAccounts/stdatalake001",
      "role": "Storage Blob Data Contributor",
      "conditions": {
        "sensitivityLabels": ["Public", "Internal", "Confidential"],
        "requireMFA": true
      }
    }
  ]
}
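To make the conditions in this policy document concrete, here is a minimal sketch of how an application might evaluate it. This is not a Purview API: the `is_allowed` helper and the reading of `sensitivityLabels` as an allow-list are assumptions for illustration, and the `timeRestriction` and `requireMFA` conditions are left out for brevity.

```python
import json

# Hypothetical, trimmed copy of the policy document shown above.
POLICY_JSON = """
{
  "policies": [
    {"name": "DataAnalyst-ReadOnly",
     "principal": "data-analysts@company.com",
     "role": "Storage Blob Data Reader",
     "conditions": {"sensitivityLabels": ["Public", "Internal"]}},
    {"name": "DataEngineer-FullAccess",
     "principal": "data-engineers@company.com",
     "role": "Storage Blob Data Contributor",
     "conditions": {"sensitivityLabels": ["Public", "Internal", "Confidential"]}}
  ]
}
"""


def is_allowed(policies, principal, asset_label):
    """Return True if any policy for `principal` lists `asset_label` (assumed allow-list)."""
    for policy in policies:
        if policy["principal"] != principal:
            continue
        if asset_label in policy["conditions"].get("sensitivityLabels", []):
            return True
    return False


policies = json.loads(POLICY_JSON)["policies"]
print(is_allowed(policies, "data-analysts@company.com", "Confidential"))
print(is_allowed(policies, "data-engineers@company.com", "Confidential"))
```

Under these assumptions the analyst group is denied Confidential assets while the engineer group is allowed, which matches the intent of the two policies.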
Pro Tip: Implement sensitivity labels at the column level for granular protection. Use auto-labeling rules in Purview to automatically classify PII and financial data.
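The idea behind column-level auto-labeling can be sketched in a few lines of Python. This is only an illustration of the concept, not how Purview's classifiers work: the regular expressions, the 80% match threshold, and the `classify_column` helper are all assumptions, and the label names are borrowed from the classification example earlier in this section.

```python
import re

# Assumed, deliberately simple patterns; production classifiers are far more robust.
PATTERNS = {
    "PII.Email": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"),
    "PII.PhoneNumber": re.compile(r"^\+?[\d(][\d\s().-]{7,}\d$"),
}


def classify_column(values, threshold=0.8):
    """Return labels whose pattern matches at least `threshold` of the non-empty values."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return []
    labels = []
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= threshold:
            labels.append(label)
    return labels


# Hypothetical sample values per column
columns = {
    "contact_email": ["ana@example.com", "bo@example.org", "cy@example.net"],
    "phone": ["+1 555 010 9999", "555-010-1234", "(555) 010-4321"],
    "city": ["Lisbon", "Oslo", "Pune"],
}
labels_by_column = {name: classify_column(vals) for name, vals in columns.items()}
print(labels_by_column)
```

Labeling per column rather than per table means that only `contact_email` and `phone` pick up PII labels here, while `city` stays unlabeled.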
Interview Questions
Q1: How do you implement data governance in a data lake? A: 1) Define data ownership per domain, 2) Set up Purview scanning and classification, 3) Apply sensitivity labels, 4) Configure RBAC and ACLs, 5) Create a business glossary, 6) Monitor compliance with audit logs.
Q2: What is the difference between RBAC and ACLs in Azure data governance? A: RBAC grants role-based access at the resource level (storage account, container). ACLs grant POSIX-style permissions at the directory and file level. Use RBAC for coarse-grained access such as administration or whole-container access, and ACLs for fine-grained control over paths inside the data lake.
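How the two mechanisms combine can be shown with a small model. In ADLS Gen2, role assignments are evaluated first and ACLs are consulted only when no role grants the operation. The sketch below is a simplified illustration of that order, not the service's actual algorithm: the role-to-action table is reduced to three actions, the principals and ACL entries are hypothetical, and real ACL checks also involve parent-directory execute permissions and masks.

```python
# Simplified model of which data actions two built-in roles grant.
ROLE_ACTIONS = {
    "Storage Blob Data Reader": {"read"},
    "Storage Blob Data Contributor": {"read", "write", "delete"},
}


def rbac_allows(role_assignments, principal, action):
    """True if any role assigned to `principal` grants `action`."""
    return any(
        action in ROLE_ACTIONS.get(role, set())
        for role in role_assignments.get(principal, [])
    )


def acl_allows(acl, principal, action):
    """Check a POSIX-style entry such as 'r-x' for the named principal."""
    flag = {"read": "r", "write": "w", "execute": "x"}.get(action)
    return flag is not None and flag in acl.get(principal, "---")


def can_access(role_assignments, acl, principal, action):
    # RBAC is checked first; ACLs matter only if no role grants the action.
    if rbac_allows(role_assignments, principal, action):
        return True
    return acl_allows(acl, principal, action)


role_assignments = {"engineers": ["Storage Blob Data Contributor"]}
directory_acl = {"analysts": "r-x"}  # ACL on one hypothetical directory

print(can_access(role_assignments, directory_acl, "engineers", "write"))
print(can_access(role_assignments, directory_acl, "analysts", "read"))
print(can_access(role_assignments, directory_acl, "analysts", "write"))
```

One consequence of this order is that an ACL cannot restrict a principal that already holds a sufficiently broad role, which is why broad data roles are usually reserved for a small set of identities.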
Q3: How do you handle data retention policies in Azure? A: Use lifecycle management policies in ADLS Gen2 to automatically tier data (Hot → Cool → Cold → Archive) and delete expired data. Configure retention periods based on compliance requirements (for example, 7 years for financial data).
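A lifecycle management policy is a JSON document of rules. The sketch below builds one in Python for the 7-year example; the `build_lifecycle_policy` helper, the `finance/ledgers` prefix, the rule name, and the 30/180-day thresholds are assumptions chosen for illustration, and the Cold tier step is omitted for brevity.

```python
import json


def build_lifecycle_policy(prefix, cool_days, archive_days, retention_years):
    """Build a lifecycle policy dict that tiers and then deletes block blobs under `prefix`."""
    delete_days = retention_years * 365  # approximation that ignores leap days
    if not cool_days < archive_days < delete_days:
        raise ValueError("expected cool_days < archive_days < delete_days")
    return {
        "rules": [
            {
                "enabled": True,
                "name": "tier-and-expire",  # hypothetical rule name
                "type": "Lifecycle",
                "definition": {
                    "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {"daysAfterModificationGreaterThan": cool_days},
                            "tierToArchive": {"daysAfterModificationGreaterThan": archive_days},
                            "delete": {"daysAfterModificationGreaterThan": delete_days},
                        }
                    },
                },
            }
        ]
    }


policy = build_lifecycle_policy(
    "finance/ledgers", cool_days=30, archive_days=180, retention_years=7
)
print(json.dumps(policy, indent=2))
```

The printed JSON can be saved to a file and applied to a storage account, for instance with `az storage account management-policy create`; check the thresholds against your own compliance requirements before doing so.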