Governance & Cataloging Interview Q&A
25 interview questions on data governance, Purview, lineage, and compliance
Question 1: What is data governance?
Answer: Managing data availability, usability, integrity, and security. Includes policies, procedures, standards, and roles for data management.
Question 2: What is Microsoft Purview?
Answer: Unified data governance service for discovering, classifying, and governing data across hybrid environments. Provides scanning, classification, lineage, and business glossary.
Question 3: How does Purview discover data assets?
Answer: Automated scanning of data sources, metadata extraction, classification, and cataloging. Supports 100+ data sources including Azure, on-premises, and SaaS.
Question 4: What is data lineage?
Answer: Tracking data flow from source to consumption. Shows transformations, dependencies, and impact. Critical for compliance, debugging, and trust.
Question 5: What is the benefit of a business glossary?
Answer: Standardizes business terminology, links to technical assets, enables impact analysis, and bridges business/technical stakeholders.
Question 6: How do you implement data classification?
Answer: Purview built-in classifiers (PII, financial), custom classifiers (regex/keywords), auto-labeling with sensitivity labels, and column-level classification.
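A regex-based custom classifier can be sketched as follows. This is a minimal illustration in plain Python, not Purview's API: the rule names, the patterns, and the 60% match threshold are assumptions chosen for the example.

```python
import re

# Hypothetical rule set: classification name -> compiled pattern.
# These patterns are illustrative only and are not Purview's built-in rules.
CLASSIFIERS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+"),
    "US_SSN": re.compile(r"\d{3}-\d{2}-\d{4}"),
}


def classify_column(values, threshold=0.6):
    """Return the classifications whose pattern fully matches at least
    `threshold` of the non-empty sampled values in a column."""
    sample = [v for v in values if v]
    if not sample:
        return []
    labels = []
    for name, pattern in CLASSIFIERS.items():
        hits = sum(1 for v in sample if pattern.fullmatch(v))
        if hits / len(sample) >= threshold:
            labels.append(name)
    return labels
```

Matching a share of sampled values rather than every value is a design choice here: it tolerates placeholder entries such as "n/a" in an otherwise sensitive column.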
Question 7: What are sensitivity labels?
Answer: Tags that classify data by sensitivity (e.g., Public, Internal, Confidential). Labels trigger protection policies (encryption, access controls) and can be applied automatically through Purview auto-labeling.
Question 8: How do you implement data catalog governance?
Answer: Define ownership per collection, establish scanning schedules, configure classification rules, create business glossary, set up access policies.
Question 9: What is the difference between a data catalog and a data dictionary?
Answer: Data Catalog: Metadata repository for discovery and governance. Data Dictionary: Documentation of data elements and definitions. Purview combines both.
Question 10: How does Purview integrate with Power BI?
Answer: Automatic scanning of Power BI workspaces, lineage tracking, classification, and cataloging of datasets and reports.
Question 11: What is the benefit of automated classification?
Answer: Consistent classification at scale, reduces manual effort, discovers unknown sensitive data, and supports compliance requirements.
Question 12: How do you handle data quality in governance?
Answer: Define quality rules, implement validation at multiple stages, monitor quality metrics, and remediate issues. Use Great Expectations or Purview data quality.
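The "define rules, then monitor metrics" part of this answer can be sketched in a few lines. This is a plain-Python illustration under assumed names (the rules, the column names, and the 95% minimum are hypothetical); it does not use the Great Expectations or Purview APIs.

```python
def pass_rates(rows, rules):
    """Return the fraction of rows that satisfy each named rule."""
    if not rows:
        return {name: 1.0 for name in rules}
    return {
        name: sum(1 for row in rows if rule(row)) / len(rows)
        for name, rule in rules.items()
    }


# Hypothetical quality rules for an orders table.
RULES = {
    "customer_id_present": lambda r: r.get("customer_id") is not None,
    "amount_non_negative": lambda r: isinstance(r.get("amount"), (int, float))
    and r["amount"] >= 0,
}


def failing_rules(rows, rules, minimum=0.95):
    """Names of rules whose pass rate falls below the agreed minimum."""
    rates = pass_rates(rows, rules)
    return sorted(name for name, rate in rates.items() if rate < minimum)
```

The pass rates are the quality metrics to monitor over time; the list of failing rules is what would feed a remediation queue or an alert.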
Question 13: What is the difference between RBAC and ACLs?
Answer: RBAC: Role-based access at resource level. ACLs: POSIX-compliant permissions at file/directory level. Use RBAC for administrative and coarse-grained access; use ACLs for fine-grained access to files and directories in a data lake.
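The two layers can be pictured with a small model. This is a simplified sketch, not Azure's implementation: the role names, the action names, and the data structures are hypothetical, and the role-check-first ordering is only loosely modeled on how a data lake typically evaluates access.

```python
# Hypothetical role definitions: role name -> actions granted on the whole resource.
ROLE_ACTIONS = {
    "reader": {"read"},
    "contributor": {"read", "write"},
}

# Position of each action in a POSIX-style "rwx" permission string.
ACL_BIT = {"read": 0, "write": 1, "execute": 2}


def is_allowed(user, action, path, role_assignments, acls):
    """Check resource-level roles first, then fall back to the ACL on the path."""
    # RBAC: a role that grants the action applies to every path in the resource.
    for role in role_assignments.get(user, ()):
        if action in ROLE_ACTIONS.get(role, set()):
            return True
    # ACL: permissions are looked up for this specific file or directory.
    perms = acls.get(path, {}).get(user, "---")
    return perms[ACL_BIT[action]] != "-"
```

In this model a "reader" role lets a user read everywhere, while write access to a single directory has to come from an ACL entry on that directory. Inheritance from parent directories is left out to keep the sketch short.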
Question 14: How do you implement GDPR compliance?
Answer: Classify PII, handle data subject access requests (DSARs), enable right to erasure, track consent, maintain audit logs, and implement data retention policies.
Question 15: What is the benefit of data stewardship?
Answer: Domain expertise for data quality, ownership accountability, policy enforcement, and business context for technical assets.
Question 16: How do you audit data access?
Answer: Enable diagnostic settings, send to Log Analytics, create KQL queries, implement alerts, maintain logs for required retention periods.
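In practice the queries would be written in KQL against Log Analytics. The sketch below shows the same kind of logic in Python over exported log records, to illustrate what an audit query and an alert condition might check. The field names (caller, operation, status) are hypothetical and do not reflect an actual Azure log schema.

```python
from collections import Counter


def denied_events(events):
    """Return the events where access was refused."""
    return [e for e in events if e["status"] == "denied"]


def heavy_readers(events, threshold):
    """Return callers whose number of read events exceeds `threshold`,
    which could serve as a simple alert condition."""
    reads = Counter(e["caller"] for e in events if e["operation"] == "read")
    return sorted(caller for caller, count in reads.items() if count > threshold)
```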
Question 17: What is the difference between metadata and data lineage?
Answer: Metadata: Describes data (schema, type, owner). Lineage: Tracks data flow (source → transformations → destination). Both are essential for governance.
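The distinction can be made concrete with two small record types: one describes a single asset, the other describes a hop between assets. This is an illustrative sketch; the class names, field names, and asset names are assumptions, not a catalog's actual data model.

```python
from dataclasses import dataclass


@dataclass
class AssetMetadata:
    """Describes one asset: what it is and who owns it."""
    name: str
    schema: dict  # column name -> type
    owner: str


@dataclass
class LineageEdge:
    """Describes one hop in the flow of data between two assets."""
    source: str
    transformation: str
    destination: str


def direct_sources(edges, asset):
    """Return the assets that feed directly into `asset`."""
    return [e.source for e in edges if e.destination == asset]
```

Metadata answers "what is this table?", while lineage answers "where did its data come from?". A catalog typically stores both and links them by asset name or identifier.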
Question 18: How do you handle cross-domain governance?
Answer: Federated governance model, global standards, cross-domain data contracts, and Purview collections per domain.
Question 19: What is the benefit of impact analysis?
Answer: Understand downstream effects of changes, assess risk, plan migrations, and maintain data trust.
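Impact analysis amounts to walking the lineage graph downstream from the asset that is about to change. A minimal sketch, assuming lineage is available as (source, destination) pairs; the function name and the asset names in the usage note are hypothetical.

```python
from collections import deque


def downstream_assets(edges, changed):
    """Breadth-first walk over (source, destination) pairs.
    Returns every asset reachable from `changed`, sorted by name."""
    children = {}
    for src, dst in edges:
        children.setdefault(src, []).append(dst)

    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            # Skip assets already visited so cycles cannot loop forever.
            if child not in seen and child != changed:
                seen.add(child)
                queue.append(child)
    return sorted(seen)
```

For example, with edges raw.orders to stg.orders and stg.orders to mart.sales, a change to raw.orders would flag both stg.orders and mart.sales for review.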
Question 20: How do you implement data retention policies?
Answer: Lifecycle management for storage, database retention settings, Purview retention labels, and automated cleanup.
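The automated-cleanup step can be sketched as a function that selects records past their retention period. The categories and retention periods below are hypothetical; real values come from legal and business requirements.

```python
from datetime import date, timedelta

# Hypothetical retention periods in days, per data category.
RETENTION_DAYS = {
    "logs": 90,
    "invoices": 365 * 7,
}


def expired_ids(records, today):
    """Return the ids of records whose retention period ended before `today`.
    Records in a category without a policy are kept."""
    expired = []
    for rec in records:
        days = RETENTION_DAYS.get(rec["category"])
        if days is None:
            continue  # no policy defined: do not delete
        if rec["created"] + timedelta(days=days) < today:
            expired.append(rec["id"])
    return expired
```

Keeping records that have no policy is a deliberately conservative default: deleting data nobody has classified is harder to undo than retaining it a little longer.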
Question 21: What is the benefit of a data marketplace?
Answer: Discover and consume data products across organizations, enable data sharing, and monetize data assets.
Question 22: How do you handle data governance for streaming?
Answer: Purview scanning for streaming sources, schema registry for event schemas, and lineage tracking for streaming pipelines.
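One job of a schema registry is to reject event schema changes that would break consumers. The sketch below applies a simplified rule assumed for this example: existing fields may not be removed or change type, and new fields must be optional. The schema representation is hypothetical and is not the format of any particular registry.

```python
def compatibility_issues(old, new):
    """Compare two schemas given as {field: {"type": str, "required": bool}}.
    Returns a list of human-readable problems; an empty list means compatible."""
    issues = []
    for name, spec in old.items():
        if name not in new:
            issues.append(f"removed field: {name}")
        elif new[name]["type"] != spec["type"]:
            issues.append(f"type changed: {name}")
    for name, spec in new.items():
        if name not in old and spec.get("required", False):
            issues.append(f"new required field: {name}")
    return issues
```

A governance process could run a check like this before a producer is allowed to publish a new schema version to a streaming topic.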
Question 23: What is the difference between data governance and data management?
Answer: Governance: Policies, standards, accountability. Management: Execution, operations, technical implementation. Governance directs management.
Question 24: How do you measure governance success?
Answer: Data quality metrics, compliance scores, catalog adoption, lineage coverage, and stakeholder satisfaction.
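Some of these measures can be computed directly from catalog exports. A minimal sketch, assuming each asset is a dict with hypothetical keys owner, classifications, and has_lineage; ownership and classification coverage are added here as illustrative examples alongside lineage coverage.

```python
def governance_kpis(assets):
    """Percentage of catalog assets that have an owner, at least one
    classification, and recorded lineage."""
    total = len(assets)
    if total == 0:
        return {"owned_pct": 0.0, "classified_pct": 0.0, "lineage_pct": 0.0}

    def pct(key):
        # An asset counts when the field is present and truthy.
        return round(100 * sum(1 for a in assets if a.get(key)) / total, 1)

    return {
        "owned_pct": pct("owner"),
        "classified_pct": pct("classifications"),
        "lineage_pct": pct("has_lineage"),
    }
```

Tracking these percentages over time gives a trend line for catalog adoption and lineage coverage, which tends to be more informative than any single snapshot.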
Question 25: What is the future of data governance?
Answer: AI-powered governance, automated compliance, unified governance across hybrid/multi-cloud, and self-service data access.