🛡️ Data Governance on AWS
Master Lake Formation, Macie, Config rules, and governance frameworks.
Module: AWS Data Engineering • Topic 26 of 65 • Premium Content
Governance Framework
Architecture Diagram
┌────────────────────────────────────────────────────────────┐
│                 DATA GOVERNANCE FRAMEWORK                  │
│                                                            │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ 1. ACCESS CONTROL                                      │ │
│ │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │ │
│ │  │ Lake         │  │ IAM Policies │  │ SCPs         │  │ │
│ │  │ Formation    │  │              │  │ (Org-wide)   │  │ │
│ │  └──────────────┘  └──────────────┘  └──────────────┘  │ │
│ └────────────────────────────────────────────────────────┘ │
│                                                            │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ 2. DATA CLASSIFICATION                                 │ │
│ │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │ │
│ │  │ Macie        │  │ Custom Tags  │  │ Glue         │  │ │
│ │  │ (PII Detect) │  │              │  │ Classifiers  │  │ │
│ │  └──────────────┘  └──────────────┘  └──────────────┘  │ │
│ └────────────────────────────────────────────────────────┘ │
│                                                            │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ 3. COMPLIANCE MONITORING                               │ │
│ │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │ │
│ │  │ AWS Config   │  │ CloudTrail   │  │ Security     │  │ │
│ │  │ Rules        │  │              │  │ Hub          │  │ │
│ │  └──────────────┘  └──────────────┘  └──────────────┘  │ │
│ └────────────────────────────────────────────────────────┘ │
│                                                            │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ 4. AUDIT & LINEAGE                                     │ │
│ │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │ │
│ │  │ CloudTrail   │  │ Glue Lineage │  │ CloudWatch   │  │ │
│ │  │ (API Audit)  │  │              │  │ Logs         │  │ │
│ │  └──────────────┘  └──────────────┘  └──────────────┘  │ │
│ └────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────┘
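The access-control layer in the diagram can be illustrated with a Lake Formation grant. What follows is a minimal sketch rather than a definitive setup: the role ARN, database, table, and column names are hypothetical placeholders, and the helper names `build_column_grant` and `apply_grant` are invented for this example. The builder prepares the arguments for boto3's `grant_permissions` call so that a principal can read only selected columns of a table.

```python
def build_column_grant(principal_arn, database, table, columns):
    """Build keyword arguments for lakeformation.grant_permissions.

    Grants SELECT on the listed columns only (column-level access control).
    """
    if not columns:
        raise ValueError("at least one column is required")
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


def apply_grant(grant_kwargs):
    """Send the grant to Lake Formation (requires AWS credentials)."""
    import boto3  # imported lazily so the builder works without AWS access

    client = boto3.client("lakeformation")
    return client.grant_permissions(**grant_kwargs)


# Hypothetical names, for illustration only
grant = build_column_grant(
    principal_arn="arn:aws:iam::123456789012:role/analyst",
    database="sales_db",
    table="orders",
    columns=["order_id", "order_date", "amount"],
)
```

Calling `apply_grant(grant)` would submit the request. Keeping the builder separate from the API call makes the grant easy to review or unit test before it touches a real account.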
Amazon Macie
import boto3

macie = boto3.client('macie2')

# Enable Macie for this account and Region
macie.enable_macie(
    findingPublishingFrequency='FIFTEEN_MINUTES',
    status='ENABLED'
)

# Create a classification job that runs daily.
# scheduleFrequency is only valid when jobType is 'SCHEDULED'.
response = macie.create_classification_job(
    jobType='SCHEDULED',
    name='s3-data-classification',
    s3JobDefinition={
        'bucketDefinitions': [
            {
                'accountId': '123456789012',
                'buckets': ['data-lake-raw', 'data-lake-processed']
            }
        ],
        # Skip log and temp files
        'scoping': {
            'excludes': {
                'and': [{
                    'simpleScopeTerm': {
                        'comparator': 'EQ',
                        'key': 'OBJECT_EXTENSION',
                        'values': ['log', 'tmp']  # extensions without the leading dot
                    }
                }]
            }
        }
    },
    scheduleFrequency={'dailySchedule': {}}
)

# List the IDs of findings whose classification completed
findings = macie.list_findings(
    findingCriteria={
        'criterion': {
            'classificationDetails.result.status.code': {
                'eq': ['COMPLETE']
            }
        }
    }
)
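`list_findings` returns only finding IDs; the full records come from `get_findings`. As a rough sketch of what to do with them, the helper below counts findings per bucket and severity. The sample records are made up and trimmed to the two fields used here (`severity.description` and `resourcesAffected.s3Bucket.name`), the helper names are invented for this example, and the batch size of 50 IDs per `get_findings` call is an assumption to check against the current API limits.

```python
from collections import Counter


def chunked(items, size):
    """Yield successive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_findings(finding_ids, batch_size=50):
    """Fetch full finding records in batches (requires AWS credentials)."""
    import boto3  # imported lazily so the summary logic runs without AWS access

    macie = boto3.client("macie2")
    records = []
    for batch in chunked(finding_ids, batch_size):
        records.extend(macie.get_findings(findingIds=batch)["findings"])
    return records


def summarize_findings(findings):
    """Count findings by (bucket name, severity label)."""
    summary = Counter()
    for finding in findings:
        bucket = finding["resourcesAffected"]["s3Bucket"]["name"]
        severity = finding["severity"]["description"]
        summary[(bucket, severity)] += 1
    return summary


# Made-up sample records, trimmed to the fields used above
sample_findings = [
    {"severity": {"description": "High"},
     "resourcesAffected": {"s3Bucket": {"name": "data-lake-raw"}}},
    {"severity": {"description": "High"},
     "resourcesAffected": {"s3Bucket": {"name": "data-lake-raw"}}},
    {"severity": {"description": "Low"},
     "resourcesAffected": {"s3Bucket": {"name": "data-lake-processed"}}},
]
summary = summarize_findings(sample_findings)
```

In a real account the input would come from `fetch_findings(findings["findingIds"])`, where `findings` is the `list_findings` response.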
AWS Config Rules
import boto3

config = boto3.client('config')

# Check S3 bucket encryption
config.put_config_rule(
    ConfigRule={
        'ConfigRuleName': 's3-bucket-server-side-encryption-enabled',
        'Source': {
            'Owner': 'AWS',
            'SourceIdentifier': 'S3_BUCKET_SERVER_SIDE_ENCRYPTION_ENABLED'
        },
        'Scope': {
            'ComplianceResourceTypes': ['AWS::S3::Bucket']
        }
    }
)

# Custom rule: Check data classification tags
config.put_config_rule(
    ConfigRule={
        'ConfigRuleName': 'data-classification-tag-exists',
        'Source': {
            'Owner': 'CUSTOM_LAMBDA',
            'SourceIdentifier': 'arn:aws:lambda:us-east-1:123456789012:function:check-classification-tag',
            'SourceDetails': [
                {
                    'EventSource': 'aws.config',
                    'MessageType': 'ConfigurationItemChangeNotification'
                }
            ]
        },
        'Scope': {
            'ComplianceResourceTypes': ['AWS::S3::Bucket']
        }
    }
)
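A custom rule like `data-classification-tag-exists` delegates the actual check to a Lambda function, which the source does not show. A minimal sketch of such a handler might look like the following. The tag key `DataClassification` and the function names are assumptions made for illustration. The handler reads the configuration item from `invokingEvent`, decides compliance from the bucket's tags, and reports the result with `put_evaluations`.

```python
import json

REQUIRED_TAG = "DataClassification"  # hypothetical tag key, not from the source


def evaluate_compliance(configuration_item, required_tag=REQUIRED_TAG):
    """Return the AWS Config compliance type for one configuration item."""
    # Deleted resources can no longer be evaluated
    if configuration_item.get("configurationItemStatus") == "ResourceDeleted":
        return "NOT_APPLICABLE"
    tags = configuration_item.get("tags") or {}
    # The tag must exist and carry a non-empty value
    return "COMPLIANT" if tags.get(required_tag) else "NON_COMPLIANT"


def lambda_handler(event, context):
    """Entry point invoked by AWS Config on configuration changes."""
    import boto3  # imported lazily so evaluate_compliance is testable offline

    invoking_event = json.loads(event["invokingEvent"])
    item = invoking_event["configurationItem"]

    boto3.client("config").put_evaluations(
        Evaluations=[{
            "ComplianceResourceType": item["resourceType"],
            "ComplianceResourceId": item["resourceId"],
            "ComplianceType": evaluate_compliance(item),
            "OrderingTimestamp": item["configurationItemCaptureTime"],
        }],
        ResultToken=event["resultToken"],
    )
```

For the rule to work, the Lambda function also needs a resource policy that lets `config.amazonaws.com` invoke it, and its execution role needs permission to call `config:PutEvaluations`.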
Interview Q&A
Q1: What is data governance?
Answer: Data governance is the framework of policies, processes, and standards that ensure data is managed securely, compliantly, and effectively across an organization.
Q2: How does Macie help with governance?
Answer: Macie automatically discovers and classifies sensitive data (such as PII and financial data) in Amazon S3, giving you visibility into what sensitive data you hold and where it is stored.
Q3: What are SCPs in the governance context?
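Answer: Service Control Policies (SCPs) are AWS Organizations policies that set the maximum permissions available to member accounts. They grant no permissions themselves; they act as organization-wide guardrails that restrict which AWS services and actions accounts can use, enforcing governance at scale.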
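As an illustration of such a guardrail, the sketch below builds an SCP that denies actions which would switch off the governance tooling covered in this topic. The statement ID, the helper names, and the choice of denied actions are assumptions for illustration, not a recommended baseline.

```python
import json

# Actions that would disable audit, compliance, or classification tooling
GOVERNANCE_DENY_ACTIONS = [
    "cloudtrail:StopLogging",
    "cloudtrail:DeleteTrail",
    "config:StopConfigurationRecorder",
    "config:DeleteConfigRule",
    "macie2:DisableMacie",
]


def build_guardrail_scp(actions=GOVERNANCE_DENY_ACTIONS):
    """Build an SCP document that denies the given actions on all resources."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "ProtectGovernanceTooling",
            "Effect": "Deny",
            "Action": sorted(actions),
            "Resource": "*",
        }],
    }


def create_scp(name, policy_document):
    """Create the SCP in AWS Organizations (requires management-account credentials)."""
    import boto3  # imported lazily so the builder works without AWS access

    org = boto3.client("organizations")
    return org.create_policy(
        Name=name,
        Description="Deny disabling of governance services",
        Type="SERVICE_CONTROL_POLICY",
        Content=json.dumps(policy_document),
    )


scp = build_guardrail_scp()
scp_json = json.dumps(scp, indent=2)
```

A created policy has no effect until it is attached to a root, organizational unit, or account, for example with the Organizations `attach_policy` call.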
Summary
- Access Control: Lake Formation, IAM policies, SCPs
- Classification: Macie for sensitive data discovery
- Compliance: AWS Config rules for resource compliance
- Audit: CloudTrail for API audit trails
- Lineage: Glue Lineage for data flow tracking