Compliance Frameworks
GCP Compliance Features
# Compliance implementation checklist
compliance_checklist = {
    "encryption": {
        "at_rest": "CMEK for sensitive data",
        "in_transit": "TLS 1.2+ (automatic)",
        "key_rotation": "Automated via Cloud KMS",
    },
    "access_control": {
        "iam": "Least privilege principle",
        "vpc_sc": "VPC Service Controls for perimeter security",
        "column_security": "Policy tags in BigQuery",
    },
    "audit_logging": {
        "admin_activity": "Always enabled",
        "data_access": "Enable for sensitive data",
        "log_retention": "Store in separate project",
    },
    "data_residency": {
        "single_region": "Data stays in specified region",
        "dual_region": "Two specific regions",
        "multi_region": "Large geographic area (e.g. US, EU) containing two or more regions",
    },
}
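One way to put a checklist like this to work is to compare it against the controls a review has actually confirmed. The sketch below is a minimal illustration, not a GCP API: the find_missing_controls helper and the implemented mapping are hypothetical names, and the checklist is abbreviated so the block runs on its own.

```python
# Abbreviated copy of the checklist so this sketch is self-contained.
checklist = {
    "encryption": {
        "at_rest": "CMEK for sensitive data",
        "key_rotation": "Automated via Cloud KMS",
    },
    "audit_logging": {
        "data_access": "Enable for sensitive data",
    },
}


def find_missing_controls(checklist, implemented):
    """Return (category, control) pairs not marked as implemented.

    `implemented` is a hypothetical mapping of category -> set of control
    names that a review has confirmed are in place.
    """
    missing = []
    for category, controls in checklist.items():
        done = implemented.get(category, set())
        for control in controls:
            if control not in done:
                missing.append((category, control))
    return missing


# Example review state: only CMEK has been confirmed so far.
implemented = {"encryption": {"at_rest"}}
gaps = find_missing_controls(checklist, implemented)
for category, control in gaps:
    print(f"Missing: {category}.{control} ({checklist[category][control]})")
```

Keeping the review state separate from the checklist means the same checklist can be reused across projects.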
GDPR Implementation
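# GDPR data deletion request handler
from google.cloud import bigquery


def handle_deletion_request(project_id, dataset_id, user_email):
    """Handle a GDPR right-to-erasure request."""
    client = bigquery.Client(project=project_id)

    # Pass the email as a query parameter instead of interpolating it into
    # the SQL string, which would allow SQL injection.
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("email", "STRING", user_email)
        ]
    )

    # Find and delete user data across all tables in the dataset
    tables = client.list_tables(f"{project_id}.{dataset_id}")
    for table in tables:
        query = f"""
            DELETE FROM `{project_id}.{dataset_id}.{table.table_id}`
            WHERE email = @email
        """
        try:
            job = client.query(query, job_config=job_config)
            job.result()  # Wait for the DELETE to finish
            print(f"Deleted data from {table.table_id}")
        except Exception as e:
            # Tables without an `email` column also end up here
            print(f"Error deleting from {table.table_id}: {e}")

    # Log deletion for audit (log_deletion_request is assumed to be defined elsewhere)
    log_deletion_request(user_email, dataset_id)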
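The handler calls log_deletion_request, which the snippet does not define. A minimal sketch of what such a helper might record is shown below. The build_deletion_audit_record name, the field names, and the decision to store a SHA-256 hash of the email instead of the raw address are all assumptions made for illustration, and stdout stands in for whatever log sink a real deployment would use.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_deletion_audit_record(user_email, dataset_id, now=None):
    """Build an audit entry for an erasure request (hypothetical schema)."""
    now = now or datetime.now(timezone.utc)
    # Normalize before hashing so "A@x.com" and "a@x.com " map to one subject.
    normalized = user_email.strip().lower()
    subject_hash = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return {
        "event": "gdpr_erasure",
        "subject_sha256": subject_hash,
        "dataset_id": dataset_id,
        "requested_at": now.isoformat(),
    }


def log_deletion_request(user_email, dataset_id):
    """Emit the audit record as one JSON line (stdout stands in for a real sink)."""
    record = build_deletion_audit_record(user_email, dataset_id)
    print(json.dumps(record))
    return record


record = log_deletion_request("User@Example.com ", "analytics")
```

Storing a hash keeps the raw address out of the audit trail while still allowing a later request from the same address to be matched against earlier entries.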
Best Practice: Implement defense-in-depth: encryption (CMEK), access control (IAM + VPC-SC), audit logging, and data classification. Use separate projects for production and non-production. Enable data access audit logs for sensitive datasets. Review compliance configurations quarterly.
Common Interview Questions
Q1: What are the key GDPR requirements for data engineering?
Answer: 1) Data minimization (collect only necessary data), 2) Purpose limitation (use data only for stated purposes), 3) Storage limitation (delete when no longer needed), 4) Right to erasure (implement deletion requests), 5) Data portability (export user data).
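Storage limitation in particular lends itself to automation. As a rough sketch, assuming each record carries a creation timestamp and each data category has a retention period (the RETENTION_DAYS values and the records_past_retention helper are made up for illustration), expired records could be identified like this:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention periods per data category, in days.
RETENTION_DAYS = {"web_logs": 90, "support_tickets": 365}


def records_past_retention(records, now):
    """Return ids of records older than their category's retention period.

    Each record is a dict with "id", "category", and a timezone-aware
    "created_at". Categories without a configured period are never expired.
    """
    expired = []
    for record in records:
        days = RETENTION_DAYS.get(record["category"])
        if days is None:
            continue
        if now - record["created_at"] > timedelta(days=days):
            expired.append(record["id"])
    return expired


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
records = [
    {"id": 1, "category": "web_logs",
     "created_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "category": "web_logs",
     "created_at": datetime(2024, 5, 1, tzinfo=timezone.utc)},
    {"id": 3, "category": "support_tickets",
     "created_at": datetime(2023, 1, 1, tzinfo=timezone.utc)},
]
expired_ids = records_past_retention(records, now)
print(expired_ids)
```

In BigQuery, table or partition expiration settings can enforce the same rule without custom code.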
Q2: How does GCP help with HIPAA compliance?
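Answer: Google signs Business Associate Agreements (BAAs) for HIPAA-covered services (BigQuery, GCS, etc.). GCP provides: 1) Encryption at rest and in transit, 2) Audit logging, 3) Access controls, 4) Data processing agreements. Under the shared responsibility model, customers remain responsible for configuring these controls correctly and for how their own applications handle protected health information (PHI).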
Q3: What is the purpose of VPC Service Controls for compliance?
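Answer: VPC Service Controls create security perimeters around Google Cloud services to prevent data exfiltration. They restrict which identities, networks, and projects can call the protected service APIs, control data movement across the perimeter through ingress and egress rules, and record perimeter violations in audit logs that can be used for compliance reporting.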
Q4: How do you implement data residency requirements?
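Answer: 1) Use single-region resources for strict residency, 2) Set the location explicitly when creating BigQuery datasets, since it cannot be changed afterwards, 3) Use GCS dual-region buckets when redundancy is needed within the permitted geography, 4) Restrict where resources can be created with the organization policy resource locations constraint (constraints/gcp.resourceLocations), 5) Monitor and audit data locations.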
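The monitoring step can be sketched as a simple comparison between an inventory of resource locations and an allow-list. This is an illustrative helper and not part of any GCP library: find_residency_violations and the sample inventory are assumptions.

```python
def find_residency_violations(resource_locations, allowed_locations):
    """Return resources whose location is not in the allowed set.

    `resource_locations` maps a resource name to its location string,
    e.g. {"my_dataset": "europe-west1"}. The comparison is
    case-insensitive because multi-region names are often written in
    upper case ("EU", "US").
    """
    allowed = {loc.lower() for loc in allowed_locations}
    return {
        name: location
        for name, location in resource_locations.items()
        if location.lower() not in allowed
    }


# Made-up inventory for illustration.
inventory = {
    "sales_eu": "europe-west1",
    "marketing": "EU",
    "legacy_events": "US",
}
violations = find_residency_violations(inventory, {"europe-west1", "eu"})
print(violations)
```

With the google-cloud-bigquery client, the inventory could be built by iterating client.list_datasets() and reading the location attribute of the object returned by client.get_dataset() for each entry.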
Q5: What is the difference between encryption at rest and in transit?
Answer: At-rest encryption protects data stored on disk; in-transit encryption protects data moving between clients and services or between services (TLS 1.2+). Compliance frameworks generally expect both. GCP applies both by default: data at rest is encrypted with Google-managed keys, and CMEK is needed only when you must control the keys yourself, for example their rotation and revocation.
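Where CMEK is used, the key is referenced by its full Cloud KMS resource name. The sketch below builds that name with a hypothetical kms_key_name helper and placeholder identifiers. The commented-out portion shows roughly how the name could be attached to a new BigQuery table with the google-cloud-bigquery client, and is left inactive so the block runs without credentials.

```python
def kms_key_name(project_id, location, key_ring, key):
    """Build the full Cloud KMS resource name for a crypto key."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/keyRings/{key_ring}/cryptoKeys/{key}"
    )


# Placeholder identifiers; replace with real ones.
key_name = kms_key_name("my-project", "europe-west1", "data-keys", "bq-key")
print(key_name)

# With google-cloud-bigquery installed and credentials configured, the
# key could be attached to a new table roughly like this:
#
#   from google.cloud import bigquery
#   table = bigquery.Table("my-project.my_dataset.my_table")
#   table.encryption_configuration = bigquery.EncryptionConfiguration(
#       kms_key_name=key_name
#   )
#   bigquery.Client().create_table(table)
```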