IAM Architecture Overview
Architecture Diagram
AWS IAM ARCHITECTURE

AWS ACCOUNT ROOT (email@company.com)
  Full access; use only for account setup
  │
  ├─► IAM USERS:  data-engineer, analytics-admin, dev-readonly, ci-cd-pipeline
  ├─► IAM GROUPS: DataTeam, Admins, ReadOnly, SecurityTeam
  └─► IAM ROLES:  GlueServiceRole, LambdaExecRole, RedshiftRole, EMR_EC2_Role
        │
        ▼
IAM POLICIES (attached to users, groups, and roles)
  • Managed policies: AWS managed and customer managed (custom)
  • Inline policies: embedded directly in a single user, group, or role
  • Service-linked role policies: built in and defined by the service that owns the role
  • SCPs (AWS Organizations): org-wide guardrails that limit, but never grant, permissions
        │
        ▼
AUTHENTICATION & AUTHORIZATION
  • MFA (virtual or hardware devices)
  • STS (temporary credentials)
  • Identity federation (SAML / OIDC)
IAM Policy Document Structure
Basic Policy Document
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowS3BucketAccess",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::data-lake-raw",
        "arn:aws:s3:::data-lake-raw/*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": "us-east-1"
        }
      }
    }
  ]
}
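The following minimal Python sketch shows how the Action and Resource elements are matched against a request by deciding whether a single Allow statement covers a given action and resource. It assumes IAM's documented wildcard rules: `*` matches any run of characters, `?` matches exactly one, action names are case-insensitive, and resource ARNs are case-sensitive. It deliberately ignores the Condition block. `statement_matches` is a hypothetical helper, not an AWS API.

```python
import re


def _wildcard_to_regex(pattern: str) -> str:
    # Escape everything, then turn IAM's * and ? back into regex wildcards.
    escaped = re.escape(pattern)
    return escaped.replace(r"\*", ".*").replace(r"\?", ".")


def _as_list(value):
    # Policy elements may be a single string or a list of strings.
    return [value] if isinstance(value, str) else list(value)


def statement_matches(statement: dict, action: str, resource: str) -> bool:
    """True if an Allow statement covers the action on the resource (conditions ignored)."""
    if statement.get("Effect") != "Allow":
        return False
    action_ok = any(
        re.fullmatch(_wildcard_to_regex(p), action, re.IGNORECASE)  # actions: case-insensitive
        for p in _as_list(statement["Action"])
    )
    resource_ok = any(
        re.fullmatch(_wildcard_to_regex(p), resource)  # ARNs: case-sensitive
        for p in _as_list(statement["Resource"])
    )
    return action_ok and resource_ok


statement = {
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
    "Resource": ["arn:aws:s3:::data-lake-raw", "arn:aws:s3:::data-lake-raw/*"],
}

print(statement_matches(statement, "s3:GetObject", "arn:aws:s3:::data-lake-raw/2024/file.json"))  # True
print(statement_matches(statement, "s3:ListBucket", "arn:aws:s3:::data-lake-raw"))  # True
print(statement_matches(statement, "s3:GetObject", "arn:aws:s3:::data-lake-curated/file.json"))  # False
```

The `/*` suffix matters: `data-lake-raw/*` matches objects inside the bucket but not a sibling bucket such as `data-lake-raw-backup`. This is why the example lists the bare bucket ARN (for `s3:ListBucket`) separately from the object ARN.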
Complex Policy with Multiple Conditions
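{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowGlueAccess",
      "Effect": "Allow",
      "Action": [
        "glue:CreateDatabase",
        "glue:CreateTable",
        "glue:GetDatabase",
        "glue:GetTable",
        "glue:GetPartitions",
        "glue:BatchCreatePartition"
      ],
      "Resource": [
        "arn:aws:glue:us-east-1:123456789012:catalog",
        "arn:aws:glue:us-east-1:123456789012:database/*",
        "arn:aws:glue:us-east-1:123456789012:table/*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:PrincipalTag/Environment": "production",
          "aws:RequestedRegion": "us-east-1"
        }
      }
    }
  ]
}
Glue Data Catalog actions are authorized against the catalog ARN as well as the database and table ARNs, so the catalog ARN must be listed. The tag condition is checked on the calling principal (aws:PrincipalTag), because Data Catalog databases and tables are generally governed through Lake Formation rather than IAM resource tags. Both condition keys must match for the statement to apply.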
ℹ️ Pro Tip: Always use conditions to restrict access to specific regions, tags, or VPC endpoints. This prevents accidental cross-region data access and reduces attack surface.
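Conditions combine in a specific way: different condition keys (and different operators) are ANDed together, while multiple values listed for a single key are ORed. The minimal Python sketch below implements that logic for the `StringEquals` operator only. It assumes the request context is a plain dict of condition keys. A key missing from the context makes the condition fail, which is how `StringEquals` behaves without an `IfExists` suffix. `conditions_match` is a hypothetical helper for illustration.

```python
def conditions_match(condition_block: dict, context: dict) -> bool:
    """Evaluate a Condition block that uses only StringEquals.

    Keys are ANDed; the values listed for one key are ORed.
    A key absent from the request context fails the match.
    """
    for operator, key_values in condition_block.items():
        if operator != "StringEquals":
            raise NotImplementedError(f"operator not covered by this sketch: {operator}")
        for key, expected in key_values.items():
            allowed = [expected] if isinstance(expected, str) else expected
            if context.get(key) not in allowed:  # missing key -> None -> no match
                return False
    return True


condition = {
    "StringEquals": {
        "aws:PrincipalTag/Environment": "production",
        "aws:RequestedRegion": ["us-east-1", "us-west-2"],  # either Region is acceptable
    }
}

print(conditions_match(condition, {"aws:PrincipalTag/Environment": "production",
                                   "aws:RequestedRegion": "us-west-2"}))  # True
print(conditions_match(condition, {"aws:PrincipalTag/Environment": "production",
                                   "aws:RequestedRegion": "eu-west-1"}))  # False
print(conditions_match(condition, {"aws:RequestedRegion": "us-east-1"}))  # False (tag missing)
```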
IAM Roles for Data Engineering Services
Glue Service Role
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::data-lake-*",
        "arn:aws:s3:::data-lake-*/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "glue:CreateDatabase",
        "glue:CreateTable",
        "glue:GetDatabase",
        "glue:GetTable",
        "glue:GetPartitions",
        "glue:BatchCreatePartition",
        "glue:BatchDeletePartition"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:/aws-glue/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateNetworkInterface",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DeleteNetworkInterface"
      ],
      "Resource": "*"
    }
  ]
}
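The permissions policy above only defines what the role may do. The role also needs a trust policy that lets the Glue service assume it. The minimal Python sketch below builds such a trust policy document. It assumes the standard service principal `glue.amazonaws.com`; the same shape should work with `lambda.amazonaws.com` or `redshift.amazonaws.com` for the roles that follow. `service_trust_policy` is a hypothetical helper name.

```python
import json


def service_trust_policy(service_principal: str) -> dict:
    """Trust (assume-role) policy that lets an AWS service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service_principal},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# The printed document is what you would save as trust.json and pass to
# `aws iam create-role --assume-role-policy-document file://trust.json`.
print(json.dumps(service_trust_policy("glue.amazonaws.com"), indent=2))
```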
Lambda Execution Role for Data Processing
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::data-pipeline-*/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:UpdateItem",
        "dynamodb:Query"
      ],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/pipeline-state"
    },
    {
      "Effect": "Allow",
      "Action": [
        "kinesis:DescribeStream",
        "kinesis:DescribeStreamSummary",
        "kinesis:GetRecords",
        "kinesis:GetShardIterator",
        "kinesis:ListShards"
      ],
      "Resource": "arn:aws:kinesis:us-east-1:123456789012:stream/data-stream"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    }
  ]
}
kinesis:ListStreams is an account-level action that cannot be scoped to a single stream ARN, so it is replaced here by the stream-scoped actions a Lambda event source mapping reads with.
Redshift Cluster Role
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::data-warehouse-*",
        "arn:aws:s3:::data-warehouse-*/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetTable",
        "glue:GetPartitions"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "sns:Publish"
      ],
      "Resource": "arn:aws:sns:us-east-1:123456789012:redshift-alerts"
    }
  ]
}
IAM for Cross-Account Data Sharing
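AssumeRole Policy (Account A → Account B)
Identity-based policy attached to the calling principal in Account A. The target role in Account B must also trust Account A in its own trust policy, which is where the ExternalId requirement is normally enforced.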
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::ACCOUNT_B:role/CrossAccountDataAccess",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "unique-external-id-12345"
        }
      }
    }
  ]
}
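The identity policy above is only half of the arrangement: the role in Account B must also name Account A as a trusted principal. The minimal Python sketch below builds that Account B trust policy with the ExternalId requirement. `111122223333` is a placeholder for Account A's ID, `cross_account_trust_policy` is a hypothetical helper, and the external ID reuses the example value above. ExternalId matters mostly when a third party assumes the role on your behalf.

```python
import json


def cross_account_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Trust policy for the role in Account B, trusting principals in Account A."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The account root ARN delegates the "who may assume" decision
                # to IAM policies inside the trusted account.
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The caller must pass this exact value when assuming the role.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


print(json.dumps(cross_account_trust_policy("111122223333", "unique-external-id-12345"), indent=2))
```

The caller then supplies the same value, for example with `aws sts assume-role --role-arn ... --role-session-name ... --external-id unique-external-id-12345`.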
Cross-Account S3 Bucket Policy
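{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCrossAccountList",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::ACCOUNT_B:role/DataAnalyticsRole"
      },
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::shared-data-lake"
    },
    {
      "Sid": "AllowCrossAccountReadInternalObjects",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::ACCOUNT_B:role/DataAnalyticsRole"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::shared-data-lake/*",
      "Condition": {
        "StringEquals": {
          "s3:ExistingObjectTag/Classification": "internal"
        }
      }
    }
  ]
}
s3:ExistingObjectTag conditions only apply to object-level actions such as s3:GetObject. For that reason s3:ListBucket gets its own statement on the bucket ARN; otherwise the listing request could never satisfy the tag condition. DataAnalyticsRole also needs an identity-based policy in Account B that allows these actions.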
IAM for Lake Formation
Architecture Diagram
LAKE FORMATION PERMISSION MODEL

Lake Formation permissions
  Database level: CREATE_TABLE, ALTER, DROP, DESCRIBE
  Table level:    SELECT, INSERT, DELETE, ALTER, DROP, DESCRIBE
  Column level:   SELECT (specific columns)
  Row level:      SELECT (rows restricted by a data filter)
  Cell level:     SELECT (data filter combining row and column restrictions)

IAM policies (AWS resources)    Lake Formation (fine-grained)     Data Catalog (metadata)
  s3, glue, ec2 actions            Database/table grants             Tables/columns
                                   Row/column/cell-level filters     Partitions

PERMISSION CHECK FLOW:
1. User/role → IAM policy check (service APIs such as glue:GetTable, lakeformation:GetDataAccess) → Deny/Allow
2. Allow → Lake Formation grant check on the Data Catalog resource (database, table, columns, rows) → Deny/Allow
3. Allow → Lake Formation vends temporary credentials for the registered S3 location to the integrated service (Athena, Glue, EMR, Redshift Spectrum); the principal needs no direct S3 permissions on that data
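⚠️ Security Warning: Lake Formation permissions do not override IAM policies. For data governed by Lake Formation, a request must be allowed by both the principal's IAM policy and a Lake Formation grant. Always check both layers when debugging access issues: a user might have the IAM permission but not the Lake Formation permission, or the reverse. While the default IAMAllowedPrincipals grant is still in place on a resource, access to it is effectively controlled by IAM alone.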
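The two-layer model can be made concrete with a toy example. The minimal Python sketch below makes several simplifying assumptions. `iam_allows` stands in for the already-computed IAM decision. `grants` maps a (principal, table) pair to the set of columns Lake Formation grants SELECT on. The toy model rejects any request for a column that was not granted. All names and data are hypothetical; the point is only that both layers must allow, and that column grants narrow access further.

```python
def authorize_select(principal: str, table: str, columns: list,
                     iam_allows: bool, grants: dict) -> list:
    """Toy check: IAM must allow AND a Lake Formation grant must cover every column."""
    # Layer 1: IAM decision (e.g. glue:GetTable, lakeformation:GetDataAccess).
    if not iam_allows:
        raise PermissionError("denied by IAM policy")
    # Layer 2: a Lake Formation grant must exist on the table.
    granted = grants.get((principal, table), set())
    if not granted:
        raise PermissionError(f"no Lake Formation grant on {table}")
    # Column-level restriction: every requested column must be granted.
    missing = set(columns) - granted
    if missing:
        raise PermissionError(f"no column grant for: {sorted(missing)}")
    return list(columns)


grants = {("analyst", "gold.orders"): {"order_id", "order_total"}}

print(authorize_select("analyst", "gold.orders", ["order_id", "order_total"], True, grants))
# ['order_id', 'order_total']

try:
    authorize_select("analyst", "gold.orders", ["customer_email"], True, grants)
except PermissionError as err:
    print(err)  # no column grant for: ['customer_email']
```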
IAM Best Practices for Data Engineering
Principle of Least Privilege
Architecture Diagram
LEAST PRIVILEGE IMPLEMENTATION

| Role | IAM Policies | LF Permissions |
|---|---|---|
| Data Engineer | S3, Glue, Lambda | All tables |
| Data Analyst | Athena, QuickSight | SELECT on gold/ |
| Pipeline Operator | Step Functions, Glue | bronze/silver |
| Security Auditor | CloudTrail, Config | Read-only catalog |
| DevOps Engineer | EC2, VPC, CloudWatch | No data access |

IMPLEMENTATION:
1. Start with no permissions: IAM implicitly denies everything that is not explicitly allowed, so no deny-all policy is needed
2. Add permissions incrementally
3. Use condition keys to limit scope
4. Review and audit quarterly
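Step 4 (periodic review) is easier with a quick automated pass over policy documents. The minimal Python sketch below flags Allow statements that grant wildcard actions or all resources. It assumes policies are available as parsed JSON dicts. It is a rough heuristic, not a substitute for IAM Access Analyzer policy validation, and `find_broad_statements` is a hypothetical helper.

```python
def _as_list(value):
    # Policy elements may be a single string or a list of strings.
    return [value] if isinstance(value, str) else list(value)


def find_broad_statements(policy: dict) -> list:
    """Return (statement_index, reason) pairs for Allow statements that look too broad."""
    findings = []
    statements = policy["Statement"]
    if isinstance(statements, dict):  # a lone statement may appear without a list
        statements = [statements]
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        for action in _as_list(stmt.get("Action", [])):
            if action == "*" or action.endswith(":*"):
                findings.append((i, f"wildcard action {action}"))
        if "*" in _as_list(stmt.get("Resource", [])):
            findings.append((i, "all resources (*)"))
    return findings


policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": "arn:aws:s3:::data-lake-raw/*"},
        {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
    ],
}

print(find_broad_statements(policy))
# [(1, 'wildcard action s3:*'), (1, 'all resources (*)')]
```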
IAM Policy Evaluation Logic
Architecture Diagram
IAM POLICY EVALUATION FLOW (simplified, single account)

Request (implicitly denied unless an applicable policy allows it)
  │
  ▼
Explicit Deny in any applicable policy? ── YES ──► DENIED
  │ NO
  ▼
Service control policy (SCP) allows the action? ── NO ──► DENIED
  │ YES (or the account is not governed by SCPs)
  ▼
Resource-based policy allows? ── YES ──► ALLOWED
  │ NO
  ▼
Identity-based policy allows? ── NO ──► DENIED
  │ YES
  ▼
Permissions boundary allows? ── NO ──► DENIED
  │ YES (or no boundary is set)
  ▼
Session policy allows? ── NO ──► DENIED
  │ YES (or no session policy is passed)
  ▼
ALLOWED

Note: the resource-based shortcut applies within one account. For cross-account requests, both the identity-based policy in the caller's account and the resource-based policy in the resource owner's account must allow.
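The evaluation order can be sketched as a small decision function that follows the AWS policy evaluation logic in simplified form. This minimal Python sketch assumes each policy type has already been reduced to `"Allow"`, `"Deny"`, `"ImplicitDeny"` (in use, but nothing matched) or `None` (not in use). It omits finer points such as how permissions boundaries interact with resource-based grants to role sessions. `evaluate` is a hypothetical name.

```python
from typing import Optional

# Outcome of evaluating one policy type against the request:
#   "Allow"        an Allow statement matched and no Deny did
#   "Deny"         an explicit Deny statement matched
#   "ImplicitDeny" the policy type is in use but nothing matched
#   None           the policy type is not in use (no SCPs, no boundary, ...)
Decision = Optional[str]


def evaluate(identity: Decision, resource: Decision = None, scp: Decision = None,
             boundary: Decision = None, session: Decision = None,
             cross_account: bool = False) -> str:
    """Simplified IAM decision for a single request."""
    # 1. An explicit Deny in any applicable policy always wins.
    if "Deny" in (identity, resource, scp, boundary, session):
        return "DENIED"
    # 2. When SCPs apply, they must allow the action (they never grant it alone).
    if scp is not None and scp != "Allow":
        return "DENIED"
    if cross_account:
        # 3a. Cross-account: the resource owner's resource-based policy must allow.
        if resource != "Allow":
            return "DENIED"
    elif resource == "Allow":
        # 3b. Same account: an Allow in a resource-based policy is enough here.
        return "ALLOWED"
    # 4. Otherwise the caller's identity-based policy must allow.
    if identity != "Allow":
        return "DENIED"
    # 5. Permissions boundaries and session policies, when present, must also allow.
    for limit in (boundary, session):
        if limit is not None and limit != "Allow":
            return "DENIED"
    return "ALLOWED"


print(evaluate("Allow"))                                               # ALLOWED
print(evaluate("Allow", scp="ImplicitDeny"))                           # DENIED
print(evaluate("ImplicitDeny", resource="Allow"))                      # ALLOWED
print(evaluate("ImplicitDeny", resource="Allow", cross_account=True))  # DENIED
```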
IAM for Service Accounts (IRSA on EKS)
Creating IRSA for EKS Workloads
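# One-time prerequisite: associate the cluster's OIDC issuer with IAM
eksctl utils associate-iam-oidc-provider \
  --cluster my-cluster \
  --approve

# Create the IAM role and the annotated Kubernetes service account
eksctl create iamserviceaccount \
  --name glue-service-account \
  --namespace data-engineering \
  --cluster my-cluster \
  --role-name GlueEKSRole \
  --attach-policy-arn arn:aws:iam::123456789012:policy/GlueAccessPolicy \
  --approve

# Verify the role and its trust policy
aws iam get-role --role-name GlueEKSRole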
Trust Policy for IRSA
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:sub": "system:serviceaccount:data-engineering:glue-service-account",
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
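Both condition keys in the IRSA trust policy are derived from the cluster's OIDC issuer and the Kubernetes service account. The minimal Python sketch below assembles the same document. It assumes you already have the account ID, the issuer URL, and the namespace and service account name. `irsa_trust_policy` is a hypothetical helper.

```python
import json


def irsa_trust_policy(account_id: str, issuer_url: str,
                      namespace: str, service_account: str) -> dict:
    """Build an IRSA trust policy for one Kubernetes service account."""
    # The provider ARN and the condition-key prefix use the issuer without its scheme.
    provider = issuer_url.split("://", 1)[-1]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Only this namespace/service account may assume the role...
                        f"{provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        # ...and only with tokens whose audience is STS.
                        f"{provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


policy = irsa_trust_policy(
    "123456789012",
    "https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE",
    "data-engineering",
    "glue-service-account",
)
print(json.dumps(policy, indent=2))
```

The issuer URL can be read with `aws eks describe-cluster --name my-cluster --query "cluster.identity.oidc.issuer" --output text`.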
IAM Monitoring and Auditing
CloudTrail for IAM Events
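EventBridge event pattern matching IAM API calls recorded by CloudTrail. IAM is a global service whose events are delivered to EventBridge in us-east-1, so create the rule there:
{
  "source": ["aws.iam"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventSource": ["iam.amazonaws.com"],
    "eventName": [
      "CreateUser",
      "DeleteUser",
      "CreateAccessKey",
      "DeleteAccessKey",
      "AttachUserPolicy",
      "DetachUserPolicy",
      "PutRolePolicy",
      "DeleteRolePolicy"
    ]
  }
}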
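An EventBridge rule matches an event when every field named in the pattern is present in the event and its value is one of the listed values; nested objects are matched recursively. The minimal Python sketch below implements that exact-match subset. It assumes the event is a parsed dict and ignores EventBridge's prefix, anything-but, numeric, and other matchers. `matches` is a hypothetical helper, and the sample event is trimmed to the relevant fields.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Exact-value subset of EventBridge pattern matching."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        value = event[key]
        if isinstance(expected, dict):
            # Nested pattern: the event field must be an object that matches recursively.
            if not isinstance(value, dict) or not matches(expected, value):
                return False
        elif value not in expected:  # leaf: value must be one of the listed values
            return False
    return True


pattern = {
    "source": ["aws.iam"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["iam.amazonaws.com"],
        "eventName": ["CreateAccessKey", "AttachUserPolicy"],
    },
}

event = {
    "source": "aws.iam",
    "detail-type": "AWS API Call via CloudTrail",
    "region": "us-east-1",
    "detail": {"eventSource": "iam.amazonaws.com", "eventName": "CreateAccessKey"},
}

print(matches(pattern, event))  # True
```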
IAM Access Analyzer Findings
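| Finding Type | Suggested Priority | Action Required |
|---|---|---|
| Public access (external access analyzer) | Critical | Restrict immediately |
| Cross-account access (external access analyzer) | Medium | Review and validate |
| Unused roles, access keys, or permissions (unused access analyzer) | Low | Remove if not needed |
| Overly permissive policy (policy validation security warning) | High | Apply least privilege |
| MFA not enabled (reported by Security Hub or the IAM credential report, not Access Analyzer) | High | Enable MFA |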
Cost Considerations
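ℹ️ IAM is Free: AWS IAM is provided at no additional charge. You only pay for the AWS services that IAM controls access to. However, consider these cost implications:
- Overly permissive roles can lead to accidental resource usage
- Cross-account access is not billed by itself, but reading data across Regions incurs data transfer charges
- STS calls are free but subject to request quotas (throttling)
- IAM Access Analyzer's unused access analysis is billed per IAM role or user analyzed per month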
Summary
IAM is the foundation of AWS security for data engineering. Key takeaways:
- Roles over Users: Always prefer IAM roles for services and applications
- Least Privilege: Grant minimum necessary permissions
- Cross-Account: Use STS AssumeRole with a trust policy scoped to the trusted account, and require an ExternalId when a third party assumes the role
- Lake Formation: Additional layer for fine-grained data access
- MFA: Enable for all human users
- Monitoring: Use CloudTrail and Access Analyzer
- Tagging: Use resource tags for policy conditions