Infrastructure as Code Patterns
Difficulty: Senior Level | Companies: AWS, Google, Microsoft, Netflix, Uber
IaC Fundamentals
Infrastructure as Code (IaC) manages infrastructure through machine-readable definition files, usually declarative, instead of manual console changes. It enables version control, peer review, and automated provisioning of cloud resources.
ℹ️ IaC reduces configuration drift and makes environments reproducible, provided every change goes through the code. Treat infrastructure code with the same rigor as application code.
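The declarative model can be illustrated with a small sketch: a tool compares the desired state (the code) with the actual state (what exists) and derives the actions needed to reconcile them. The resource names and the `plan_changes` helper below are hypothetical and do not reflect how any particular tool is implemented.

```python
from typing import Dict, List, Tuple

Resource = Dict[str, str]


def plan_changes(
    desired: Dict[str, Resource], actual: Dict[str, Resource]
) -> List[Tuple[str, str]]:
    """Return (action, resource_name) pairs that move `actual` toward `desired`."""
    actions: List[Tuple[str, str]] = []
    # Declared but missing: create
    for name in sorted(desired.keys() - actual.keys()):
        actions.append(("create", name))
    # Present in both but different: update
    for name in sorted(desired.keys() & actual.keys()):
        if desired[name] != actual[name]:
            actions.append(("update", name))
    # Exists but no longer declared: delete
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(("delete", name))
    return actions


desired = {
    "vpc": {"cidr": "10.0.0.0/16"},
    "subnet-a": {"cidr": "10.0.0.0/24"},
}
actual = {
    "vpc": {"cidr": "10.0.0.0/16"},
    "subnet-a": {"cidr": "10.0.1.0/24"},    # changed by hand (drift)
    "subnet-old": {"cidr": "10.0.9.0/24"},  # no longer declared
}
print(plan_changes(desired, actual))
# [('update', 'subnet-a'), ('delete', 'subnet-old')]
```

Running the same comparison when the two states already match yields an empty plan, which is the property that makes repeated runs safe.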
IaC Tool Comparison
| Tool | State Management | Multi-Cloud | Learning Curve |
|---|---|---|---|
| Terraform | Remote state | Yes | Medium |
| CloudFormation | AWS-managed | No | Low |
| Pulumi | Remote state | Yes | Low (programming languages) |
| AWS CDK | CloudFormation (AWS-managed) | No | Medium |
Pattern 1: Terraform Module Structure
Organize Terraform into reusable modules.
// modules/vpc/main.tf
variable "environment" {
  type        = string
  description = "Environment name (dev, staging, prod)"
}

variable "vpc_cidr" {
  type        = string
  description = "CIDR block for VPC"
  default     = "10.0.0.0/16"
}

variable "availability_zones" {
  type        = list(string)
  description = "List of availability zones"
}

resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name        = "${var.environment}-vpc"
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}

// One public subnet per availability zone: subnet numbers 0, 1, 2, ...
resource "aws_subnet" "public" {
  count                   = length(var.availability_zones)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "${var.environment}-public-${var.availability_zones[count.index]}"
    Tier = "public"
  }
}

// One private subnet per availability zone: subnet numbers 10, 11, 12, ...
resource "aws_subnet" "private" {
  count             = length(var.availability_zones)
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
  availability_zone = var.availability_zones[count.index]

  tags = {
    Name = "${var.environment}-private-${var.availability_zones[count.index]}"
    Tier = "private"
  }
}

output "vpc_id" {
  value = aws_vpc.main.id
}

output "public_subnet_ids" {
  value = aws_subnet.public[*].id
}

output "private_subnet_ids" {
  value = aws_subnet.private[*].id
}
// environments/prod/main.tf
module "vpc" {
  source             = "../../modules/vpc"
  environment        = "prod"
  vpc_cidr           = "10.0.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

module "database" {
  source         = "../../modules/rds"
  environment    = "prod"
  vpc_id         = module.vpc.vpc_id
  subnet_ids     = module.vpc.private_subnet_ids
  instance_class = "db.r6g.xlarge"
  multi_az       = true
}
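To see which address ranges the module's `cidrsubnet(var.vpc_cidr, 8, count.index)` calls produce, the arithmetic can be reproduced with Python's standard `ipaddress` module. This is an IPv4-only approximation written for illustration, not Terraform's own implementation.

```python
import ipaddress


def cidrsubnet(prefix: str, newbits: int, netnum: int) -> str:
    """Extend `prefix` by `newbits` bits and return subnet number `netnum` (IPv4 only)."""
    network = ipaddress.IPv4Network(prefix)
    new_prefixlen = network.prefixlen + newbits
    if new_prefixlen > 32:
        raise ValueError("not enough address bits left for newbits")
    if not 0 <= netnum < 2 ** newbits:
        raise ValueError("netnum does not fit in newbits")
    block_size = 2 ** (32 - new_prefixlen)  # addresses per subnet
    address = ipaddress.IPv4Address(int(network.network_address) + netnum * block_size)
    return f"{address}/{new_prefixlen}"


azs = ["us-east-1a", "us-east-1b", "us-east-1c"]
public = [cidrsubnet("10.0.0.0/16", 8, i) for i in range(len(azs))]
private = [cidrsubnet("10.0.0.0/16", 8, i + 10) for i in range(len(azs))]
print(public)   # ['10.0.0.0/24', '10.0.1.0/24', '10.0.2.0/24']
print(private)  # ['10.0.10.0/24', '10.0.11.0/24', '10.0.12.0/24']
```

The `+ 10` offset keeps the two tiers apart only while there are at most ten availability zones; beyond that the public and private ranges would overlap.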
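⚠️ Never store secrets in Terraform files. Use AWS Secrets Manager, Vault, or environment variables for sensitive data. Values passed to resources can still appear in plaintext in the state file, so encrypt the state and restrict access to it.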
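As one way to keep credentials out of the repository, a deploy script can read them from AWS Secrets Manager at run time. The sketch below assumes a JSON-formatted secret; the secret name `prod/database/credentials`, the helper `load_db_credentials`, and the `FakeSecretsClient` stand-in are hypothetical and exist only so the example runs offline.

```python
import json
from typing import Any, Dict


def load_db_credentials(secrets_client: Any, secret_id: str) -> Dict[str, str]:
    """Fetch a JSON secret at deploy time instead of hardcoding it in .tf files."""
    response = secrets_client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])


class FakeSecretsClient:
    """Offline stand-in for boto3.client("secretsmanager"); illustration only."""

    def get_secret_value(self, SecretId: str) -> Dict[str, str]:
        payload = {"username": "app", "password": "example-only"}
        return {"SecretString": json.dumps(payload)}


creds = load_db_credentials(FakeSecretsClient(), "prod/database/credentials")
print(sorted(creds))  # print key names only, never the secret values
```

Against a real account, pass `boto3.client("secretsmanager")` instead of the fake client. The values can then be handed to Terraform through `TF_VAR_<name>` environment variables rather than committed files.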
Pattern 2: CloudFormation StackSets
Deploy a single template across multiple accounts and regions.
# cloudformation-stack-set.yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: 'VPC Stack Set for multi-account deployment'

Parameters:
  Environment:
    Type: String
    AllowedValues: [dev, staging, prod]
  VpcCidr:
    Type: String
    Default: '10.0.0.0/16'

Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: !Ref VpcCidr
      EnableDnsHostnames: true
      EnableDnsSupport: true
      Tags:
        - Key: Name
          Value: !Sub '${Environment}-vpc'
        - Key: Environment
          Value: !Ref Environment

  InternetGateway:
    Type: AWS::EC2::InternetGateway
    Properties:
      Tags:
        - Key: Name
          Value: !Sub '${Environment}-igw'

  VPCGatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref VPC
      InternetGatewayId: !Ref InternetGateway

  PublicSubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      AvailabilityZone: !Select [0, !GetAZs '']
      # !Cidr yields 4 blocks with 8 host bits each (/24); take the first
      CidrBlock: !Select [0, !Cidr [!Ref VpcCidr, 4, 8]]
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Sub '${Environment}-public-1'

Outputs:
  VPCId:
    Value: !Ref VPC
    Export:
      Name: !Sub '${Environment}-VPCId'
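The template above is an ordinary CloudFormation template; it becomes a multi-account deployment only when it is registered as a stack set and stack instances are created for the target accounts and regions. A minimal sketch of those two API calls follows, assuming self-managed permissions (the StackSets administration and execution roles already exist). The stack set name, account IDs, and the `RecordingClient` stand-in are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def deploy_stack_set(
    cf_client: Any,
    stack_set_name: str,
    template_body: str,
    environment: str,
    accounts: List[str],
    regions: List[str],
) -> str:
    """Register the template as a stack set, then roll it out; returns the operation ID."""
    cf_client.create_stack_set(
        StackSetName=stack_set_name,
        TemplateBody=template_body,
        Parameters=[{"ParameterKey": "Environment", "ParameterValue": environment}],
    )
    response = cf_client.create_stack_instances(
        StackSetName=stack_set_name,
        Accounts=accounts,
        Regions=regions,
    )
    return response["OperationId"]


class RecordingClient:
    """Offline stand-in for boto3.client("cloudformation"); records calls only."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def create_stack_set(self, **kwargs: Any) -> Dict[str, str]:
        self.calls.append(("create_stack_set", kwargs))
        return {"StackSetId": "example-stack-set-id"}

    def create_stack_instances(self, **kwargs: Any) -> Dict[str, str]:
        self.calls.append(("create_stack_instances", kwargs))
        return {"OperationId": "example-operation-id"}


client = RecordingClient()
operation_id = deploy_stack_set(
    client,
    "network-baseline",                # hypothetical stack set name
    "<contents of cloudformation-stack-set.yaml>",
    "prod",
    ["111111111111", "222222222222"],  # hypothetical account IDs
    ["us-east-1", "eu-west-1"],
)
print(operation_id)
```

With a real client the returned operation ID can be polled to track the rollout. Organizations using service-managed permissions would target organizational units instead of account lists.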
Pattern 3: Terraform Workspaces for Environment Isolation
Use workspaces to manage multiple environments.
// main.tf - Workspace-aware configuration
locals {
  environment = terraform.workspace

  instance_type = {
    dev     = "t3.micro"
    staging = "t3.small"
    prod    = "m5.large"
  }

  replica_count = {
    dev     = 1
    staging = 2
    prod    = 3
  }
}

resource "aws_instance" "app" {
  count = local.replica_count[local.environment]
  // Assumes a data "aws_ami" "amazon_linux" block is defined elsewhere
  ami           = data.aws_ami.amazon_linux.id
  instance_type = local.instance_type[local.environment]

  tags = {
    Name        = "${local.environment}-app-${count.index + 1}"
    Environment = local.environment
  }
}

// Backend configuration for state isolation
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "infrastructure/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
    // Non-default workspaces get their own state object, for example:
    // s3://my-terraform-state/env:/prod/infrastructure/terraform.tfstate
  }
}
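The workspace mechanics in this pattern reduce to two lookups: where the state object lives, and which settings apply. The sketch below mirrors both, assuming the S3 backend's default workspace key prefix of `env:`; the function names are hypothetical.

```python
INSTANCE_TYPES = {"dev": "t3.micro", "staging": "t3.small", "prod": "m5.large"}


def workspace_state_key(key: str, workspace: str, workspace_key_prefix: str = "env:") -> str:
    """Return the S3 object key that holds the state for a given workspace."""
    if workspace == "default":
        return key  # the default workspace uses the configured key as-is
    return f"{workspace_key_prefix}/{workspace}/{key}"


def instance_type_for(workspace: str) -> str:
    """Mirror local.instance_type[terraform.workspace]; unknown workspaces fail."""
    try:
        return INSTANCE_TYPES[workspace]
    except KeyError:
        raise ValueError(f"no settings defined for workspace {workspace!r}") from None


print(workspace_state_key("infrastructure/terraform.tfstate", "prod"))
# env:/prod/infrastructure/terraform.tfstate
print(instance_type_for("staging"))
# t3.small
```

The second function highlights a practical caveat: the `default` workspace has no entry in the maps, so running the configuration there fails on the map lookup until a named workspace is selected.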
Pattern 4: Drift Detection and Remediation
Detect and fix configuration drift.
# drift_detection.py
import time
from typing import Dict

import boto3
from botocore.exceptions import ClientError


class DriftDetector:
    def __init__(self, poll_seconds: int = 5):
        self.cf = boto3.client('cloudformation')
        self.poll_seconds = poll_seconds

    def detect_drift(self, stack_name: str) -> Dict:
        # Initiate drift detection
        response = self.cf.detect_stack_drift(StackName=stack_name)
        drift_detection_id = response['StackDriftDetectionId']

        # Poll until detection finishes, sleeping between calls to avoid throttling
        while True:
            result = self.cf.describe_stack_drift_detection_status(
                StackDriftDetectionId=drift_detection_id
            )
            status = result['DetectionStatus']
            if status == 'DETECTION_COMPLETE':
                break
            if status == 'DETECTION_FAILED':
                reason = result.get('DetectionStatusReason', 'unknown reason')
                raise RuntimeError(f"Drift detection failed for {stack_name}: {reason}")
            time.sleep(self.poll_seconds)

        # Get detailed drift information (first page only; follow NextToken for large stacks)
        drifted_resources = []
        if result['StackDriftStatus'] == 'DRIFTED':
            resources = self.cf.describe_stack_resource_drifts(
                StackName=stack_name,
                StackResourceDriftStatusFilters=['MODIFIED', 'DELETED']
            )
            for resource in resources['StackResourceDrifts']:
                drifted_resources.append({
                    'resource': resource['LogicalResourceId'],
                    'type': resource['ResourceType'],
                    'status': resource['StackResourceDriftStatus'],
                    'details': resource.get('PropertyDifferences', []),
                })

        return {
            'stack': stack_name,
            'drifted': result['StackDriftStatus'] == 'DRIFTED',
            'resources': drifted_resources,
        }

    def auto_remediate(self, stack_name: str):
        """Report drift and attempt to re-apply the stack's current template.

        CloudFormation compares templates, not live resources, so an update
        with an unchanged template and parameters is rejected with
        "No updates are to be performed". In that case the drifted
        properties have to be fixed on the resource or via a template change.
        """
        drift = self.detect_drift(stack_name)
        if not drift['drifted']:
            return

        print(f"Drift detected in {stack_name}:")
        for resource in drift['resources']:
            print(f"  - {resource['resource']}: {resource['status']}")

        try:
            self.cf.update_stack(
                StackName=stack_name,
                UsePreviousTemplate=True,
                Capabilities=['CAPABILITY_IAM', 'CAPABILITY_NAMED_IAM'],
            )
            print(f"Initiated stack update for {stack_name}")
        except ClientError as err:
            if 'No updates are to be performed' not in str(err):
                raise
            print(f"No template changes for {stack_name}; remediate drift manually")
ℹ️ Schedule drift detection weekly for critical infrastructure. Use a scheduled EventBridge rule to trigger detection automatically.
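A weekly schedule can be set up with two EventBridge calls: one to create the scheduled rule and one to attach a target. The sketch below assumes the target is a Lambda function that runs the drift check; the rule name, the function ARN, and the `RecordingEvents` stand-in are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def schedule_weekly_drift_check(events_client: Any, rule_name: str, target_arn: str) -> str:
    """Create a weekly scheduled rule and point it at the drift-check target."""
    rule = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression="rate(7 days)",
        State="ENABLED",
        Description="Weekly CloudFormation drift detection",
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "drift-detector", "Arn": target_arn}],
    )
    return rule["RuleArn"]


class RecordingEvents:
    """Offline stand-in for boto3.client("events"); records calls only."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def put_rule(self, **kwargs: Any) -> Dict[str, str]:
        self.calls.append(("put_rule", kwargs))
        return {"RuleArn": f"arn:aws:events:us-east-1:111111111111:rule/{kwargs['Name']}"}

    def put_targets(self, **kwargs: Any) -> Dict[str, Any]:
        self.calls.append(("put_targets", kwargs))
        return {"FailedEntryCount": 0, "FailedEntries": []}


events = RecordingEvents()
rule_arn = schedule_weekly_drift_check(
    events,
    "weekly-drift-detection",  # hypothetical rule name
    "arn:aws:lambda:us-east-1:111111111111:function:drift-detector",  # hypothetical ARN
)
print(rule_arn)
```

With a real client, the Lambda function also needs a resource-based permission that allows EventBridge to invoke it; that step is omitted here.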
Pattern 5: Terraform Plan and Apply Pipeline
Automate IaC with CI/CD.
# .github/workflows/terraform.yml
name: Terraform

on:
  push:
    branches: [main]
    paths: ['infrastructure/**']
  pull_request:
    branches: [main]
    paths: ['infrastructure/**']

env:
  TF_VERSION: '1.6.0'
  AWS_REGION: 'us-east-1'

permissions:
  id-token: write       # needed to assume the AWS role via OIDC
  contents: read
  pull-requests: write  # needed to comment the plan on the PR

jobs:
  terraform:
    name: Terraform ${{ matrix.command }} (${{ matrix.environment }})
    runs-on: ubuntu-latest
    strategy:
      matrix:
        environment: [dev, staging, prod]
        command: [plan, apply]
        exclude:
          - command: apply
            environment: dev
    defaults:
      run:
        working-directory: infrastructure/environments/${{ matrix.environment }}
    steps:
      - uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TF_VERSION }}

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Terraform Init
        run: terraform init

      - name: Terraform Plan
        id: plan
        run: terraform plan -no-color -out=tfplan
        continue-on-error: true  # keep going so the plan output can be posted

      - name: Comment PR with Plan
        if: matrix.command == 'plan' && github.event_name == 'pull_request'
        uses: actions/github-script@v7
        env:
          # stdout is exposed by the setup-terraform wrapper
          PLAN_OUTPUT: ${{ steps.plan.outputs.stdout }}
        with:
          script: |
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## Terraform Plan - ${{ matrix.environment }}\n\`\`\`\n${process.env.PLAN_OUTPUT}\n\`\`\``
            })

      - name: Fail if Plan Failed
        if: steps.plan.outcome == 'failure'
        run: exit 1

      - name: Terraform Apply
        if: matrix.command == 'apply' && github.ref == 'refs/heads/main'
        run: terraform apply -auto-approve tfplan
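Reviewing a plan is easier when the pipeline summarizes it and flags destructive changes before apply. One possible approach, sketched below, parses the JSON that `terraform show -json tfplan` produces and counts actions per resource. The function names and the inline sample are hypothetical; only the `resource_changes[].change.actions` structure is taken from the plan JSON format.

```python
import json
from collections import Counter


def summarize_plan(plan_json: str) -> Counter:
    """Count planned actions; a delete+create pair is reported as 'replace'."""
    plan = json.loads(plan_json)
    counts: Counter = Counter()
    for change in plan.get("resource_changes", []):
        actions = change["change"]["actions"]
        if actions in (["no-op"], ["read"]):
            continue  # nothing will be modified
        if "delete" in actions and "create" in actions:
            counts["replace"] += 1
        else:
            counts[actions[0]] += 1
    return counts


def has_destructive_changes(plan_json: str) -> bool:
    """True if the plan deletes or replaces any resource."""
    summary = summarize_plan(plan_json)
    return summary["delete"] > 0 or summary["replace"] > 0


# Hand-written sample in the shape of `terraform show -json` output
sample = json.dumps({
    "resource_changes": [
        {"address": "aws_vpc.main", "change": {"actions": ["no-op"]}},
        {"address": "aws_subnet.public[0]", "change": {"actions": ["update"]}},
        {"address": "aws_instance.app[0]", "change": {"actions": ["delete", "create"]}},
    ]
})
print(dict(summarize_plan(sample)))     # {'update': 1, 'replace': 1}
print(has_destructive_changes(sample))  # True
```

A pipeline could run this between the plan and apply steps and require manual approval whenever `has_destructive_changes` returns true.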
IaC Best Practices
- Version Control - Store all IaC in Git with meaningful commit messages
- Module Reuse - Extract common patterns into reusable modules
- State Locking - Enable state locking (for example, a DynamoDB table with the S3 backend) so concurrent runs cannot corrupt state
- Plan Before Apply - Always review plans before applying changes
- Immutable Infrastructure - Replace rather than modify resources
- Secrets Management - Never commit secrets; use external secret stores
Follow-Up Questions
- How do you manage Terraform state across multiple teams without conflicts?
- What strategies would you use to test infrastructure changes before applying them?
- How do you handle breaking changes in Terraform modules that are used by multiple environments?