๐ŸŽ‰ 75% of content is free forever โ€” Unlock Premium from $10/mo โ†’
CW
Search coursesโ€ฆ
๐Ÿ’ผ Servicesโ„น๏ธ Aboutโœ‰๏ธ ContactView Pricing Plansfrom $10

IaC at Scale: Terraform Modules, State Management, Drift

Cloud ArchitectureInfrastructure as Codeโญ Premium

Advertisement

IaC at Scale: Terraform Modules, State Management, Drift

Difficulty: Senior Level | Companies: HashiCorp, AWS, Google, Microsoft, Spotify

Interview Question

"Design an Infrastructure as Code strategy for a multi-account AWS environment with 100+ Terraform configurations. How do you handle state management, drift detection, and module reuse?"

โ„น๏ธKey Concepts

This question tests your understanding of IaC best practices, Terraform patterns, and infrastructure governance.

Complete IaC Architecture

Architecture Overview

Architecture Diagram
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                    INFRASTRUCTURE AS CODE ARCHITECTURE                   โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                                                                          โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ SOURCE CONTROL โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                 โ”‚
โ”‚  โ”‚  GitHub/GitLab โ”‚ Branch Strategy โ”‚ Code Review     โ”‚                 โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                 โ”‚
โ”‚                         โ”‚                                               โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ MODULE LAYER โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                โ”‚
โ”‚  โ”‚                                                       โ”‚              โ”‚
โ”‚  โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”‚              โ”‚
โ”‚  โ”‚  โ”‚           Reusable Modules                   โ”‚    โ”‚              โ”‚
โ”‚  โ”‚  โ”‚                                               โ”‚    โ”‚              โ”‚
โ”‚  โ”‚  โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”‚    โ”‚              โ”‚
โ”‚  โ”‚  โ”‚  โ”‚   VPC    โ”‚  โ”‚   ECS    โ”‚  โ”‚   RDS    โ”‚  โ”‚    โ”‚              โ”‚
โ”‚  โ”‚  โ”‚  โ”‚  Module  โ”‚  โ”‚  Module  โ”‚  โ”‚  Module  โ”‚  โ”‚    โ”‚              โ”‚
โ”‚  โ”‚  โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ”‚    โ”‚              โ”‚
โ”‚  โ”‚  โ”‚                                               โ”‚    โ”‚              โ”‚
โ”‚  โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ”‚              โ”‚
โ”‚  โ”‚                                                       โ”‚              โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜              โ”‚
โ”‚                         โ”‚                                               โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ STATE MANAGEMENT โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                โ”‚
โ”‚  โ”‚  S3 Backend โ”‚ State Locking โ”‚ Encryption            โ”‚                โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                 โ”‚
โ”‚                         โ”‚                                               โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ CI/CD INTEGRATION โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                โ”‚
โ”‚  โ”‚  Plan โ”‚ Apply โ”‚ Destroy โ”‚ Drift Detection          โ”‚                 โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜              โ”‚
โ”‚                                                                          โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Mathematical Foundation: IaC Metrics

Configuration Metrics:

  • Total configurations: C = 100
  • Average resources per config: R = 50
  • Total resources: T = C ร— R = 5,000
  • Average lines of code per config: L = 500
  • Total lines: L_total = C ร— L = 50,000

Drift Detection:

  • Drift detection frequency: F = daily
  • Average drift rate: D = 5%
  • Drifts per day: Dr = T ร— D = 250
  • Time to detect: T_detect = 24 hours
  • Time to remediate: T_remediate = 2 hours

Module Reuse:

  • Common modules: M = 20
  • Usage per module: U = 10
  • Total module usages: U_total = M ร— U = 200
  • Code reduction: R = (L_total - (M ร— 200)) / L_total = 60%

Terraform Module Structure

# modules/vpc/main.tf
terraform {
  required_version = ">= 1.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

variable "name" {
  description = "Name of the VPC"
  type        = string
}

variable "cidr_block" {
  description = "CIDR block for the VPC"
  type        = string
  default     = "10.0.0.0/16"
}

variable "azs" {
  description = "List of availability zones"
  type        = list(string)
}

variable "private_subnets" {
  description = "List of private subnet CIDRs"
  type        = list(string)
}

variable "public_subnets" {
  description = "List of public subnet CIDRs"
  type        = list(string)
}

variable "enable_nat_gateway" {
  description = "Enable NAT Gateway"
  type        = bool
  default     = true
}

variable "single_nat_gateway" {
  description = "Use a single NAT Gateway"
  type        = bool
  default     = false
}

resource "aws_vpc" "this" {
  cidr_block           = var.cidr_block
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = var.name
  }
}

resource "aws_subnet" "private" {
  count = length(var.private_subnets)

  vpc_id            = aws_vpc.this.id
  cidr_block        = var.private_subnets[count.index]
  availability_zone = var.azs[count.index % length(var.azs)]

  tags = {
    Name = "${var.name}-private-${var.azs[count.index % length(var.azs)]}"
  }
}

resource "aws_subnet" "public" {
  count = length(var.public_subnets)

  vpc_id                  = aws_vpc.this.id
  cidr_block              = var.public_subnets[count.index]
  availability_zone       = var.azs[count.index % length(var.azs)]
  map_public_ip_on_launch = true

  tags = {
    Name = "${var.name}-public-${var.azs[count.index % length(var.azs)]}"
  }
}

resource "aws_internet_gateway" "this" {
  vpc_id = aws_vpc.this.id

  tags = {
    Name = var.name
  }
}

resource "aws_eip" "nat" {
  count = var.enable_nat_gateway ? (var.single_nat_gateway ? 1 : length(var.azs)) : 0

  domain = "vpc"

  tags = {
    Name = "${var.name}-nat-${count.index}"
  }
}

resource "aws_nat_gateway" "this" {
  count = var.enable_nat_gateway ? (var.single_nat_gateway ? 1 : length(var.azs)) : 0

  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  tags = {
    Name = "${var.name}-nat-${count.index}"
  }

  depends_on = [aws_internet_gateway.this]
}

output "vpc_id" {
  description = "The ID of the VPC"
  value       = aws_vpc.this.id
}

output "private_subnet_ids" {
  description = "List of private subnet IDs"
  value       = aws_subnet.private[*].id
}

output "public_subnet_ids" {
  description = "List of public subnet IDs"
  value       = aws_subnet.public[*].id
}

output "nat_gateway_ips" {
  description = "List of Elastic IPs for NAT Gateways"
  value       = aws_eip.nat[*].public_ip
}
# modules/ecs/main.tf
variable "name" {
  description = "Name of the ECS cluster"
  type        = string
}

variable "vpc_id" {
  description = "VPC ID"
  type        = string
}

variable "subnet_ids" {
  description = "Subnet IDs for the ECS service"
  type        = list(string)
}

variable "container_image" {
  description = "Docker image for the container"
  type        = string
}

variable "container_port" {
  description = "Port exposed by the container"
  type        = number
  default     = 8080
}

variable "task_cpu" {
  description = "CPU units for the task"
  type        = number
  default     = 512
}

variable "task_memory" {
  description = "Memory for the task"
  type        = number
  default     = 1024
}

variable "desired_count" {
  description = "Number of instances to run"
  type        = number
  default     = 2
}

resource "aws_ecs_cluster" "this" {
  name = var.name

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

resource "aws_ecs_task_definition" "this" {
  family                   = var.name
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = var.task_cpu
  memory                   = var.task_memory

  container_definitions = jsonencode([
    {
      name  = var.name
      image = var.container_image

      portMappings = [
        {
          containerPort = var.container_port
          hostPort      = var.container_port
          protocol      = "tcp"
        }
      ]

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.this.name
          "awslogs-region"        = data.aws_region.current.name
          "awslogs-stream-prefix" = "ecs"
        }
      }
    }
  ])
}

resource "aws_ecs_service" "this" {
  name            = var.name
  cluster         = aws_ecs_cluster.this.id
  task_definition = aws_ecs_task_definition.this.arn
  desired_count   = var.desired_count
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = var.subnet_ids
    security_groups  = [aws_security_group.this.id]
    assign_public_ip = false
  }
}

resource "aws_security_group" "this" {
  name_prefix = "${var.name}-"
  vpc_id      = var.vpc_id

  ingress {
    from_port   = var.container_port
    to_port     = var.container_port
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_cloudwatch_log_group" "this" {
  name              = "/ecs/${var.name}"
  retention_in_days = 30
}

output "cluster_id" {
  description = "ECS cluster ID"
  value       = aws_ecs_cluster.this.id
}

output "service_name" {
  description = "ECS service name"
  value       = aws_ecs_service.this.name
}

State Management

# backend.tf
terraform {
  backend "s3" {
    bucket         = "terraform-state-bucket"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

# state locking with DynamoDB
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
# State management automation
import boto3
import json
from typing import Dict, Any, List
from dataclasses import dataclass
from datetime import datetime

@dataclass
class TerraformState:
    state_file: str
    backend: str
    last_modified: datetime
    locked: bool

class StateManager:
    """Terraform state management"""

    def __init__(self, bucket_name: str, state_key: str):
        self.s3 = boto3.client('s3')
        self.dynamodb = boto3.resource('dynamodb')
        self.bucket_name = bucket_name
        self.state_key = state_key
        self.lock_table = self.dynamodb.Table('terraform-locks')

    def get_state(self) -> Dict[str, Any]:
        """Get Terraform state"""
        response = self.s3.get_object(
            Bucket=self.bucket_name,
            Key=self.state_key
        )
        return json.loads(response['Body'].read())

    def update_state(self, state: Dict[str, Any]):
        """Update Terraform state"""
        self.s3.put_object(
            Bucket=self.bucket_name,
            Key=self.state_key,
            Body=json.dumps(state, indent=2),
            ServerSideEncryption='aws:kms'
        )

    def lock_state(self, lock_id: str) -> bool:
        """Lock Terraform state"""
        try:
            self.lock_table.put_item(
                Item={
                    'LockID': self.state_key,
                    'lock_id': lock_id,
                    'created_at': datetime.utcnow().isoformat()
                },
                ConditionExpression='attribute_not_exists(LockID)'
            )
            return True
        except Exception as e:
            print(f"Lock failed: {e}")
            return False

    def unlock_state(self, lock_id: str) -> bool:
        """Unlock Terraform state"""
        try:
            self.lock_table.delete_item(
                Key={'LockID': self.state_key},
                ConditionExpression='lock_id = :lock_id',
                ExpressionAttributeValues={':lock_id': lock_id}
            )
            return True
        except Exception as e:
            print(f"Unlock failed: {e}")
            return False

    def detect_drift(self, current_resources: List[str]) -> Dict[str, Any]:
        """Detect drift between state and actual resources"""
        state = self.get_state()
        state_resources = [r['address'] for r in state.get('resources', [])]

        added = [r for r in current_resources if r not in state_resources]
        removed = [r for r in state_resources if r not in current_resources]

        return {
            'drifted': len(added) > 0 or len(removed) > 0,
            'added': added,
            'removed': removed
        }

    def backup_state(self):
        """Backup Terraform state"""
        timestamp = datetime.utcnow().strftime('%Y%m%d-%H%M%S')
        backup_key = f"backups/{self.state_key}.{timestamp}"

        self.s3.copy_object(
            Bucket=self.bucket_name,
            CopySource={'Bucket': self.bucket_name, 'Key': self.state_key},
            Key=backup_key,
            ServerSideEncryption='aws:kms'
        )

        return backup_key

Drift Detection

# Drift detection automation
import boto3
import json
from typing import Dict, Any, List
from datetime import datetime

class DriftDetector:
    """Infrastructure drift detection"""

    def __init__(self):
        self.ec2 = boto3.client('ec2')
        self.rds = boto3.client('rds')
        self.s3 = boto3.client('s3')

    def detect_ec2_drift(self, expected_instances: List[Dict[str, Any]]) -> Dict[str, Any]:
        """Detect EC2 instance drift"""
        # Get actual instances
        response = self.ec2.describe_instances()
        actual_instances = []

        for reservation in response['Reservations']:
            for instance in reservation['Instances']:
                actual_instances.append({
                    'instance_id': instance['InstanceId'],
                    'instance_type': instance['InstanceType'],
                    'state': instance['State']['Name']
                })

        # Compare
        expected_ids = {i['instance_id'] for i in expected_instances}
        actual_ids = {i['instance_id'] for i in actual_instances}

        added = actual_ids - expected_ids
        removed = expected_ids - actual_ids

        return {
            'drifted': len(added) > 0 or len(removed) > 0,
            'added': list(added),
            'removed': list(removed),
            'details': {
                'expected_count': len(expected_instances),
                'actual_count': len(actual_instances)
            }
        }

    def detect_s3_drift(self, expected_buckets: List[str]) -> Dict[str, Any]:
        """Detect S3 bucket drift"""
        response = self.s3.list_buckets()
        actual_buckets = [b['Name'] for b in response['Buckets']]

        expected_set = set(expected_buckets)
        actual_set = set(actual_buckets)

        added = actual_set - expected_set
        removed = expected_set - actual_set

        return {
            'drifted': len(added) > 0 or len(removed) > 0,
            'added': list(added),
            'removed': list(removed)
        }

    def generate_drift_report(self, drift_results: List[Dict[str, Any]]) -> str:
        """Generate drift report"""
        report = "# Infrastructure Drift Report\n\n"
        report += f"Generated: {datetime.utcnow().isoformat()}\n\n"

        for result in drift_results:
            report += f"## {result['type']}\n"
            report += f"Drifted: {result['drifted']}\n"
            if result['drifted']:
                report += f"Added: {result.get('added', [])}\n"
                report += f"Removed: {result.get('removed', [])}\n"
            report += "\n"

        return report

โš ๏ธDrift Detection

Implement automated drift detection to identify unauthorized changes. Use scheduled scans and alerts to maintain infrastructure compliance.

CI/CD Integration

# GitHub Actions for Terraform
name: Terraform CI/CD

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  TF_VERSION: "1.5.0"
  AWS_REGION: "us-east-1"

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: ${{ env.TF_VERSION }}

      - name: Terraform Init
        run: terraform init

      - name: Terraform Format
        run: terraform fmt -check

      - name: Terraform Validate
        run: terraform validate

      - name: Terraform Plan
        run: terraform plan -out=tfplan

      - name: Upload Plan
        uses: actions/upload-artifact@v3
        with:
          name: tfplan
          path: tfplan

  apply:
    needs: plan
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v3

      - uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: ${{ env.TF_VERSION }}

      - name: Download Plan
        uses: actions/download-artifact@v3
        with:
          name: tfplan

      - name: Terraform Init
        run: terraform init

      - name: Terraform Apply
        run: terraform apply -auto-approve tfplan

  drift-detection:
    runs-on: ubuntu-latest
    schedule:
      - cron: '0 0 * * *'  # Daily
    steps:
      - uses: actions/checkout@v3

      - uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: ${{ env.TF_VERSION }}

      - name: Terraform Init
        run: terraform init

      - name: Check for Drift
        run: |
          terraform plan -detailed-exitcode
          if [ $? -eq 2 ]; then
            echo "Drift detected!"
            # Send notification
          fi

โœ…IaC Benefits

Infrastructure as Code provides version control, reproducibility, and automation for infrastructure. Use modules for reuse and state management for tracking.

Summary

ComponentPurposeImplementation
ModulesReusable componentsTerraform modules
StateInfrastructure stateS3 backend
LockingConcurrency controlDynamoDB
DriftChange detectionScheduled scans
CI/CDAutomationGitHub Actions

Advertisement