Cloud Cost Optimization: Reserved, Spot, Right-Sizing

Difficulty: Senior Level | Companies: Netflix, Amazon, Google, Microsoft, FinOps Foundation

Interview Question

"Design a cost optimization strategy for a cloud environment with $1M+ monthly spend. How do you handle reserved instances, spot instances, and right-sizing?"

ℹ️ Key Concepts

This question tests your understanding of cloud economics, FinOps practices, and cost optimization strategies.

Complete Cost Optimization Architecture

Architecture Overview

Architecture Diagram

┌─────────────────────────────────────────────────────────────────────────┐
│                    COST OPTIMIZATION ARCHITECTURE                        │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌───────────────── VISIBILITY LAYER ─────────────────┐                │
│  │  Cost Explorer │ Budgets │ Pricing Calculator       │                 │
│  └──────────────────────┬──────────────────────────┘                 │
│                         │                                               │
│  ┌───────────────── OPTIMIZATION LAYER ───────────────┐                │
│  │                                                       │              │
│  │  ┌─────────────────────────────────────────────┐    │              │
│  │  │           Right-Sizing                       │    │              │
│  │  │  (CPU/Memory optimization)                  │    │              │
│  │  └─────────────────────────────────────────────┘    │              │
│  │                                                       │              │
│  │  ┌─────────────────────────────────────────────┐    │              │
│  │  │           Reserved Instances                 │    │              │
│  │  │  (1yr/3yr commitments)                      │    │              │
│  │  └─────────────────────────────────────────────┘    │              │
│  │                                                       │              │
│  │  ┌─────────────────────────────────────────────┐    │              │
│  │  │           Spot Instances                     │    │              │
│  │  │  (Fault-tolerant workloads)                 │    │              │
│  │  └─────────────────────────────────────────────┘    │              │
│  │                                                       │              │
│  │  ┌─────────────────────────────────────────────┐    │              │
│  │  │           Scheduling                        │    │              │
│  │  │  (Start/stop non-production)                │    │              │
│  │  └─────────────────────────────────────────────┘    │              │
│  │                                                       │              │
│  └──────────────────────┬──────────────────────────────┘              │
│                         │                                               │
│  ┌───────────────── GOVERNANCE LAYER ─────────────────┐               │
│  │  Tagging │ Policies │ Alerts │ Reports             │                 │
│  └─────────────────────────────────────────────────────┘              │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
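
The governance layer only works if resources carry the tags that budgets, reports, and schedulers filter on. A minimal sketch of a tag compliance check follows. The required keys (`Environment`, `Owner`, `CostCenter`) and the function name `missing_tags` are assumptions for illustration; the input uses the `[{'Key': ..., 'Value': ...}]` shape that EC2 returns for tags.

```python
from typing import Dict, Iterable, List

# Hypothetical tagging policy; substitute the keys your organization mandates.
REQUIRED_TAGS = ('Environment', 'Owner', 'CostCenter')


def missing_tags(resource_tags: Iterable[Dict[str, str]],
                 required: Iterable[str] = REQUIRED_TAGS) -> List[str]:
    """Return the required tag keys that are absent or have an empty value."""
    present = {tag['Key'] for tag in resource_tags if tag.get('Value')}
    return [key for key in required if key not in present]


# Example: 'Owner' is present but empty, 'CostCenter' is absent.
example_tags = [
    {'Key': 'Environment', 'Value': 'dev'},
    {'Key': 'Owner', 'Value': ''},
]
print(missing_tags(example_tags))
```

Resources that return a non-empty list can be reported to their owning team or blocked by policy, depending on how strict the governance process is meant to be.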

Mathematical Foundation: Cost Models

Reserved Instance Savings:

On-demand price: P_od = $0.10/hour
Reserved price (1yr): P_1yr = $0.06/hour (40% savings)
Reserved price (3yr): P_3yr = $0.04/hour (60% savings)
Monthly savings (1yr): S_1yr = (P_od - P_1yr) × 730 hours = $29.20/instance
Monthly savings (3yr): S_3yr = (P_od - P_3yr) × 730 hours = $43.80/instance
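
The savings above assume the reserved instance runs every hour of the month. A reservation is billed whether or not it is used, so it is worth checking the break-even utilization before committing. The sketch below uses the same illustrative prices; the function names are assumptions, not part of any AWS API.

```python
HOURS_PER_MONTH = 730


def break_even_utilization(on_demand_hourly: float, reserved_hourly: float) -> float:
    """Fraction of the month an instance must run for the reservation to pay off."""
    return reserved_hourly / on_demand_hourly


def monthly_net_savings(on_demand_hourly: float, reserved_hourly: float,
                        utilization: float) -> float:
    """Savings versus on-demand when the instance only runs `utilization` of the time.

    On-demand is billed for hours actually used; the reservation is billed
    for every hour of the month regardless of use.
    """
    on_demand_cost = on_demand_hourly * HOURS_PER_MONTH * utilization
    reserved_cost = reserved_hourly * HOURS_PER_MONTH
    return on_demand_cost - reserved_cost


print(break_even_utilization(0.10, 0.06))    # 1yr reservation
print(monthly_net_savings(0.10, 0.06, 1.0))  # always-on workload
print(monthly_net_savings(0.10, 0.06, 0.5))  # runs half the time: negative
```

With these prices the 1yr reservation breaks even at 60% utilization, which is why reservations suit steady baseline load and not workloads that run only part of the day.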

Spot Instance Savings:

On-demand price: P_od = $0.10/hour
Spot price: P_spot = $0.03/hour (70% savings)
Monthly savings: S_spot = (P_od - P_spot) × 730 hours = $51.10/instance

Right-Sizing Impact:

Current instance: m5.4xlarge (16 vCPU, 64GB) = $0.768/hour
Right-sized: m5.xlarge (4 vCPU, 16GB) = $0.192/hour
Monthly savings: S_resize = (0.768 - 0.192) × 730 = $420.48/instance

Total Cost Optimization:

Current monthly cost: C_current = $1,000,000
Right-sizing savings (20%): $200,000
Reserved instances (30%): $300,000
Spot instances (15%): $150,000
Scheduling (10%): $100,000
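Total optimized cost: C_optimized = $1,000,000 - $750,000 = $250,000 (75% reduction, a best case that treats each percentage as a non-overlapping share of total spend)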
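
The total above adds the four percentages as if each applied to the full original bill. A more cautious estimate applies the levers in sequence, each to the share of the remaining bill it can actually reach. The sketch below does that; the `Lever` class, the `share` and `discount` values, and the ordering are illustrative assumptions, not measurements.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Lever:
    name: str
    share: float     # fraction of the remaining bill this lever can reach
    discount: float  # saving on that fraction


def apply_levers(monthly_spend: float, levers: List[Lever]) -> Tuple[float, List[Tuple[str, float]]]:
    """Apply each lever in order to what is left of the bill."""
    remaining = monthly_spend
    breakdown = []
    for lever in levers:
        saved = remaining * lever.share * lever.discount
        breakdown.append((lever.name, saved))
        remaining -= saved
    return remaining, breakdown


# Hypothetical inputs for a $1M/month bill.
levers = [
    Lever('right-sizing', share=0.40, discount=0.50),
    Lever('reserved', share=0.50, discount=0.40),
    Lever('spot', share=0.20, discount=0.70),
    Lever('scheduling', share=0.15, discount=0.70),
]
remaining, breakdown = apply_levers(1_000_000, levers)
for name, saved in breakdown:
    print(f'{name}: ${saved:,.0f}')
print(f'optimized monthly cost: ${remaining:,.0f}')
```

Under these assumed inputs the bill roughly halves, noticeably less than the additive 75%, because each lever works on a base that earlier levers have already shrunk.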

AWS Cost Explorer Integration

# Cost monitoring and optimization
import boto3
from typing import Dict, Any, List
from datetime import datetime, timedelta
from dataclasses import dataclass

@dataclass
class CostRecommendation:
    resource_id: str
    resource_type: str
    current_cost: float
    recommended_action: str
    potential_savings: float
    confidence: str

class CostOptimizer:
    """Cloud cost optimization manager"""

    def __init__(self):
        self.ce = boto3.client('ce')
        self.ec2 = boto3.client('ec2')
        self.rds = boto3.client('rds')

    def get_cost_and_usage(self, days: int = 30) -> Dict[str, Any]:
        """Get cost and usage report"""
        end_date = datetime.utcnow().strftime('%Y-%m-%d')
        start_date = (datetime.utcnow() - timedelta(days=days)).strftime('%Y-%m-%d')

        response = self.ce.get_cost_and_usage(
            TimePeriod={
                'Start': start_date,
                'End': end_date
            },
            Granularity='MONTHLY',
            Metrics=['UnblendedCost', 'UsageQuantity'],
            GroupBy=[
                {
                    'Type': 'DIMENSION',
                    'Key': 'SERVICE'
                }
            ]
        )

        return response

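    def get_right_sizing_recommendations(self) -> List[CostRecommendation]:
        """Get right-sizing recommendations from Cost Explorer"""
        response = self.ce.get_rightsizing_recommendation(
            Service='AmazonEC2',
            Configuration={
                'RecommendationTarget': 'SAME_INSTANCE_FAMILY',
                'BenefitsConsidered': True
            }
        )

        recommendations = []
        for rec in response['RightsizingRecommendations']:
            action = rec['RightsizingType']  # 'MODIFY' or 'TERMINATE'
            if action == 'MODIFY':
                # Use the first suggested target instance
                target = rec['ModifyRecommendationDetail']['TargetInstances'][0]
                savings = float(target['EstimatedMonthlySavings'])
            else:
                savings = float(rec['TerminateRecommendationDetail']['EstimatedMonthlySavings'])

            current = rec['CurrentInstance']
            recommendations.append(CostRecommendation(
                resource_id=current['ResourceId'],
                resource_type='EC2',
                current_cost=float(current['MonthlyCost']),  # monthly cost
                recommended_action=action,
                potential_savings=savings,  # estimated monthly savings
                confidence='n/a'  # the API does not return a confidence rating
            ))

        return recommendations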

    def get_unused_resources(self) -> List[Dict[str, Any]]:
        """Get unused resources"""
        unused_resources = []

        # Check unused EBS volumes
        volumes = self.ec2.describe_volumes(
            Filters=[{'Name': 'status', 'Values': ['available']}]
        )

        for volume in volumes['Volumes']:
            unused_resources.append({
                'resource_id': volume['VolumeId'],
                'resource_type': 'EBS Volume',
                'size_gb': volume['Size'],
                'monthly_cost': volume['Size'] * 0.10  # $0.10/GB/month
            })

        # Check unused Elastic IPs
        addresses = self.ec2.describe_addresses()

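        for address in addresses['Addresses']:
            # No AssociationId means the address is attached to nothing,
            # neither an instance nor a network interface (e.g. a NAT gateway)
            if 'AssociationId' not in address: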
                unused_resources.append({
                    'resource_id': address['AllocationId'],
                    'resource_type': 'Elastic IP',
                    'monthly_cost': 3.60  # $3.60/month for unused IP
                })

        return unused_resources

    def get_savings_plans_coverage(self) -> Dict[str, Any]:
        """Get Savings Plans coverage"""
        response = self.ce.get_savings_plans_coverage(
            TimePeriod={
                'Start': (datetime.utcnow() - timedelta(days=30)).strftime('%Y-%m-%d'),
                'End': datetime.utcnow().strftime('%Y-%m-%d')
            },
            GroupBy=[
                {
                    'Type': 'DIMENSION',
                    'Key': 'INSTANCE_FAMILY'
                }
            ]
        )

        return response
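
The `get_cost_and_usage` response is nested by time period and then by group, so it usually needs flattening before it is useful. A minimal sketch, assuming the response was grouped by `SERVICE` with the `UnblendedCost` metric as in the call above; the helper name `top_services_by_cost` and the sample data are illustrative.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def top_services_by_cost(response: Dict[str, Any], limit: int = 5) -> List[Tuple[str, float]]:
    """Sum UnblendedCost per service across all periods and rank descending."""
    totals: Dict[str, float] = defaultdict(float)
    for period in response.get('ResultsByTime', []):
        for group in period.get('Groups', []):
            service = group['Keys'][0]  # one key per GroupBy entry
            totals[service] += float(group['Metrics']['UnblendedCost']['Amount'])
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Hand-written sample in the shape Cost Explorer returns (amounts are strings).
sample = {
    'ResultsByTime': [
        {'Groups': [
            {'Keys': ['Amazon EC2'], 'Metrics': {'UnblendedCost': {'Amount': '600.5', 'Unit': 'USD'}}},
            {'Keys': ['Amazon S3'], 'Metrics': {'UnblendedCost': {'Amount': '120.0', 'Unit': 'USD'}}},
        ]},
        {'Groups': [
            {'Keys': ['Amazon EC2'], 'Metrics': {'UnblendedCost': {'Amount': '399.5', 'Unit': 'USD'}}},
        ]},
    ]
}
for service, cost in top_services_by_cost(sample):
    print(f'{service}: ${cost:,.2f}')
```

Ranking services this way shows where the bill concentrates, which is a reasonable first step before deciding which optimization lever to apply.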

Right-Sizing Implementation

# Right-sizing analysis
import boto3
from typing import Dict, Any, List
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class RightSizeRecommendation:
    instance_id: str
    current_type: str
    recommended_type: str
    current_cost: float
    recommended_cost: float
    cpu_utilization: float
    memory_utilization: float

class RightSizer:
    """Instance right-sizing analyzer"""

    def __init__(self):
        self.cloudwatch = boto3.client('cloudwatch')
        self.ec2 = boto3.client('ec2')

    def analyze_instances(self) -> List[RightSizeRecommendation]:
        """Analyze all instances for right-sizing"""
        recommendations = []

        # Get all running instances
        instances = self.ec2.describe_instances(
            Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
        )

        for reservation in instances['Reservations']:
            for instance in reservation['Instances']:
                recommendation = self._analyze_instance(instance)
                if recommendation:
                    recommendations.append(recommendation)

        return recommendations

    def _analyze_instance(self, instance: Dict[str, Any]) -> RightSizeRecommendation:
        """Analyze single instance"""
        instance_id = instance['InstanceId']
        instance_type = instance['InstanceType']

        # Get CPU utilization
        cpu_util = self._get_cpu_utilization(instance_id)

        # Get memory utilization (requires CloudWatch agent)
        memory_util = self._get_memory_utilization(instance_id)

        # Determine if right-sizing is needed
        if cpu_util < 20 and memory_util < 30:
            recommended_type = self._recommend_smaller_instance(instance_type)
            if recommended_type != instance_type:
                return RightSizeRecommendation(
                    instance_id=instance_id,
                    current_type=instance_type,
                    recommended_type=recommended_type,
                    current_cost=self._get_instance_cost(instance_type),
                    recommended_cost=self._get_instance_cost(recommended_type),
                    cpu_utilization=cpu_util,
                    memory_utilization=memory_util
                )

        return None

    def _get_cpu_utilization(self, instance_id: str) -> float:
        """Get average CPU utilization"""
        response = self.cloudwatch.get_metric_statistics(
            Namespace='AWS/EC2',
            MetricName='CPUUtilization',
            Dimensions=[
                {
                    'Name': 'InstanceId',
                    'Value': instance_id
                }
            ],
            StartTime=datetime.utcnow() - timedelta(days=14),
            EndTime=datetime.utcnow(),
            Period=86400,
            Statistics=['Average']
        )

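        if response['Datapoints']:
            return sum(point['Average'] for point in response['Datapoints']) / \
                   len(response['Datapoints'])
        # No datapoints (e.g. a just-launched instance): report 100% so the
        # instance is never flagged as idle on missing data
        return 100.0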

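    def _get_memory_utilization(self, instance_id: str) -> float:
        """Get memory utilization (requires CloudWatch agent)"""
        # Placeholder: EC2 does not publish memory metrics by default, so a
        # real implementation reads them from the CloudWatch agent's metrics.
        # Returning 0.0 means the memory check in _analyze_instance always
        # passes, so the decision is effectively CPU-only.
        return 0.0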

    def _recommend_smaller_instance(self, current_type: str) -> str:
        """Recommend smaller instance type"""
        # Instance family mapping
        smaller_instances = {
            'm5.4xlarge': 'm5.xlarge',
            'm5.2xlarge': 'm5.large',
            'c5.4xlarge': 'c5.xlarge',
            'c5.2xlarge': 'c5.large',
            'r5.4xlarge': 'r5.xlarge',
            'r5.2xlarge': 'r5.large'
        }

        return smaller_instances.get(current_type, current_type)

    def _get_instance_cost(self, instance_type: str) -> float:
        """Get instance hourly cost"""
        # Pricing data (simplified)
        pricing = {
            'm5.xlarge': 0.192,
            'm5.large': 0.096,
            'c5.xlarge': 0.17,
            'c5.large': 0.085,
            'r5.xlarge': 0.252,
            'r5.large': 0.126,
            'm5.2xlarge': 0.384,
            'm5.4xlarge': 0.768,
            'c5.2xlarge': 0.34,
            'c5.4xlarge': 0.68,
            'r5.2xlarge': 0.504,
            'r5.4xlarge': 1.008
        }

        return pricing.get(instance_type, 0.0)

    def calculate_savings(self, recommendations: List[RightSizeRecommendation]) -> float:
        """Calculate total potential savings"""
        total_savings = 0
        for rec in recommendations:
            monthly_savings = (rec.current_cost - rec.recommended_cost) * 730
            total_savings += monthly_savings
        return total_savings
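
Averaging daily averages, as `_get_cpu_utilization` does, can hide short bursts: an instance that idles most of the day but saturates for a few hours still looks underused. One common alternative is to test a high percentile of finer-grained samples. The sketch below is a pure-Python illustration; the function names, the 40% threshold, and the minimum sample count are assumptions to tune per workload.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty sequence (pct in 1..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


def is_downsize_candidate(cpu_samples: Sequence[float],
                          p95_threshold: float = 40.0,
                          min_samples: int = 24) -> bool:
    """Flag an instance only if its 95th-percentile CPU stays below the threshold."""
    if len(cpu_samples) < min_samples:
        return False  # not enough data to judge
    return percentile(cpu_samples, 95) < p95_threshold


steady = [10.0] * 100               # flat, low utilization
bursty = [10.0] * 90 + [90.0] * 10  # mean is 18%, but 10% of samples are hot
print(is_downsize_candidate(steady))
print(is_downsize_candidate(bursty))
```

The bursty series has a mean of 18%, which the average-based rule above would flag, while the percentile rule leaves it alone.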

Reserved Instance Management

# Reserved Instance management
import boto3
from typing import Dict, Any, List
from datetime import datetime, timedelta
from dataclasses import dataclass

@dataclass
class RIRecommendation:
    instance_type: str
    region: str
    term: str  # 1yr or 3yr
    payment_option: str  # AllUpfront, PartialUpfront, NoUpfront
    coverage: float
    monthly_savings: float

class RIManager:
    """Reserved Instance manager"""

    def __init__(self):
        self.ce = boto3.client('ce')
        self.ec2 = boto3.client('ec2')

    def get_ri_recommendations(self) -> List[RIRecommendation]:
        """Get RI recommendations"""
        response = self.ce.get_reservation_purchase_recommendation(
            LookbackPeriodInDays='THIRTY_DAYS',
            AccountScope='PAYER',
            Service='Amazon Elastic Compute Cloud - Compute',
            PaymentOption='ALL_UPFRONT',
            TermInYears='ONE_YEAR'
        )

        recommendations = []
        for rec in response['Recommendations']:
            # Each recommendation holds a list of per-instance-type details
            for detail in rec['RecommendationDetails']:
                ec2 = detail['InstanceDetails']['EC2InstanceDetails']
                recommendations.append(RIRecommendation(
                    instance_type=ec2['InstanceType'],
                    region=ec2['Region'],
                    term='1yr',
                    payment_option='AllUpfront',
                    # The API reports expected utilization of the suggested RIs
                    # rather than coverage; stored here as the closest figure
                    coverage=float(detail['AverageUtilization']),
                    monthly_savings=float(detail['EstimatedMonthlySavingsAmount'])
                ))

        return recommendations

    def get_current_ri_utilization(self) -> Dict[str, Any]:
        """Get RI utilization report"""
        response = self.ce.get_reservation_utilization(
            TimePeriod={
                'Start': (datetime.utcnow() - timedelta(days=30)).strftime('%Y-%m-%d'),
                'End': datetime.utcnow().strftime('%Y-%m-%d')
            }
        )

        return response

    def get_ri_coverage(self) -> Dict[str, Any]:
        """Get RI coverage report"""
        response = self.ce.get_reservation_coverage(
            TimePeriod={
                'Start': (datetime.utcnow() - timedelta(days=30)).strftime('%Y-%m-%d'),
                'End': datetime.utcnow().strftime('%Y-%m-%d')
            },
            GroupBy=[
                {
                    'Type': 'DIMENSION',
                    'Key': 'INSTANCE_TYPE'
                }
            ]
        )

        return response

    def calculate_ri_savings(self, instance_type: str, count: int, 
                            term: str = '1yr') -> float:
        """Calculate RI savings"""
        # Pricing data
        pricing = {
            'm5.large': {'od': 0.096, 'ri_1yr': 0.058, 'ri_3yr': 0.038},
            'm5.xlarge': {'od': 0.192, 'ri_1yr': 0.116, 'ri_3yr': 0.076},
            'c5.large': {'od': 0.085, 'ri_1yr': 0.051, 'ri_3yr': 0.033},
            'c5.xlarge': {'od': 0.17, 'ri_1yr': 0.102, 'ri_3yr': 0.066}
        }

        if instance_type not in pricing:
            return 0.0

        instance_pricing = pricing[instance_type]
        od_cost = instance_pricing['od'] * 730 * count
        ri_cost = instance_pricing[f'ri_{term}'] * 730 * count

        return od_cost - ri_cost
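
`calculate_ri_savings` assumes every reserved instance is fully used. When demand varies through the day, the question becomes how many instances to reserve and how many to leave on-demand. The sketch below brute-forces that choice over an hourly usage series. The function names and the usage pattern are hypothetical; the prices are the `m5.large` figures from the table above.

```python
from typing import Sequence


def fleet_cost(hourly_usage: Sequence[int], reserved_count: int,
               od_price: float, ri_price: float) -> float:
    """Cost of a usage series when `reserved_count` instances are reserved.

    Reserved capacity is paid every hour; demand above it runs on-demand.
    """
    cost = 0.0
    for running in hourly_usage:
        overflow = max(0, running - reserved_count)
        cost += reserved_count * ri_price + overflow * od_price
    return cost


def best_reservation_count(hourly_usage: Sequence[int],
                           od_price: float, ri_price: float) -> int:
    """Reservation count (0..peak) that minimizes total cost."""
    candidates = range(0, max(hourly_usage) + 1)
    return min(candidates,
               key=lambda n: fleet_cost(hourly_usage, n, od_price, ri_price))


# Hypothetical day: 10 instances for 16 hours, 20 instances for 8 peak hours.
usage = [10] * 16 + [20] * 8
print(best_reservation_count(usage, od_price=0.096, ri_price=0.058))
```

Here the peak capacity runs only a third of the time, below the roughly 60% break-even for these prices, so the cheapest plan reserves the 10-instance baseline and leaves the peak on-demand.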

Spot Instance Strategy

# Spot instance management
import boto3
from typing import Dict, Any, List
from datetime import datetime, timedelta
from dataclasses import dataclass

@dataclass
class SpotRecommendation:
    instance_type: str
    spot_price: float
    on_demand_price: float
    savings_percent: float
    interruption_frequency: str

class SpotManager:
    """Spot instance manager"""

    def __init__(self):
        self.ec2 = boto3.client('ec2')
        self.pricing = boto3.client('pricing', region_name='us-east-1')

    def get_spot_price_history(self, instance_type: str = None) -> List[Dict[str, Any]]:
        """Get spot price history"""
        response = self.ec2.describe_spot_price_history(
            InstanceTypes=[instance_type] if instance_type else [],
            StartTime=datetime.utcnow() - timedelta(days=7),
            ProductDescriptions=['Linux/UNIX']
        )

        return response['SpotPriceHistory']

    def get_spot_recommendations(self) -> List[SpotRecommendation]:
        """Get spot instance recommendations"""
        recommendations = []

        # Analyze different instance types
        instance_types = ['m5.large', 'm5.xlarge', 'c5.large', 'c5.xlarge']

        for instance_type in instance_types:
            spot_prices = self.get_spot_price_history(instance_type)
            if spot_prices:
                avg_spot_price = sum(float(p['SpotPrice']) for p in spot_prices) / len(spot_prices)
                on_demand_price = self._get_on_demand_price(instance_type)

                if on_demand_price > 0:
                    savings_percent = ((on_demand_price - avg_spot_price) / on_demand_price) * 100

                    recommendations.append(SpotRecommendation(
                        instance_type=instance_type,
                        spot_price=avg_spot_price,
                        on_demand_price=on_demand_price,
                        savings_percent=savings_percent,
                        interruption_frequency='low'  # Simplified
                    ))

        return recommendations

    def _get_on_demand_price(self, instance_type: str) -> float:
        """Get on-demand price for instance type"""
        # Pricing data (simplified)
        pricing = {
            'm5.large': 0.096,
            'm5.xlarge': 0.192,
            'c5.large': 0.085,
            'c5.xlarge': 0.17
        }

        return pricing.get(instance_type, 0.0)

    def calculate_spot_savings(self, recommendations: List[SpotRecommendation]) -> float:
        """Calculate total spot savings"""
        total_savings = 0
        for rec in recommendations:
            monthly_savings = (rec.on_demand_price - rec.spot_price) * 730
            total_savings += monthly_savings
        return total_savings

⚠️ Spot Instance Strategy

Use spot instances for fault-tolerant, flexible workloads. Implement proper interruption handling and use multiple instance types for availability.
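
Interruption handling starts with the interruption notice. EC2 publishes it in instance metadata under `spot/instance-action` about two minutes before reclaiming the instance, as a small JSON document with an `action` and a UTC `time`. The sketch below only parses such a document and decides how to react; polling the metadata endpoint is omitted, and the function names and the 60-second drain budget are assumptions.

```python
import json
from datetime import datetime, timezone


def seconds_until_interruption(notice_body: str, now: datetime) -> float:
    """Seconds left before the action in the notice takes effect.

    `now` must be a timezone-aware datetime.
    """
    notice = json.loads(notice_body)
    deadline = datetime.strptime(notice['time'], '%Y-%m-%dT%H:%M:%SZ')
    deadline = deadline.replace(tzinfo=timezone.utc)
    return (deadline - now).total_seconds()


def shutdown_plan(notice_body: str, now: datetime, drain_seconds: float = 60.0) -> str:
    """Choose a full drain if time allows, otherwise just checkpoint state."""
    remaining = seconds_until_interruption(notice_body, now)
    return 'drain' if remaining >= drain_seconds else 'checkpoint'


# Sample notice in the documented shape; the timestamps are made up.
body = '{"action": "terminate", "time": "2024-05-01T12:02:00Z"}'
now = datetime(2024, 5, 1, 12, 0, 30, tzinfo=timezone.utc)
print(seconds_until_interruption(body, now))
print(shutdown_plan(body, now))
```

Passing `now` in explicitly keeps the logic testable; a real agent would call these from a loop that polls the metadata endpoint every few seconds.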

Scheduling Optimization

# Scheduling for cost optimization
import boto3
from typing import Dict, Any, List
from datetime import datetime, timedelta

class ScheduleManager:
    """Resource scheduling manager"""

    def __init__(self):
        self.ec2 = boto3.client('ec2')
        self.rds = boto3.client('rds')

    def schedule_non_production(self):
        """Schedule non-production resources"""
        # Get non-production instances
        instances = self.ec2.describe_instances(
            Filters=[
                {'Name': 'tag:Environment', 'Values': ['staging', 'dev']},
                {'Name': 'instance-state-name', 'Values': ['running']}
            ]
        )

        for reservation in instances['Reservations']:
            for instance in reservation['Instances']:
                # Check if it's outside business hours
                if self._is_outside_business_hours():
                    self._stop_instance(instance['InstanceId'])

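    def schedule_development_databases(self):
        """Schedule development RDS instances"""
        # describe_db_instances cannot filter by tag, so fetch the
        # instances and check each one's TagList instead
        dbs = self.rds.describe_db_instances()

        for db in dbs['DBInstances']:
            tags = {t['Key']: t.get('Value') for t in db.get('TagList', [])}
            is_non_prod = tags.get('Environment') in ('dev', 'staging')
            # Only instances in the 'available' state can be stopped
            if (is_non_prod and db['DBInstanceStatus'] == 'available'
                    and self._is_outside_business_hours()):
                self._stop_rds_instance(db['DBInstanceIdentifier'])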

    def _is_outside_business_hours(self) -> bool:
        """Check if current time is outside business hours"""
        now = datetime.utcnow()
        # Business hours: Monday-Friday, 9 AM - 6 PM UTC (end hour exclusive)
        is_weekday = now.weekday() < 5
        return not (is_weekday and 9 <= now.hour < 18)

    def _stop_instance(self, instance_id: str):
        """Stop EC2 instance"""
        self.ec2.stop_instances(InstanceIds=[instance_id])

    def _stop_rds_instance(self, db_instance_id: str):
        """Stop RDS instance"""
        self.rds.stop_db_instance(
            DBInstanceIdentifier=db_instance_id
        )

    def calculate_savings(self) -> float:
        """Calculate scheduling savings"""
        # Assume 100 non-production instances
        # Running 24/7 vs 9 hours/day, 5 days/week
        hours_per_week_full = 24 * 7  # 168 hours
        hours_per_week_business = 9 * 5  # 45 hours
        reduction = 1 - (hours_per_week_business / hours_per_week_full)

        # Average instance cost: $0.20/hour
        monthly_cost_full = 100 * 0.20 * 730  # $14,600
        monthly_cost_scheduled = monthly_cost_full * (1 - reduction)

        return monthly_cost_full - monthly_cost_scheduled
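
The manager above only stops resources, and it evaluates business hours in UTC. A scheduler also has to start them again, and teams usually think in local time. One way to cover both is a single function that returns the desired state for a given moment, which a periodic job can compare against the actual state. This is a sketch: `desired_state`, its parameters, and the fixed UTC offset (which ignores daylight saving) are assumptions.

```python
from datetime import datetime, timedelta, timezone


def desired_state(now_utc: datetime, utc_offset_hours: int = 0,
                  start_hour: int = 9, end_hour: int = 18) -> str:
    """Return 'running' during local weekday business hours, else 'stopped'.

    `now_utc` must be timezone-aware. The end hour is exclusive.
    """
    local = now_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    is_weekday = local.weekday() < 5  # Monday=0 .. Friday=4
    in_hours = start_hour <= local.hour < end_hour
    return 'running' if is_weekday and in_hours else 'stopped'


# Monday 14:00 UTC is 09:00 at UTC-5, so the team's day has just started.
monday = datetime(2024, 1, 8, 14, 0, tzinfo=timezone.utc)
print(desired_state(monday, utc_offset_hours=-5))
```

A job that runs every few minutes can then start instances whose desired state is `running` and stop those whose desired state is `stopped`, so a missed run corrects itself on the next pass.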

Cost Monitoring Dashboard

# Cost monitoring and alerting
import boto3
from typing import Dict, Any
from datetime import datetime, timedelta

class CostMonitor:
    """Cost monitoring and alerting"""

    def __init__(self):
        self.ce = boto3.client('ce')
        self.sns = boto3.client('sns')
        self.cloudwatch = boto3.client('cloudwatch')

    def create_cost_budget(self, budget_amount: float):
        """Create cost budget"""
        budgets = boto3.client('budgets')

        budgets.create_budget(
            AccountId='123456789012',
            Budget={
                'BudgetName': 'monthly-cost-budget',
                'BudgetLimit': {
                    'Amount': str(budget_amount),
                    'Unit': 'USD'
                },
                'TimeUnit': 'MONTHLY',
                'BudgetType': 'COST',
                'CostFilters': {
                    # Tag filters use the form 'user:<key>$<value>'
                    'TagKeyValue': ['user:Environment$production']
                }
            },
            NotificationsWithSubscribers=[
                {
                    'Notification': {
                        'NotificationType': 'ACTUAL',
                        'ComparisonOperator': 'GREATER_THAN',
                        'Threshold': 80,
                        'ThresholdType': 'PERCENTAGE'
                    },
                    'Subscribers': [
                        {
                            'SubscriptionType': 'SNS',
                            'Address': 'arn:aws:sns:us-east-1:123456789012:cost-alerts'
                        }
                    ]
                }
            ]
        )

    def monitor_daily_costs(self):
        """Monitor daily costs"""
        end_date = datetime.utcnow().strftime('%Y-%m-%d')
        start_date = (datetime.utcnow() - timedelta(days=1)).strftime('%Y-%m-%d')

        response = self.ce.get_cost_and_usage(
            TimePeriod={
                'Start': start_date,
                'End': end_date
            },
            Granularity='DAILY',
            Metrics=['UnblendedCost']
        )

        total_cost = sum(
            float(result['Total']['UnblendedCost']['Amount'])
            for result in response['ResultsByTime']
        )

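        # Alert if daily cost exceeds a fixed threshold. The value is
        # illustrative: a $1M/month bill averages about $33,000/day, so
        # calibrate it to the account being monitored.
        if total_cost > 10000:  # $10,000 daily threshold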
            self._send_cost_alert(total_cost)

        return total_cost

    def _send_cost_alert(self, cost: float):
        """Send cost alert"""
        self.sns.publish(
            TopicArn='arn:aws:sns:us-east-1:123456789012:cost-alerts',
            Message=f'Daily cost exceeded threshold: ${cost:.2f}',
            Subject='Cost Alert'
        )
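
A fixed daily threshold either fires constantly or misses gradual drift. A common complement is to compare each day against its own recent history. The sketch below flags a day that exceeds the trailing mean by a number of standard deviations; the function name, the 3-sigma default, the 1% floor on the spread, and the sample figures are all assumptions.

```python
from statistics import mean, stdev
from typing import Sequence


def is_cost_anomaly(history: Sequence[float], today: float,
                    min_days: int = 7, sigmas: float = 3.0) -> bool:
    """True if `today` is unusually high relative to the trailing daily costs."""
    if len(history) < max(min_days, 2):
        return False  # too little history for a meaningful baseline
    baseline = mean(history)
    # Floor the spread at 1% of the baseline so a perfectly flat
    # history does not turn every small increase into an alert.
    spread = max(stdev(history), 0.01 * baseline)
    return today > baseline + sigmas * spread


# Hypothetical trailing week for an account spending roughly $30k/day.
last_week = [30000, 31000, 29500, 30500, 30200, 29800, 30900]
print(is_cost_anomaly(last_week, 40000))
print(is_cost_anomaly(last_week, 30800))
```

The history can come from the daily `ResultsByTime` entries that `monitor_daily_costs` already retrieves, widened to a longer lookback window.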

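✅ Cost Optimization Benefits

In favorable cases, cost optimization can reduce cloud spend by 50-75%; the actual figure depends on how much of the bill each technique can reach. Combine right-sizing, reserved instances, spot instances, and scheduling rather than relying on any one of them.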

Summary

Strategy	Typical Savings (on affected resources)	Risk Level	Complexity
Right-Sizing	20-30%	Low	Medium
Reserved Instances	40-60%	Medium	Low
Spot Instances	70-90%	High	High
Scheduling	60-70%	Low	Low
Storage Tiering	50-70%	Low	Medium