S3 Performance Optimization

Master multipart upload, transfer acceleration, and S3 request rate optimization.

Module: AWS Data Engineering • Topic 36 of 65 • Premium Content

S3 Performance Architecture

Architecture Diagram
┌──────────────────────────────────────────────────────────────────┐
│                    S3 PERFORMANCE OPTIMIZATION                   │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │  1. MULTIPART UPLOAD                                       │  │
│  │     • Recommended for objects >100 MB                      │  │
│  │     • Parts upload in parallel (boto3 default: 10 threads) │  │
│  │     • Failed parts are retried individually                │  │
│  │     • Required above 5 GB (the single-PUT limit)           │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │  2. PREFIX PARALLELISM                                     │  │
│  │     • 5,500 GET/HEAD requests/s per prefix                 │  │
│  │     • 3,500 PUT/COPY/POST/DELETE requests/s per prefix     │  │
│  │     • Spread keys across prefixes to scale throughput      │  │
│  │     • Example: s3://bucket/{date}/{hour}/{partition}/      │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │  3. TRANSFER ACCELERATION                                  │  │
│  │     • Routes traffic through CloudFront edge locations     │  │
│  │     • Speeds up long-distance client-to-bucket transfers   │  │
│  │     • +$0.04 to $0.08/GB on top of standard transfer fees  │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │  4. S3 SELECT                                              │  │
│  │     • Filters data at the storage layer                    │  │
│  │     • Reduces data transferred to the client               │  │
│  │     • Input formats: CSV, JSON, Parquet                    │  │
│  └────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────┘
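Transfer Acceleration (box 3) has to be enabled on the bucket before clients can use the accelerated endpoint, and it only works with DNS-compliant bucket names that contain no periods. The sketch below assumes a boto3-style S3 client is passed in as `client`; it validates the name, enables acceleration, and returns the accelerated endpoint URL. The helper names and the simplified name check are illustrative, not part of any AWS SDK.

```python
import re

# Simplified bucket-name check (hypothetical): 3 to 63 chars of lowercase
# letters, digits, and hyphens, starting and ending with a letter or digit.
# Periods are rejected because Transfer Acceleration does not support them.
_ACCEL_BUCKET_RE = re.compile(r'[a-z0-9][a-z0-9-]{1,61}[a-z0-9]')


def accelerate_endpoint(bucket: str) -> str:
    """Return the accelerated endpoint URL for a bucket (hypothetical helper)."""
    if not _ACCEL_BUCKET_RE.fullmatch(bucket):
        raise ValueError(f'{bucket!r} cannot use Transfer Acceleration')
    return f'https://{bucket}.s3-accelerate.amazonaws.com'


def enable_transfer_acceleration(client, bucket: str) -> str:
    """Enable acceleration on the bucket and return its accelerated endpoint."""
    endpoint = accelerate_endpoint(bucket)  # validate before calling AWS
    client.put_bucket_accelerate_configuration(
        Bucket=bucket,
        AccelerateConfiguration={'Status': 'Enabled'},
    )
    return endpoint
```

Once the bucket is enabled, a boto3 client can route its requests through the edge network by being created with `config=botocore.config.Config(s3={'use_accelerate_endpoint': True})`.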

Multipart Upload Example

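import math
import os

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client('s3')

MB = 1024 * 1024

# Managed multipart upload: files at or above multipart_threshold are split
# into multipart_chunksize parts, uploaded by up to max_concurrency threads,
# with failed parts retried individually
config = TransferConfig(
    multipart_threshold=100 * MB,
    multipart_chunksize=100 * MB,
    max_concurrency=10,  # same as the boto3 default
    use_threads=True
)

s3.upload_file(
    'large_file.parquet',
    'data-lake-bucket',
    'raw/data/file.parquet',
    Config=config
)

# Manual multipart upload, for when you need control over each part.
# This version sends parts sequentially. S3 requires every part except the
# last to be 5 MiB to 5 GiB, with at most 10,000 parts per upload.
def multipart_upload(bucket, key, file_path, part_size=100 * MB):
    if math.ceil(os.path.getsize(file_path) / part_size) > 10_000:
        raise ValueError('part_size too small: upload would need more than 10,000 parts')

    upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)['UploadId']
    parts = []
    try:
        with open(file_path, 'rb') as f:
            part_number = 1
            while True:
                data = f.read(part_size)
                if not data:
                    break
                response = s3.upload_part(
                    Bucket=bucket, Key=key,
                    PartNumber=part_number, UploadId=upload_id, Body=data
                )
                # S3 needs each part's number and ETag to assemble the object
                parts.append({'PartNumber': part_number, 'ETag': response['ETag']})
                part_number += 1

        return s3.complete_multipart_upload(
            Bucket=bucket, Key=key, UploadId=upload_id,
            MultipartUpload={'Parts': parts}
        )
    except Exception:
        # Abort so S3 discards the uploaded parts; otherwise they remain
        # (and are billed) until aborted or removed by a lifecycle rule
        s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise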
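Multipart uploads are bounded by S3's part limits: each part except the last must be between 5 MiB and 5 GiB, and one upload can have at most 10,000 parts. With a fixed 100 MB `part_size`, that caps a manual upload at roughly 1 TB. A small helper such as the hypothetical `choose_part_size` below (a sketch, not something boto3 provides) grows the part size just enough for a given file.

```python
MiB = 1024 * 1024
GiB = 1024 * MiB

MIN_PART = 5 * MiB   # minimum size of every part except the last
MAX_PART = 5 * GiB   # maximum size of a single part
MAX_PARTS = 10_000   # maximum number of parts in one upload


def _ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def choose_part_size(file_size: int, preferred: int = 100 * MiB) -> int:
    """Return a part size in bytes that keeps the upload within S3's part limits."""
    part_size = max(preferred, MIN_PART)
    if _ceil_div(file_size, part_size) > MAX_PARTS:
        # Smallest whole-MiB part size that fits the file into 10,000 parts
        part_size = _ceil_div(file_size, MAX_PARTS * MiB) * MiB
    if part_size > MAX_PART:
        raise ValueError('file is too large for one multipart upload')
    return part_size
```

For example, `multipart_upload(bucket, key, path, part_size=choose_part_size(os.path.getsize(path)))` keeps the manual function within the part limits for files beyond roughly 1 TB.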

Interview Q&A

Q1: When should you use multipart upload?

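Answer: AWS recommends it for objects larger than 100 MB, and it is required above 5 GB, the maximum size of a single PUT. Parts upload in parallel, which improves throughput, and a failed part can be retried on its own instead of restarting the whole upload.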
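The manual `multipart_upload` function above sends parts one after another. The sketch below shows both properties from this answer, parallel parts and per-part retries, using a thread pool. It assumes `client` is a boto3 S3 client (any object with the same four multipart methods works), gives each worker its own file handle, and uses a simple fixed retry budget; `parallel_multipart_upload` and `MAX_RETRIES` are illustrative names. In practice `upload_file` with a `TransferConfig` already does this; the sketch only makes the mechanics visible.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor

MAX_RETRIES = 3  # hypothetical per-part attempt budget


def _upload_part_with_retry(client, bucket, key, upload_id, path, part_number, offset, size):
    # Each worker reads its own slice with its own file handle
    with open(path, 'rb') as f:
        f.seek(offset)
        body = f.read(size)
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            resp = client.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                                      PartNumber=part_number, Body=body)
            return {'PartNumber': part_number, 'ETag': resp['ETag']}
        except Exception:
            if attempt == MAX_RETRIES:
                raise
            time.sleep(0.1 * 2 ** attempt)  # exponential back-off, then retry this part only


def parallel_multipart_upload(client, bucket, key, path,
                              part_size=100 * 1024 * 1024, workers=10):
    """Upload `path` as a multipart upload with parts sent in parallel."""
    file_size = os.path.getsize(path)
    upload_id = client.create_multipart_upload(Bucket=bucket, Key=key)['UploadId']
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [
                pool.submit(_upload_part_with_retry, client, bucket, key, upload_id,
                            path, number, offset, part_size)
                for number, offset in enumerate(range(0, file_size, part_size), start=1)
            ]
            # Collect results in submission order, which is part-number order
            parts = [future.result() for future in futures]
        return client.complete_multipart_upload(
            Bucket=bucket, Key=key, UploadId=upload_id,
            MultipartUpload={'Parts': parts},
        )
    except Exception:
        # Discard already-uploaded parts so they are not left behind (and billed)
        client.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
```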

Q2: How does prefix parallelism work?

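Answer: S3 scales request capacity per prefix: each prefix supports at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second, and there is no limit on the number of prefixes in a bucket. Spreading requests across prefixes (e.g., date partitions) multiplies the total: reads spread evenly over 10 prefixes can reach about 55,000 requests per second.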
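To benefit from per-prefix limits, writers need keys spread over several prefixes. Besides the date layout shown in the diagram, a common alternative is hash-based sharding: derive a short, stable shard from each key's natural name. The sketch below assumes readers can recompute the shard or list all shard prefixes; `shard_key`, `baseline_capacity`, and the 16-shard default are illustrative, and the capacity figures are just the documented per-prefix baselines multiplied out, not measurements.

```python
import hashlib

READS_PER_PREFIX = 5_500   # GET/HEAD requests per second per prefix
WRITES_PER_PREFIX = 3_500  # PUT/COPY/POST/DELETE requests per second per prefix


def shard_key(natural_key: str, shards: int = 16) -> str:
    """Prepend a stable, hash-derived shard prefix such as '0a/' to a key."""
    digest = hashlib.sha256(natural_key.encode('utf-8')).digest()
    shard = int.from_bytes(digest[:4], 'big') % shards
    return f'{shard:02x}/{natural_key}'


def baseline_capacity(prefixes: int) -> dict:
    """Per-second request baseline if load is spread evenly over `prefixes`."""
    return {
        'reads': prefixes * READS_PER_PREFIX,
        'writes': prefixes * WRITES_PER_PREFIX,
    }
```

S3 scales toward these rates gradually, so a sudden spike can still receive `503 Slow Down` responses while it adapts; clients should retry with back-off.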

Q3: What is the benefit of S3 Select?

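Answer: S3 Select runs a SQL filter at the storage layer, so only matching rows and columns are returned, which reduces the data transferred to the client. For selective queries on large objects this can reduce costs by up to 80%. Note that AWS closed S3 Select to new customers in July 2024; existing customers can continue to use it.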
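The filtering described here can be sketched with boto3's `select_object_content`, assuming `client` is a boto3 S3 client on an account that still has S3 Select access and that the object is Parquet; `select_rows` is a hypothetical helper name. The response payload is an event stream in which only `Records` events carry data, and a record can be split across two events, so the chunks are joined before parsing.

```python
import json


def select_rows(client, bucket: str, key: str, sql: str) -> list:
    """Run an S3 Select query on a Parquet object and return rows as dicts."""
    response = client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType='SQL',
        Expression=sql,
        InputSerialization={'Parquet': {}},
        OutputSerialization={'JSON': {'RecordDelimiter': '\n'}},
    )
    chunks = []
    for event in response['Payload']:
        if 'Records' in event:  # data; other events include Stats, Progress, End
            chunks.append(event['Records']['Payload'])
    # Join before splitting, since one JSON row may span two Records events
    text = b''.join(chunks).decode('utf-8')
    return [json.loads(line) for line in text.splitlines() if line]
```

A call might look like `select_rows(s3, 'data-lake-bucket', 'raw/data/file.parquet', "SELECT s.user_id FROM S3Object s WHERE s.country = 'DE'")`, where the column names are hypothetical.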

Summary

  • Multipart Upload: Use for files >100 MB (required >5 GB), parallel part uploads
  • Prefix Parallelism: 5,500 req/s per prefix for reads, 3,500 for writes
  • Transfer Acceleration: CloudFront edge network for long-distance transfers
  • S3 Select: Filter at storage layer, reduce data transfer
  • Connection Pooling: Reuse a shared client's HTTP connections for high throughput
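In boto3, connection pooling comes from reusing one client: each client keeps a pool of HTTP connections (10 by default), and creating a new client per request throws that reuse away. The configuration sketch below assumes a workload of about 50 concurrent transfer threads (`WORKERS` is an illustrative number) and sizes the pool to match the transfer concurrency.

```python
import boto3
from boto3.s3.transfer import TransferConfig
from botocore.config import Config

WORKERS = 50  # hypothetical target concurrency for this workload

# Create one client and share it across threads; its connection pool
# (10 connections by default) is enlarged to match the transfer threads
s3 = boto3.client('s3', config=Config(max_pool_connections=WORKERS))

# Keep transfer concurrency within the pool size so connections beyond
# the pool are not opened and then discarded after each request
transfer_config = TransferConfig(max_concurrency=WORKERS)
```

Pass `Config=transfer_config` to `s3.upload_file` as in the multipart example, and reuse the same `s3` client for every file.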
