Data Migration on AWS
Master DMS, Snowball, Transfer Family, DataSync, and migration strategies.
Module: AWS Data Engineering • Topic 33 of 65 • Premium Content
Migration Strategies
Architecture Diagram
+-----------------------------------------------------------------------+
|                       DATA MIGRATION STRATEGIES                       |
|                                                                       |
|  +-----------------------------------------------------------------+  |
|  | 1. DMS (Database Migration Service)                             |  |
|  |    Small-medium databases (<10 TB)                              |  |
|  |    Continuous replication (CDC)                                 |  |
|  |    Source: RDS, MySQL, PostgreSQL, Oracle                       |  |
|  |    Target: RDS, Redshift, S3, Aurora                            |  |
|  +-----------------------------------------------------------------+  |
|                                                                       |
|  +-----------------------------------------------------------------+  |
|  | 2. SNOWBALL (Physical Device)                                   |  |
|  |    Large datasets (>10 TB)                                      |  |
|  |    Network-bound transfers                                      |  |
|  |    Snowball Edge: 80 TB / Snowball: 80 TB                       |  |
|  +-----------------------------------------------------------------+  |
|                                                                       |
|  +-----------------------------------------------------------------+  |
|  | 3. TRANSFER FAMILY (SFTP/FTPS/FTP)                              |  |
|  |    Replace on-premises SFTP servers                             |  |
|  |    Managed file transfer                                        |  |
|  |    Integration with S3                                          |  |
|  +-----------------------------------------------------------------+  |
|                                                                       |
|  +-----------------------------------------------------------------+  |
|  | 4. DATASYNC (Large-scale file sync)                             |  |
|  |    On-premises to S3                                            |  |
|  |    S3 to S3 cross-region                                        |  |
|  |    Automatic encryption and compression                         |  |
|  +-----------------------------------------------------------------+  |
+-----------------------------------------------------------------------+
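The rules of thumb in the diagram can be sketched as a small decision helper. This is only an illustration: the function name, the workload labels ('database', 'files', 'sftp'), and treating 10 TB as a hard cutoff are assumptions made for the example, not AWS guidance.

```python
SIZE_THRESHOLD_TB = 10  # rule-of-thumb cutoff taken from the diagram

def suggest_service(workload, size_tb):
    """Suggest a migration service from the workload type and data size.

    workload: 'database', 'files', or 'sftp' (hypothetical labels).
    size_tb:  approximate data volume in terabytes.
    """
    if workload not in ("database", "files", "sftp"):
        raise ValueError(f"unknown workload: {workload!r}")
    if workload == "sftp":
        # Replacing an SFTP server is about the protocol, not the volume.
        return "Transfer Family"
    if size_tb > SIZE_THRESHOLD_TB:
        # Large datasets where the network is the bottleneck.
        return "Snowball"
    if workload == "database":
        return "DMS"
    return "DataSync"

print(suggest_service("database", 2))   # DMS
print(suggest_service("files", 50))     # Snowball
```

In practice the cutoff depends on available bandwidth and the migration deadline, so the threshold is better treated as a parameter than as a constant.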
DMS Configuration
import boto3

dms = boto3.client('dms')

# Create replication instance
response = dms.create_replication_instance(
    ReplicationInstanceIdentifier='migration-instance',
    ReplicationInstanceClass='dms.r5.xlarge',
    AllocatedStorage=100,  # GiB
    MultiAZ=True,
    EngineVersion='3.5.1'
)
instance_arn = response['ReplicationInstance']['ReplicationInstanceArn']

# Create source endpoint (on-premises Oracle)
# Credentials are hardcoded only for brevity; load them from a secrets store in practice.
source = dms.create_endpoint(
    EndpointIdentifier='onprem-oracle',
    EndpointType='source',
    EngineName='oracle',
    ServerName='onprem-db.company.com',
    Port=1521,
    DatabaseName='PROD',
    Username='dms_user',
    Password='SecurePassword',
    SslMode='require'
)

# Create target endpoint (Aurora PostgreSQL)
target = dms.create_endpoint(
    EndpointIdentifier='aurora-postgres',
    EndpointType='target',
    EngineName='postgres',
    ServerName='aurora.cluster-123.us-east-1.rds.amazonaws.com',
    Port=5432,
    DatabaseName='production',
    Username='dms_user',
    Password='SecurePassword'
)

# Wait until the replication instance is available before creating the task
dms.get_waiter('replication_instance_available').wait(
    Filters=[{'Name': 'replication-instance-arn', 'Values': [instance_arn]}]
)

# Create full load + CDC task
task = dms.create_replication_task(
    ReplicationTaskIdentifier='oracle-to-aurora',
    SourceEndpointArn=source['Endpoint']['EndpointArn'],
    TargetEndpointArn=target['Endpoint']['EndpointArn'],
    ReplicationInstanceArn=instance_arn,
    MigrationType='full-load-and-cdc',
    # Selection rule: include every table in the PROD schema
    TableMappings='''{
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": {
                "schema-name": "PROD",
                "table-name": "%"
            },
            "rule-action": "include"
        }]
    }'''
)
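The TableMappings argument is a JSON string, so it may be less error-prone to build it from Python data than to hand-write it. The sketch below produces the same kind of selection rule as the task above; the helper name build_table_mappings and the rule-name pattern are hypothetical, chosen only for this example.

```python
import json

def build_table_mappings(schema, tables=("%",)):
    """Build a DMS table-mapping JSON string with one selection rule per table.

    schema: schema name to migrate (for example "PROD").
    tables: table names or patterns; "%" matches every table.
    """
    rules = []
    for i, table in enumerate(tables, start=1):
        rules.append({
            "rule-type": "selection",
            "rule-id": str(i),            # rule IDs must be unique
            "rule-name": f"include-{i}",  # hypothetical naming scheme
            "object-locator": {
                "schema-name": schema,
                "table-name": table,
            },
            "rule-action": "include",
        })
    return json.dumps({"rules": rules}, indent=2)

# Include only two tables instead of the whole schema
print(build_table_mappings("PROD", ["ORDERS", "CUSTOMERS"]))
```

The returned string can be passed directly as TableMappings when creating the replication task.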
Snowball Usage
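import boto3

snowball = boto3.client('snowball')

# Register the shipping address first: create_job takes an AddressId,
# not an inline address.
address = snowball.create_address(
    Address={
        'Name': 'Data Center',
        'Street1': '123 Tech St',
        'City': 'Seattle',
        'StateOrProvince': 'WA',
        'PostalCode': '98101',
        'Country': 'US'
    }
)

# Create Snowball import job
response = snowball.create_job(
    JobType='IMPORT',
    Resources={
        'LambdaResources': [],
        'S3Resources': [
            # KeyRange (BeginMarker/EndMarker) is documented for export jobs,
            # so it is omitted for this import job.
            {'BucketArn': 'arn:aws:s3:::migration-bucket'}
        ]
    },
    AddressId=address['AddressId'],
    # Placeholder ARN: IAM role that allows Snowball to write to the bucket
    RoleARN='arn:aws:iam::123456789012:role/snowball-import-role',
    SnowballType='EDGE',
    SnowballCapacityPreference='T80',
    # Valid shipping options: STANDARD, SECOND_DAY, NEXT_DAY, EXPRESS
    ShippingOption='SECOND_DAY',
    Notification={
        'SnsTopicARN': 'arn:aws:sns:us-east-1:123456789012:migration-alerts',
        # Notifications are keyed on job states, not event types
        'JobStatesToNotify': ['Complete', 'Cancelled']
    }
)
print(response['JobId'])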
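Once a job exists, its progress can be checked by polling the job state. The following is a minimal sketch that assumes a client exposing describe_job(JobId=...) with the state under JobMetadata.JobState, as the boto3 Snowball client does. The helper name, the polling interval, and the set of states treated as terminal are choices made for this example.

```python
import time

# States after which the job will not change again (assumed for this sketch)
TERMINAL_STATES = {"Complete", "Cancelled"}

def wait_for_job(client, job_id, poll_seconds=3600, max_polls=24, sleep=time.sleep):
    """Poll a Snowball job until it reaches a terminal state.

    Returns the last observed state, which may be non-terminal if
    max_polls is exhausted first.
    """
    state = None
    for _ in range(max_polls):
        state = client.describe_job(JobId=job_id)["JobMetadata"]["JobState"]
        if state in TERMINAL_STATES:
            break
        sleep(poll_seconds)
    return state

# Usage with a real client (requires AWS credentials):
#   import boto3
#   final_state = wait_for_job(boto3.client("snowball"), response["JobId"])
```

Because a device spends days in transit, an SNS notification is usually the better primary signal; polling like this suits scripts that need to block until the import finishes.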
Interview Q&A
Q1: When to use DMS vs Snowball?
Answer: Use DMS for database migrations, typically under about 10 TB, especially when ongoing replication (CDC) is needed. Use Snowball for larger datasets (over about 10 TB) where moving the data over the network would be impractical.
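Whether network transfer is impractical comes down to arithmetic on data size and link speed. A rough estimate can be sketched as follows; the 80% link utilization and the 80 TB usable capacity per device are assumptions for the example, and the function names are hypothetical.

```python
import math

def network_transfer_days(size_tb, link_mbps, utilization=0.8):
    """Estimate days needed to move size_tb terabytes over a link.

    Uses decimal units (1 TB = 10**12 bytes, 1 Mbps = 10**6 bits/s).
    utilization is the assumed usable fraction of the link.
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * utilization)
    return seconds / 86400

def snowball_devices(size_tb, usable_tb_per_device=80):
    """Number of devices needed, assuming 80 TB usable per device."""
    return math.ceil(size_tb / usable_tb_per_device)

# 100 TB over a 1 Gbps link at 80% utilization: roughly 11.6 days
print(round(network_transfer_days(100, 1000), 1))
# The same 100 TB needs 2 devices at 80 TB each
print(snowball_devices(100))
```

On a 100 Mbps link, 10 TB takes about as long as 100 TB does on 1 Gbps, which is one way to see why a threshold around 10 TB is only a rule of thumb.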
Q2: What is DMS CDC?
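Answer: Change Data Capture (CDC) reads committed changes from the source database's transaction log (for example MySQL binlog, PostgreSQL WAL, or Oracle redo logs) and applies them to the target, enabling continuous replication after the initial full load.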
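The idea behind CDC can be illustrated with a toy replay loop: the target starts from the full-load snapshot, and each logged change is applied in order. This is a conceptual sketch only, not how DMS is implemented; the change-record format (op, key, row) is invented for the example.

```python
def apply_changes(target, change_log):
    """Apply an ordered list of change records to a target table.

    target:     dict mapping primary key -> row (state after full load).
    change_log: list of dicts with 'op' ('insert', 'update', 'delete'),
                'key', and for inserts/updates a 'row'.
    """
    for change in change_log:
        op, key = change["op"], change["key"]
        if op in ("insert", "update"):
            target[key] = change["row"]
        elif op == "delete":
            target.pop(key, None)  # tolerate deletes of missing rows
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return target

# Target state right after the full load
table = {1: {"name": "alice"}}

# Changes captured from the source's transaction log, in commit order
changes = [
    {"op": "insert", "key": 2, "row": {"name": "bob"}},
    {"op": "update", "key": 1, "row": {"name": "alice b."}},
    {"op": "delete", "key": 2},
]

print(apply_changes(table, changes))  # {1: {'name': 'alice b.'}}
```

Order matters: replaying the same records in a different sequence could leave row 2 in the target, which is why CDC follows the log's commit order.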
Q3: How does Transfer Family work?
Answer: Transfer Family provides managed SFTP, FTPS, and FTP endpoints backed by S3. Users connect with standard clients and protocols, and the files they upload land in S3 as objects.
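Setting up such an endpoint can be sketched with the boto3 Transfer Family client: create a server, then add a user whose home directory maps to an S3 prefix. The helper name and all argument values shown in the usage comment (role ARN, user name, bucket path, key) are placeholders for this example. The client is passed in so the function can be exercised without AWS access.

```python
def create_sftp_endpoint(transfer, role_arn, user_name, ssh_public_key, home_dir):
    """Create a public SFTP server backed by S3 and one key-based user.

    transfer: a Transfer Family client, e.g. boto3.client("transfer").
    role_arn: IAM role granting the user access to the bucket.
    home_dir: landing path in the form "/bucket-name/prefix".
    Returns the new server's ID.
    """
    server = transfer.create_server(
        Protocols=["SFTP"],
        IdentityProviderType="SERVICE_MANAGED",  # users and keys stored by the service
        Domain="S3",
        EndpointType="PUBLIC",
    )
    server_id = server["ServerId"]

    transfer.create_user(
        ServerId=server_id,
        UserName=user_name,
        Role=role_arn,
        HomeDirectory=home_dir,
        SshPublicKeyBody=ssh_public_key,
    )
    return server_id

# Usage with a real client (requires AWS credentials; values are placeholders):
#   import boto3
#   create_sftp_endpoint(
#       boto3.client("transfer"),
#       role_arn="arn:aws:iam::123456789012:role/transfer-s3-access",
#       user_name="partner1",
#       ssh_public_key="ssh-rsa AAAA...",
#       home_dir="/migration-bucket/partner1",
#   )
```

After the server comes online, the user connects with any standard SFTP client, and uploaded files appear under the configured S3 prefix.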
Summary
- DMS: Database migration with CDC support
- Snowball: Physical device for large-scale data transfer
- Transfer Family: Managed SFTP/FTPS backed by S3
- DataSync: Automated file synchronization
- Strategy: Choose based on data size and transfer requirements