Azure Blob Storage: Tiers, Lifecycle & Hierarchical Namespace
Mastering Azure Blob Storage architecture, access tiers, lifecycle management, and ADLS Gen2 capabilities
Blob Storage Architecture
```text
┌───────────────────────────────────────────────────────────────────┐
│                  AZURE BLOB STORAGE ARCHITECTURE                  │
├───────────────────────────────────────────────────────────────────┤
│                                                                   │
│  STORAGE ACCOUNT                                                  │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │ Type: StorageV2 (General Purpose v2)                        │  │
│  │ Redundancy: LRS / ZRS / GRS / RA-GRS / GZRS / RA-GZRS       │  │
│  │ Access Tier: Hot / Cool / Cold / Archive                    │  │
│  │   (Archive is blob-level only)                              │  │
│  └─────────────────────────────────────────────────────────────┘  │
│                                                                   │
│  CONTAINERS (Flat Namespace - Blob Storage)                       │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │ ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐      │  │
│  │ │ raw/     │  │ curated/ │  │ sandbox/ │  │ archive/ │      │  │
│  │ │          │  │          │  │          │  │          │      │  │
│  │ │ Blob1    │  │ Blob1    │  │ Blob1    │  │ Blob1    │      │  │
│  │ │ Blob2    │  │ Blob2    │  │ Blob2    │  │ Blob2    │      │  │
│  │ │ Blob3    │  │ Blob3    │  │ Blob3    │  │ Blob3    │      │  │
│  │ └──────────┘  └──────────┘  └──────────┘  └──────────┘      │  │
│  └─────────────────────────────────────────────────────────────┘  │
│                                                                   │
│  FILE SYSTEMS (Hierarchical Namespace - ADLS Gen2)                │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │ datalake/                                                   │  │
│  │ ├── raw/                                                    │  │
│  │ │   ├── 2024/01/                                            │  │
│  │ │   │   ├── sales_data.parquet                              │  │
│  │ │   │   └── inventory_data.parquet                          │  │
│  │ │   └── 2024/02/                                            │  │
│  │ ├── curated/                                                │  │
│  │ │   ├── dimensions/                                         │  │
│  │ │   └── facts/                                              │  │
│  │ └── sandbox/                                                │  │
│  │     └── user_analysis/                                      │  │
│  └─────────────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────────┘
```
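The practical difference between the two namespaces shows up in directory operations. With a flat namespace, a "directory" is only a shared name prefix, so renaming `raw/2024/01/` means copying every blob under that prefix to a new name and deleting the original; with HNS the same rename is a single atomic operation on the directory. The sketch below models this with plain Python lists and makes no Azure calls; the blob names and the `virtual_directories` and `rename_ops` helpers are hypothetical.

```python
from typing import List, Set


def virtual_directories(blob_names: List[str]) -> Set[str]:
    """Derive the 'directories' implied by '/'-delimited names in a flat namespace."""
    dirs: Set[str] = set()
    for name in blob_names:
        parts = name.split("/")[:-1]  # drop the last segment (the blob itself)
        for i in range(1, len(parts) + 1):
            dirs.add("/".join(parts[:i]) + "/")
    return dirs


def rename_ops(blob_names: List[str], prefix: str, hns_enabled: bool) -> int:
    """Count storage operations needed to rename everything under `prefix`.

    Flat namespace: one copy plus one delete per matching blob.
    HNS (ADLS Gen2): one atomic directory rename, regardless of size.
    """
    matching = [n for n in blob_names if n.startswith(prefix)]
    if not matching:
        return 0
    return 1 if hns_enabled else 2 * len(matching)


blobs = [
    "raw/2024/01/sales_data.parquet",
    "raw/2024/01/inventory_data.parquet",
    "raw/2024/02/sales_data.parquet",
    "curated/facts/sales.parquet",
]
print(sorted(virtual_directories(blobs)))
print(rename_ops(blobs, "raw/2024/01/", hns_enabled=False))  # 4
print(rename_ops(blobs, "raw/2024/01/", hns_enabled=True))   # 1
```

The flat-namespace cost grows with the number of blobs under the prefix, which is why large Spark or Hive jobs that commit output by renaming directories benefit from HNS.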
Access Tiers Comparison
| Tier | Availability (SLA) | Latency | Min Retention | Cost (per GB/mo, example; varies by region) | Use Case |
|---|---|---|---|---|---|
| Hot | 99.9% | Milliseconds | None | $0.018 | Frequently accessed |
| Cool | 99% | Milliseconds | 30 days | $0.01 | Infrequent access |
| Cold | 99% | Milliseconds | 90 days | $0.0045 | Rare access |
| Archive | Offline | Hours (rehydration required) | 180 days | $0.001 | Long-term retention |
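Turning the per-GB prices in the table into a monthly figure is one multiplication per tier. A minimal sketch, assuming the example prices from the table (real prices vary by region and redundancy option) and counting capacity only, with no transaction, retrieval, or early-deletion charges; `TIER_PRICE_PER_GB_MONTH` and `monthly_storage_cost` are illustrative names.

```python
# Example per-GB-month prices from the table (illustrative; check current regional pricing)
TIER_PRICE_PER_GB_MONTH = {
    "Hot": 0.018,
    "Cool": 0.01,
    "Cold": 0.0045,
    "Archive": 0.001,
}


def monthly_storage_cost(size_gb: float, tier: str) -> float:
    """Capacity cost only: no transactions, retrieval, or early-deletion fees."""
    return size_gb * TIER_PRICE_PER_GB_MONTH[tier]


size_gb = 10_000  # e.g. 10 TB of Parquet files
for tier in TIER_PRICE_PER_GB_MONTH:
    print(f"{tier:<8} ${monthly_storage_cost(size_gb, tier):>9,.2f} / month")
```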
Tier Transition Costs
```text
┌─────────────────────────────────────────────────────────────────┐
│                     TIER TRANSITION PRICING                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  MOVING TO A HOTTER TIER (e.g. Cool → Hot, Archive → Cold):     │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ Billed as a read from the SOURCE tier:                    │  │
│  │ data retrieval per GB (rates below) + read operations.    │  │
│  │ Archive → Hot/Cool/Cold is done by rehydration.           │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  MOVING TO A COOLER TIER (e.g. Hot → Cool, Cool → Archive):     │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ Billed as write operations to the DESTINATION tier.       │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  DATA RETRIEVAL (per GB read, example prices):                  │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ Hot:     $0.0000/GB (free)                                │  │
│  │ Cool:    $0.0100/GB                                       │  │
│  │ Cold:    $0.0300/GB                                       │  │
│  │ Archive: Standard priority: $0.0220/GB, up to 15 hours    │  │
│  │          High priority:     $0.0440/GB, can be < 1 hour   │  │
│  │                                         for blobs < 10 GB │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  EARLY DELETION (deleted, overwritten, or moved out early):     │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ Cool:    prorated for the remaining days of 30            │  │
│  │ Cold:    prorated for the remaining days of 90            │  │
│  │ Archive: prorated for the remaining days of 180           │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
```
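The early-deletion charge is prorated arithmetic: if a blob leaves a tier before its minimum period, you pay for the days that remain. For example, a blob moved to Archive and deleted after 45 days is billed for the remaining 135 days of Archive storage. A minimal sketch, assuming the example prices from the tiers table and a 30-day month for prorating (actual billing granularity may differ); `early_deletion_charge` is a hypothetical helper.

```python
MIN_DAYS = {"Hot": 0, "Cool": 30, "Cold": 90, "Archive": 180}
PRICE_PER_GB_MONTH = {"Hot": 0.018, "Cool": 0.01, "Cold": 0.0045, "Archive": 0.001}
DAYS_PER_MONTH = 30  # assumption used for prorating


def early_deletion_charge(tier: str, size_gb: float, days_in_tier: int) -> float:
    """Prorated charge for the unused part of the tier's minimum retention period."""
    remaining_days = max(0, MIN_DAYS[tier] - days_in_tier)
    return size_gb * PRICE_PER_GB_MONTH[tier] * remaining_days / DAYS_PER_MONTH


# 1,000 GB moved to Archive, then deleted after 45 days: billed for 135 more days
print(f"${early_deletion_charge('Archive', 1000, 45):.2f}")  # $4.50
```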
Lifecycle Management Policy
```json
{
  "rules": [
    {
      "enabled": true,
      "name": "MoveToCoolAfter30Days",
      "type": "Lifecycle",
      "definition": {
        "actions": {
          "baseBlob": {
            "tierToCool": {
              "daysAfterModificationGreaterThan": 30
            }
          },
          "snapshot": {
            "tierToCool": {
              "daysAfterCreationGreaterThan": 30
            }
          }
        },
        "filters": {
          "blobTypes": ["blockBlob"],
          "prefixMatch": ["raw/", "curated/"]
        }
      }
    },
    {
      "enabled": true,
      "name": "MoveToArchiveAfter90Days",
      "type": "Lifecycle",
      "definition": {
        "actions": {
          "baseBlob": {
            "tierToArchive": {
              "daysAfterModificationGreaterThan": 90
            },
            "delete": {
              "daysAfterModificationGreaterThan": 365
            }
          }
        },
        "filters": {
          "blobTypes": ["blockBlob"],
          "prefixMatch": ["raw/"]
        }
      }
    }
  ]
}
```
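Before deploying a policy, it can help to evaluate it offline against sample paths and ages. The sketch below embeds the two rules from the policy above and, for a block blob of a given age, returns the action lifecycle management would pick. It models only `baseBlob` actions with `daysAfterModificationGreaterThan` and `prefixMatch` filters (whose first segment is the container name). When several actions match, it applies the cheapest one, following Azure's documented rule that delete beats tierToArchive, which beats tierToCool (tierToCold is assumed to rank between them). Snapshot actions, blob index tags, and other conditions are not modeled; `evaluate` is a hypothetical helper.

```python
import json
from typing import Optional

# The two rules from the policy above, embedded so the sketch runs offline
POLICY = json.loads("""
{"rules": [
  {"enabled": true, "name": "MoveToCoolAfter30Days", "type": "Lifecycle",
   "definition": {
     "actions": {"baseBlob": {"tierToCool": {"daysAfterModificationGreaterThan": 30}}},
     "filters": {"blobTypes": ["blockBlob"], "prefixMatch": ["raw/", "curated/"]}}},
  {"enabled": true, "name": "MoveToArchiveAfter90Days", "type": "Lifecycle",
   "definition": {
     "actions": {"baseBlob": {
       "tierToArchive": {"daysAfterModificationGreaterThan": 90},
       "delete": {"daysAfterModificationGreaterThan": 365}}},
     "filters": {"blobTypes": ["blockBlob"], "prefixMatch": ["raw/"]}}}
]}
""")

# When several actions match, the cheapest one wins (lower rank = cheaper)
ACTION_RANK = {"delete": 0, "tierToArchive": 1, "tierToCold": 2, "tierToCool": 3}


def evaluate(path: str, days_since_modified: int, policy: dict = POLICY) -> Optional[str]:
    """Return the baseBlob action that would apply to a block blob, or None.

    `path` includes the container name, as prefixMatch does.
    Only daysAfterModificationGreaterThan conditions are modeled.
    """
    candidates = []
    for rule in policy["rules"]:
        if not rule["enabled"]:
            continue
        definition = rule["definition"]
        prefixes = definition.get("filters", {}).get("prefixMatch", [""])
        if not any(path.startswith(prefix) for prefix in prefixes):
            continue
        for action, condition in definition["actions"].get("baseBlob", {}).items():
            threshold = condition.get("daysAfterModificationGreaterThan")
            if threshold is not None and days_since_modified > threshold:
                candidates.append(action)
    if not candidates:
        return None
    return min(candidates, key=lambda action: ACTION_RANK.get(action, len(ACTION_RANK)))


for days in (30, 45, 120, 400):
    print(days, evaluate("raw/2024/01/sales_data.parquet", days))
print(evaluate("sandbox/user_analysis/notes.csv", 400))
```

Note the strict "greater than": a blob exactly 30 days old is not yet moved to Cool.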
Python SDK for Blob Operations
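```python
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient
from azure.storage.filedatalake import DataLakeServiceClient

credential = DefaultAzureCredential()

# Data Lake client (dfs endpoint): file systems, directories, files
service_client = DataLakeServiceClient(
    account_url="https://stdatalake001.dfs.core.windows.net",
    credential=credential
)

# Create file system (container); raises ResourceExistsError if it already exists
file_system = service_client.create_file_system(
    file_system="raw",
    public_access=None
)

# Upload file
file_client = service_client.get_file_client(
    file_system="raw",
    file_path="2024/01/sales_data.parquet"
)
with open("local_sales_data.parquet", "rb") as data:
    file_client.upload_data(
        data,
        overwrite=True,
        max_concurrency=4
    )

# Set access tier: tiering is exposed through the Blob client (blob endpoint),
# which addresses the same file in an HNS-enabled account
blob_service_client = BlobServiceClient(
    account_url="https://stdatalake001.blob.core.windows.net",
    credential=credential
)
blob_client = blob_service_client.get_blob_client(
    container="raw",
    blob="2024/01/sales_data.parquet"
)
blob_client.set_standard_blob_tier("Cool")

# List files with properties
file_system_client = service_client.get_file_system_client("raw")
for path in file_system_client.get_paths(path="2024/01"):
    print(f"Name: {path.name}")
    print(f"Size: {path.content_length} bytes")
    print(f"Last Modified: {path.last_modified}")
    if not path.is_directory:
        # Content type is not returned by the listing; read it from file properties
        props = file_system_client.get_file_client(path.name).get_file_properties()
        print(f"Content Type: {props.content_settings.content_type}")
```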
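A listing like this is often rolled up into per-partition sizes before deciding what to tier or archive. The sketch below does that over stand-in records carrying the same `name`, `content_length`, and `is_directory` fields as the SDK's path properties, so it runs without an Azure connection; `PathInfo`, `bytes_per_partition`, and the sample sizes are hypothetical. Against a live account, the path objects returned by the listing call could be passed in instead.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class PathInfo(NamedTuple):
    """Stand-in for the SDK's path properties (only the fields used here)."""
    name: str
    content_length: int
    is_directory: bool


def bytes_per_partition(paths: Iterable, depth: int = 2) -> Dict[str, int]:
    """Sum file sizes by the first `depth` path segments, e.g. '2024/01'."""
    totals: Dict[str, int] = defaultdict(int)
    for path in paths:
        if path.is_directory:
            continue  # directories hold no data of their own
        partition = "/".join(path.name.split("/")[:depth])
        totals[partition] += path.content_length or 0
    return dict(totals)


sample = [
    PathInfo("2024/01", 0, True),
    PathInfo("2024/01/sales_data.parquet", 1_500_000, False),
    PathInfo("2024/01/inventory_data.parquet", 500_000, False),
    PathInfo("2024/02/sales_data.parquet", 750_000, False),
]
print(bytes_per_partition(sample))
# {'2024/01': 2000000, '2024/02': 750000}
```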
Pro Tip: Use AzCopy for large-scale data transfers. It runs transfers in parallel, can resume interrupted jobs, and can set the destination access tier during a copy, which makes it far better suited to bulk loads than uploading through the Azure portal.
AzCopy Commands for Data Engineering
```bash
# Copy data from on-prem to ADLS Gen2
# (run AzCopy on a host that can read the on-prem files; the source here is a local or mounted path)
azcopy copy "/mnt/onprem/data/*.parquet" \
  "https://stdatalake001.dfs.core.windows.net/raw/2024/01/?<SAS>" \
  --recursive \
  --overwrite=true \
  --block-size-mb=256

# Set blob tier during copy (the source also needs a SAS or other authorization)
azcopy copy "https://source.blob.core.windows.net/container/*?<SAS>" \
  "https://stdatalake001.blob.core.windows.net/archive/?<SAS>" \
  --blob-type=BlockBlob \
  --block-blob-tier=Archive

# Sync directory (incremental); --delete-destination=true removes local files
# that no longer exist in the source
azcopy sync "https://stdatalake001.dfs.core.windows.net/raw/2024/01/?<SAS>" \
  "./local-backup" \
  --delete-destination=true

# List all blobs with selected properties
azcopy list "https://stdatalake001.dfs.core.windows.net/raw/?<SAS>" \
  --machine-readable \
  --properties "LastModifiedTime;BlobAccessTier" \
  --output-type=json
```
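When AzCopy runs inside an orchestration script, building the command as an argument list (rather than one shell string) avoids quoting problems with SAS tokens, which contain `&` characters. A minimal sketch that assembles the tier-setting copy shown above; the `azcopy_copy_args` helper is hypothetical, the URLs are placeholders, only flags already used in the commands above appear, and the command is printed rather than executed. On a machine with AzCopy installed, the list could be passed to `subprocess.run(args, check=True)`.

```python
import shlex
from typing import List, Optional


def azcopy_copy_args(source: str, destination: str,
                     tier: Optional[str] = None,
                     block_size_mb: Optional[int] = None,
                     overwrite: bool = True) -> List[str]:
    """Build an `azcopy copy` argument list from the flags used in the commands above."""
    args = ["azcopy", "copy", source, destination,
            f"--overwrite={'true' if overwrite else 'false'}"]
    if block_size_mb is not None:
        args.append(f"--block-size-mb={block_size_mb}")
    if tier is not None:
        # --block-blob-tier applies to block blobs, so set the blob type explicitly
        args += ["--blob-type=BlockBlob", f"--block-blob-tier={tier}"]
    return args


args = azcopy_copy_args(
    "https://source.blob.core.windows.net/container/*?<SAS>",
    "https://stdatalake001.blob.core.windows.net/archive/?<SAS>",
    tier="Archive",
)
print(shlex.join(args))  # shell-safe rendering, e.g. for logs
```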
Redundancy Options
```text
┌───────────────────────────────────────────────────────────┐
│                STORAGE REDUNDANCY OPTIONS                 │
├───────────────────────────────────────────────────────────┤
│                                                           │
│  LRS (Locally Redundant Storage)                          │
│  ┌─────────────────────────────────────────────────────┐  │
│  │ Single Data Center                                  │  │
│  │ ┌──────────┐ ┌──────────┐ ┌──────────┐              │  │
│  │ │  Copy 1  │ │  Copy 2  │ │  Copy 3  │              │  │
│  │ └──────────┘ └──────────┘ └──────────┘              │  │
│  │ Durability: ≥ 99.999999999% (11 nines)              │  │
│  └─────────────────────────────────────────────────────┘  │
│                                                           │
│  ZRS (Zone-Redundant Storage)                             │
│  ┌─────────────────────────────────────────────────────┐  │
│  │ 3 Availability Zones                                │  │
│  │ ┌──────────┐ ┌──────────┐ ┌──────────┐              │  │
│  │ │   AZ 1   │ │   AZ 2   │ │   AZ 3   │              │  │
│  │ │  Copy 1  │ │  Copy 2  │ │  Copy 3  │              │  │
│  │ └──────────┘ └──────────┘ └──────────┘              │  │
│  │ Durability: ≥ 99.9999999999% (12 nines)             │  │
│  └─────────────────────────────────────────────────────┘  │
│                                                           │
│  GRS (Geo-Redundant Storage)                              │
│  ┌─────────────────────────────────────────────────────┐  │
│  │ Primary Region           Secondary Region           │  │
│  │ ┌────────────────┐       ┌────────────────┐         │  │
│  │ │ ┌──┐ ┌──┐ ┌──┐ │──────>│ ┌──┐ ┌──┐ ┌──┐ │         │  │
│  │ │ │C1│ │C2│ │C3│ │ async │ │C1│ │C2│ │C3│ │         │  │
│  │ │ └──┘ └──┘ └──┘ │       │ └──┘ └──┘ └──┘ │         │  │
│  │ └────────────────┘       └────────────────┘         │  │
│  │ Durability: ≥ 99.99999999999999% (16 nines)         │  │
│  └─────────────────────────────────────────────────────┘  │
│                                                           │
│  RA-GRS (Read-Access Geo-Redundant Storage)               │
│  Same as GRS, plus read access to the secondary region    │
│  at any time, including during a primary region outage    │
└───────────────────────────────────────────────────────────┘
```
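Durability figures are easier to compare as expected losses. Azure documents at least 11 nines of annual durability for LRS, 12 for ZRS, and 16 for GRS/RA-GRS. Reading "N nines" as a per-object annual loss probability of 10^-N (a simplification of those "at least" guarantees), the expected number of objects lost per year is the object count times that probability. A minimal sketch; the object count is an arbitrary example and `expected_annual_losses` is an illustrative name.

```python
# Documented minimum annual durability, expressed as "nines"
DURABILITY_NINES = {"LRS": 11, "ZRS": 12, "GRS": 16}


def expected_annual_losses(object_count: int, nines: int) -> float:
    """Expected objects lost per year if each object's annual loss probability is 10**-nines."""
    return object_count * 10.0 ** -nines


objects = 10_000_000  # e.g. ten million Parquet files
for option, nines in DURABILITY_NINES.items():
    print(f"{option}: {expected_annual_losses(objects, nines):.0e} objects/year")
```

Even on LRS the expected loss is a tiny fraction of one object per year; the stronger reason to pick ZRS or GRS is availability during zone or regional outages, which durability figures do not capture.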
Interview Questions
Q1: When would you use Hot vs Cool vs Archive tiers for data engineering? A: Hot for frequently queried data (e.g., the last 30 days), Cool for infrequently accessed data (30-90 days old), and Archive for long-term compliance data (90+ days) that can tolerate hours of rehydration before it is read. Example: raw data starts in Hot, moves to Cool after 30 days, and moves to Archive after 90 days.
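Q2: What is the impact of enabling Hierarchical Namespace on a storage account? A: HNS enables ADLS Gen2 capabilities: POSIX-like ACLs on files and directories, true directory operations with atomic renames and deletes, and better Hadoop (ABFS driver) compatibility. The account exposes the Data Lake Storage endpoint (dfs) alongside the Blob endpoint (blob) rather than replacing it, and some Blob Storage features have limited or no support with HNS. Enabling HNS is one-way: it cannot be disabled afterwards.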
Q3: How do lifecycle management policies affect data engineering costs? A: Lifecycle policies automatically move data to cheaper tiers (or delete it) based on age, which can cut storage costs substantially: at the example prices above, Archive costs about 1/18 as much per GB as Hot. However, retrieval and early-deletion charges must be considered: archived data that is read frequently can cost more than keeping it in the Hot tier.
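The retrieval caveat in Q3 can be made concrete with a break-even calculation. Keeping data in Archive saves the difference in monthly storage price, but every full read pays the Archive retrieval fee, so once the dataset is read more often than the ratio of the two, Hot is cheaper. A minimal sketch with the example prices from this page (Hot $0.018 and Archive $0.001 per GB-month, Archive standard-priority retrieval $0.022/GB), ignoring operation charges, rehydration latency, and early-deletion fees; `monthly_cost` is an illustrative name.

```python
HOT_PER_GB_MONTH = 0.018
ARCHIVE_PER_GB_MONTH = 0.001
ARCHIVE_RETRIEVAL_PER_GB = 0.022  # standard priority, example price


def monthly_cost(size_gb: float, full_reads_per_month: float, tier: str) -> float:
    """Storage plus retrieval cost; operations and early-deletion fees are ignored."""
    if tier == "Hot":
        return size_gb * HOT_PER_GB_MONTH  # Hot has no per-GB retrieval fee
    if tier == "Archive":
        return size_gb * (ARCHIVE_PER_GB_MONTH
                          + full_reads_per_month * ARCHIVE_RETRIEVAL_PER_GB)
    raise ValueError(f"unsupported tier: {tier}")


break_even = (HOT_PER_GB_MONTH - ARCHIVE_PER_GB_MONTH) / ARCHIVE_RETRIEVAL_PER_GB
print(f"Archive is cheaper only below {break_even:.2f} full reads per month")
for reads in (0.1, 1, 4):
    hot = monthly_cost(1000, reads, "Hot")
    archive = monthly_cost(1000, reads, "Archive")
    print(f"{reads:>4} reads/month: Hot ${hot:.2f} vs Archive ${archive:.2f}")
```

With these prices, reading the full dataset even once a month already makes Archive more expensive than Hot.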