Cosmos DB Deep Dive: Partitions, Indexing & Change Feed
Master Cosmos DB internals with partitioning, indexing, change feed, and multi-region configuration
Partitioning Deep Dive
COSMOS DB PARTITIONING

PARTITION KEY SELECTION

Good partition keys have:
• High cardinality (many distinct values)
• Even distribution of storage and requests (no hot partitions)
• Frequent use as a filter in WHERE clauses

Examples:
• IoT: /deviceId
• E-commerce: /orderId
• Social: /userId

PHYSICAL PARTITION SIZING

• Logical partition max: 20 GB
• Physical partition: up to 50 GB of storage and 10,000 RU/s, hosting many logical partitions
• Automatic split as a physical partition approaches its storage limit
• Provisioned RU/s is divided evenly across physical partitions

MULTI-PARTITION QUERIES

• Cross-partition queries: higher latency, more RUs
• Partition-scoped queries: lower latency, fewer RUs
• Include the partition key in the query filter whenever possible
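To make the sizing limits concrete, the sketch below estimates how many physical partitions a container needs and how much throughput each one receives. It assumes the documented per-partition limits of 50 GB of storage and 10,000 RU/s. The function names and sample numbers are illustrative only, and the service decides the actual layout.

```python
import math

# Assumed per-physical-partition limits (documented service limits; verify for your account)
MAX_STORAGE_GB = 50
MAX_RU_PER_SEC = 10_000


def estimate_physical_partitions(storage_gb: float, provisioned_ru: int) -> int:
    """Rough lower bound on the number of physical partitions for a container."""
    by_storage = math.ceil(storage_gb / MAX_STORAGE_GB)
    by_throughput = math.ceil(provisioned_ru / MAX_RU_PER_SEC)
    return max(1, by_storage, by_throughput)


def ru_per_partition(provisioned_ru: int, partitions: int) -> float:
    """Provisioned RU/s is split evenly across physical partitions."""
    return provisioned_ru / partitions


partitions = estimate_physical_partitions(storage_gb=120, provisioned_ru=20_000)
print(partitions)                            # 3, storage-bound: ceil(120 / 50)
print(ru_per_partition(20_000, partitions))  # roughly 6,667 RU/s per partition
```

The second number is why hot partitions hurt: a single busy key can only use its own partition's share of the throughput, not the container total.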
Indexing Policy
{
  "indexingPolicy": {
    "automatic": true,
    "indexingMode": "consistent",
    "includedPaths": [
      { "path": "/*" }
    ],
    "excludedPaths": [
      { "path": "/\"_etag\"/?" },
      { "path": "/metadata/*" }
    ],
    "compositeIndexes": [
      [
        { "path": "/sale_date", "order": "ascending" },
        { "path": "/customer_id", "order": "ascending" }
      ]
    ]
  }
}
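A composite index only helps an ORDER BY on multiple properties when the clause lines up with the index definition. The sketch below encodes a simplified version of that rule: the paths must appear in the same sequence, and the sort directions must either all match the index or all be reversed. The helper `serves_order_by` is hypothetical and is not part of any SDK.

```python
def serves_order_by(composite_index, order_by):
    """Return True if the composite index can serve a multi-property ORDER BY.

    composite_index: list of {"path": ..., "order": ...} dicts from the policy.
    order_by: list of (path, order) tuples describing the query's ORDER BY.
    """
    if len(composite_index) != len(order_by):
        return False
    same = opposite = True
    for entry, (path, order) in zip(composite_index, order_by):
        if entry["path"] != path:
            return False  # paths must match in sequence
        if entry["order"] == order:
            opposite = False
        else:
            same = False
    return same or opposite


# The composite index from the policy above
index = [
    {"path": "/sale_date", "order": "ascending"},
    {"path": "/customer_id", "order": "ascending"},
]

# ORDER BY c.sale_date ASC, c.customer_id ASC
print(serves_order_by(index, [("/sale_date", "ascending"), ("/customer_id", "ascending")]))   # True
# ORDER BY c.sale_date ASC, c.customer_id DESC
print(serves_order_by(index, [("/sale_date", "ascending"), ("/customer_id", "descending")]))  # False
```

A query that needs the mixed ordering would require a second composite index with the matching directions.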
Change Feed Implementation
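# Change feed polling loop (pull model) using the azure-cosmos Python SDK
import time
from datetime import datetime, timedelta, timezone

from azure.cosmos import CosmosClient
from azure.identity import DefaultAzureCredential

client = CosmosClient(
    "https://cosmos-prod.documents.azure.com:443/",
    credential=DefaultAzureCredential(),
)
database = client.get_database_client("analytics")
container = database.get_container_client("events")


def process_change(event):
    # Placeholder handler; replace with real processing logic
    print(event["id"])


# The first poll starts 5 minutes in the past; later polls resume from the saved token.
# Note: the start_time keyword requires a recent azure-cosmos release.
feed_options = {"start_time": datetime.now(timezone.utc) - timedelta(minutes=5)}
while True:
    response = container.query_items_change_feed(**feed_options)
    for event in response:
        process_change(event)
    # The SDK exposes the continuation token as the etag header of the last response
    token = container.client_connection.last_response_headers.get("etag")
    if token:
        feed_options = {"continuation": token}
    time.sleep(5)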
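The polling loop above keeps its continuation token only in memory, so a restart would fall back to the start time and reprocess or miss changes. A minimal sketch of file-based checkpointing follows. The file name and the `load_token` / `save_token` helpers are assumptions for illustration; a production consumer would more likely store the token in a durable shared store.

```python
import json
from pathlib import Path


def load_token(path: Path):
    """Return the saved continuation token, or None on the first run."""
    if not path.exists():
        return None
    return json.loads(path.read_text())["continuation"]


def save_token(path: Path, token: str) -> None:
    """Write the token to a temp file, then rename it over the checkpoint."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"continuation": token}))
    tmp.replace(path)  # rename, so readers never see a half-written file


checkpoint = Path("change_feed_checkpoint.json")  # hypothetical location
token = load_token(checkpoint)  # None until a token has been saved
# After each batch has been processed successfully:
# save_token(checkpoint, new_token)
```

Saving the token only after a batch has been processed gives at-least-once delivery, so `process_change` should tolerate seeing the same change twice.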
Multi-Region Configuration
{
  "properties": {
    "locations": [
      { "locationName": "East US 2", "failoverPriority": 0, "isZoneRedundant": true },
      { "locationName": "West Europe", "failoverPriority": 1, "isZoneRedundant": true },
      { "locationName": "Southeast Asia", "failoverPriority": 2, "isZoneRedundant": true }
    ],
    "databaseAccountOfferType": "Standard",
    "enableMultipleWriteLocations": true,
    "consistencyPolicy": {
      "defaultConsistencyLevel": "Session"
    }
  }
}
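Failover priorities in the locations array must be unique, with 0 as the highest priority and the largest value equal to the number of regions minus one. The sketch below checks that constraint before a template is deployed and lists the regions in failover order. Both helper names are hypothetical.

```python
def validate_locations(locations):
    """Return a list of problems found in an ARM-style 'locations' array."""
    problems = []
    priorities = sorted(loc["failoverPriority"] for loc in locations)
    if priorities != list(range(len(locations))):
        problems.append("failoverPriority values must be unique and cover 0..n-1")
    return problems


def failover_order(locations):
    """Region names sorted by failover priority (priority 0 first)."""
    ordered = sorted(locations, key=lambda loc: loc["failoverPriority"])
    return [loc["locationName"] for loc in ordered]


# The locations from the configuration above
locations = [
    {"locationName": "East US 2", "failoverPriority": 0, "isZoneRedundant": True},
    {"locationName": "West Europe", "failoverPriority": 1, "isZoneRedundant": True},
    {"locationName": "Southeast Asia", "failoverPriority": 2, "isZoneRedundant": True},
]

print(validate_locations(locations))  # []
print(failover_order(locations))      # ['East US 2', 'West Europe', 'Southeast Asia']
```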
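Pro Tip: Use composite indexes for queries that sort or filter on multiple properties. The query engine can then serve the ORDER BY or multi-property filter from the index, which reduces RU consumption. A composite index does not replace the partition key in the filter, which is what keeps a query from fanning out across partitions.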
Interview Questions
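Q1: How do you choose a partition key for Cosmos DB? A: Choose a property with high cardinality that spreads both storage and request volume evenly and that appears in the filters of your most frequent queries. Avoid keys that concentrate current traffic on a single value, such as a coarse timestamp (for example, the current date), because they create hot partitions. Test with representative data volumes.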
Q2: What is the cost impact of cross-partition queries? A: A query without a partition key filter fans out to every physical partition, so it consumes more RUs and has higher latency, and the overhead grows with the number of partitions. Include the partition key in the filter whenever possible to keep the query scoped to a single partition.
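Q3: How do you handle hot partitions in Cosmos DB? A: 1) Redistribute data if the key pattern is the cause, for example by adding a synthetic suffix to a hot key value, 2) Use hierarchical partition keys, 3) Scale up RU/s as a short-term measure, keeping in mind that a single physical partition is capped at 10,000 RU/s, 4) Implement backoff/retry logic for throttled (429) requests, 5) Consider repartitioning into a new container with a different key.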
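The answers on key selection and hot partitions can be sketched together: measure how much of a sample lands on the hottest key value, then spread that value with a synthetic suffix and measure again. The sample data, the bucket count of 10, and both helper names are assumptions for illustration.

```python
import hashlib
from collections import Counter


def hottest_share(keys):
    """Fraction of items that land on the most common partition key value."""
    counts = Counter(keys)
    return counts.most_common(1)[0][1] / sum(counts.values())


def synthetic_partition_key(base_key, item_id, buckets=10):
    """Spread one hot key value over `buckets` logical partitions.

    The suffix is derived from the item id, so the same item always maps to
    the same partition key and can still be read with a point lookup.
    """
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % buckets
    return f"{base_key}-{bucket}"


# Hypothetical sample: 90% of orders belong to one tenant
orders = [("tenant-big", f"order-{i}") for i in range(900)]
orders += [(f"tenant-{i}", f"order-s{i}") for i in range(100)]

before = hottest_share(tenant for tenant, _ in orders)
after = hottest_share(synthetic_partition_key(t, oid) for t, oid in orders)
print(f"hottest key share before: {before:.2f}, after: {after:.2f}")
```

The trade-off is on the read side: fetching everything for one tenant now means querying each of its suffixed keys, so the bucket count should stay as small as the write load allows.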