Cosmos DB Deep Dive: Partitions, Indexing & Change Feed

Master Cosmos DB internals with partitioning, indexing, change feed, and multi-region configuration

Partitioning Deep Dive

Architecture Diagram
COSMOS DB PARTITIONING

PARTITION KEY SELECTION
Good partition keys:
• High cardinality (many distinct values)
• Even distribution of storage and requests (no hot partitions)
• Frequently used as a filter in the WHERE clause

Examples:
• IoT: /deviceId
• E-commerce: /orderId
• Social: /userId

PHYSICAL PARTITION SIZING
• Logical partition max: 20 GB
• Physical partition: up to 50 GB of storage and 10,000 RU/s, hosting many logical partitions
• Automatic split when a physical partition approaches its storage limit
• Provisioned RU/s divided evenly across physical partitions

MULTI-PARTITION QUERIES
• Cross-partition queries: higher latency, more RUs
• Partition-scoped queries: lower latency, fewer RUs
• Include the partition key in queries whenever possible
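The "high cardinality" and "even distribution" criteria can be checked against a data sample before a key is chosen. The following is a minimal sketch, not a definitive method: the `key_distribution` helper and the sample documents are hypothetical, and document counts are only a proxy for storage and request skew.

```python
from collections import Counter

def key_distribution(items, key):
    """Return (distinct_values, largest_share) for a candidate partition key."""
    counts = Counter(item[key] for item in items)
    total = sum(counts.values())
    largest_share = max(counts.values()) / total
    return len(counts), largest_share

# Hypothetical sample: 1,000 readings from 100 devices, 90% of them in one region
sample = [
    {"deviceId": f"dev-{i % 100}", "region": "eu" if i % 10 else "us"}
    for i in range(1000)
]

device_stats = key_distribution(sample, "deviceId")  # many values, small largest share
region_stats = key_distribution(sample, "region")    # few values, one dominant value
```

In this sample `/deviceId` spreads documents over 100 values with no value holding more than 1% of them, while `/region` puts 90% of the documents under a single value, which is the pattern that tends to produce a hot partition.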

Indexing Policy

{
  "indexingPolicy": {
    "automatic": true,
    "indexingMode": "consistent",
    "includedPaths": [
      {
        "path": "/*"
      }
    ],
    "excludedPaths": [
      {
        "path": "/metadata/*"
      },
      {
        "path": "/\"_etag\"/?"
      }
    ],
    "compositeIndexes": [
      [
        { "path": "/sale_date", "order": "ascending" },
        { "path": "/customer_id", "order": "ascending" }
      ]
    ]
  }
}
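An indexing policy like the one above can also be built in code and passed to the Python SDK when a container is created. This is a sketch under stated assumptions: the `build_indexing_policy` helper, the container name `sales`, and the partition key `/customer_id` are hypothetical and not part of the original configuration. `create_sales_container` needs a live account and the `azure-cosmos` package, so the import is deferred until it is called.

```python
def build_indexing_policy(excluded_paths, composite_indexes):
    """Build an indexing policy dict that indexes everything except excluded_paths."""
    return {
        "indexingMode": "consistent",
        "automatic": True,
        "includedPaths": [{"path": "/*"}],
        "excludedPaths": [{"path": p} for p in excluded_paths],
        "compositeIndexes": [
            [{"path": path, "order": order} for path, order in index]
            for index in composite_indexes
        ],
    }

policy = build_indexing_policy(
    excluded_paths=["/metadata/*"],
    composite_indexes=[[("/sale_date", "ascending"), ("/customer_id", "ascending")]],
)

def create_sales_container(database):
    """Create a container that uses the policy above (requires a live account).

    The container name "sales" and partition key "/customer_id" are hypothetical.
    """
    from azure.cosmos import PartitionKey  # deferred: requires azure-cosmos

    return database.create_container_if_not_exists(
        id="sales",
        partition_key=PartitionKey(path="/customer_id"),
        indexing_policy=policy,
    )

# A query shaped to match the composite index: same paths, same sort order.
ORDERED_QUERY = "SELECT * FROM c ORDER BY c.sale_date ASC, c.customer_id ASC"
```

An ORDER BY over two properties is the typical case where a composite index is required, and the path sequence and sort directions in the query have to line up with the index definition.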

Change Feed Implementation

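# Change feed polling loop (pull model); assumes a recent azure-cosmos 4.x SDK
import time
from datetime import datetime, timedelta, timezone

from azure.cosmos import CosmosClient
from azure.identity import DefaultAzureCredential

client = CosmosClient(
    "https://cosmos-prod.documents.azure.com:443/",
    credential=DefaultAzureCredential(),
)

database = client.get_database_client("analytics")
container = database.get_container_client("events")


def process_change(event):
    # Placeholder handler: replace with real processing logic
    print(event["id"])


# Start five minutes back on the first pass only; later passes resume from
# the continuation token so the same changes are not read again.
start_time = datetime.now(timezone.utc) - timedelta(minutes=5)
continuation = None
while True:
    if continuation is None:
        response = container.query_items_change_feed(start_time=start_time)
    else:
        response = container.query_items_change_feed(continuation=continuation)

    for event in response:
        process_change(event)

    # The SDK surfaces the change feed continuation as the "etag" response header
    headers = container.client_connection.last_response_headers
    continuation = headers.get("etag", continuation)

    time.sleep(5)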
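The loop above keeps its continuation token only in memory, so a restart loses the position in the feed. One way to survive restarts is to persist the token after each processed batch. The sketch below assumes a local JSON file as the checkpoint store; the file layout and both helper names are hypothetical, and a production system might use a lease container or blob storage instead.

```python
import json
import os
import tempfile

def load_checkpoint(path):
    """Return the saved continuation token, or None if no checkpoint exists."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f).get("continuation")
    except FileNotFoundError:
        return None

def save_checkpoint(path, continuation):
    """Write the token to a temp file, then swap it in with an atomic rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump({"continuation": continuation}, f)
    os.replace(tmp_path, path)
```

Saving the checkpoint only after a batch has been processed gives at-least-once delivery: a crash between processing and saving replays that batch, so the change handler should be idempotent.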

Multi-Region Configuration

{
  "properties": {
    "locations": [
      { "locationName": "East US 2", "failoverPriority": 0, "isZoneRedundant": true },
      { "locationName": "West Europe", "failoverPriority": 1, "isZoneRedundant": true },
      { "locationName": "Southeast Asia", "failoverPriority": 2, "isZoneRedundant": true }
    ],
    "databaseAccountOfferType": "Standard",
    "enableMultipleWriteLocations": true,
    "consistencyPolicy": {
      "defaultConsistencyLevel": "Session"
    }
  }
}
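On the client side, a multi-region account is usually paired with a list of preferred regions so that requests go to the closest replica first. The sketch below derives that list from the `locations` array above. The `preferred_locations` and `make_client` helpers and the `nearest` parameter are hypothetical; `make_client` relies on the `preferred_locations` keyword of the Python SDK's `CosmosClient` and requires the `azure-cosmos` package.

```python
account_properties = {
    "locations": [
        {"locationName": "East US 2", "failoverPriority": 0},
        {"locationName": "West Europe", "failoverPriority": 1},
        {"locationName": "Southeast Asia", "failoverPriority": 2},
    ]
}

def preferred_locations(properties, nearest=None):
    """Order regions by failover priority, optionally moving `nearest` to the front."""
    ordered = [
        loc["locationName"]
        for loc in sorted(properties["locations"], key=lambda loc: loc["failoverPriority"])
    ]
    if nearest in ordered:
        ordered.remove(nearest)
        ordered.insert(0, nearest)
    return ordered

def make_client(endpoint, credential, regions):
    """Create a client that tries the given regions in order (requires azure-cosmos)."""
    from azure.cosmos import CosmosClient  # deferred: requires azure-cosmos

    return CosmosClient(endpoint, credential=credential, preferred_locations=regions)

# An application deployed in West Europe would try that region first.
europe_first = preferred_locations(account_properties, nearest="West Europe")
```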

ℹ️

Pro Tip: Use composite indexes for queries that filter on multiple properties. This avoids cross-partition scans and reduces RU consumption.

Interview Questions

Q1: How do you choose a partition key for Cosmos DB?
A: Choose a property with high cardinality, an even spread of storage and requests, and frequent use as a query filter. Avoid keys that concentrate current traffic on a single value, such as a date-bucketed timestamp, because they create hot partitions. Test with representative data volumes.

Q2: What is the cost impact of cross-partition queries?
A: A cross-partition query fans out to every physical partition, so it consumes more RUs and has higher latency than a partition-scoped query, and the overhead grows as the container splits into more partitions. Include the partition key in queries whenever possible to minimize cost.
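The difference between the two query shapes shows up directly in how `query_items` is called in the Python SDK. This is a sketch, assuming a container partitioned on `/deviceId` with an `eventType` property on each document; both function names and the `eventType` property are hypothetical.

```python
def query_device_events(container, device_id):
    """Partition-scoped query: routed to the one partition that holds device_id."""
    return container.query_items(
        query="SELECT * FROM c WHERE c.deviceId = @deviceId",
        parameters=[{"name": "@deviceId", "value": device_id}],
        partition_key=device_id,
    )

def query_events_by_type(container, event_type):
    """Cross-partition query: no partition key, so every physical partition is consulted."""
    return container.query_items(
        query="SELECT * FROM c WHERE c.eventType = @eventType",
        parameters=[{"name": "@eventType", "value": event_type}],
        enable_cross_partition_query=True,
    )
```

The second form has to be opted into explicitly with `enable_cross_partition_query=True`, which makes fan-out queries easy to spot in a code review.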

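Q3: How do you handle hot partitions in Cosmos DB?
A: 1) Redistribute data if the skew is caused by the key pattern, 2) Use hierarchical partition keys to spread a large logical key across sub-partitions, 3) Scale up RU/s as a short-term fix, 4) Implement backoff/retry logic for throttled (HTTP 429) requests, 5) Consider repartitioning with a different key, which means copying the data into a new container.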
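The backoff/retry step can be sketched as exponential backoff with full jitter. This is an illustration, not the SDK's own behavior: `ThrottledError` is a hypothetical stand-in for a throttled request, and `with_backoff` and its parameters are assumptions. With the Python SDK the equivalent check would be a `CosmosHttpResponseError` whose `status_code` is 429, and the SDK already retries throttled requests a limited number of times before raising.

```python
import random
import time

class ThrottledError(Exception):
    """Hypothetical stand-in for an HTTP 429 (request rate too large) error."""

def with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Call operation(), retrying on ThrottledError with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Cap the exponential delay, then pick a random point below it
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

Retries smooth over short bursts but do not fix a skewed key: if one logical partition is persistently over its share of RU/s, the requests will keep being throttled.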
