
Event Hubs Deep Dive: Capture, Partitioning & Ordering

Master Event Hubs with advanced partitioning, Capture, and throughput optimization

Event Hubs Internals

Architecture Diagram
┌─────────────────────────────────────────────────────────────────────┐
│                        EVENT HUBS INTERNALS                         │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  NAMESPACE LEVEL                                                    │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │ • Throughput Units (TU): Shared across all Event Hubs        │   │
│  │ • Premium: Processing Units (PU) per namespace               │   │
│  │ • Dedicated: Capacity Units (CU) per cluster                 │   │
│  │ • Max 40 TU (Standard), 16 PU (Premium)                      │   │
│  └──────────────────────────────────────────────────────────────┘   │
│                                                                     │
│  PARTITION LEVEL                                                    │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │ • Each partition: ordered append-only log                    │   │
│  │ • Offset: Position within partition                          │   │
│  │ • Sequence Number: Monotonically increasing per partition    │   │
│  │ • Max 32 partitions (Standard), 100 (Premium)                │   │
│  └──────────────────────────────────────────────────────────────┘   │
│                                                                     │
│  CONSUMER GROUP LEVEL                                               │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │ • Independent offset tracking per consumer group             │   │
│  │ • Max per hub: 20 (Standard), 100 (Premium), 1000 (Dedicated)│   │
│  │ • $Default: Built-in group created with every Event Hub      │   │
│  └──────────────────────────────────────────────────────────────┘   │
│                                                                     │
│  CAPTURE LEVEL                                                      │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │ • Auto-archive to ADLS Gen2 / Blob Storage                   │   │
│  │ • Time window: 1 min - 15 min                                │   │
│  │ • Size window: 10 MB - 500 MB                                │   │
│  │ • Format: Avro (Parquet via the no-code editor)              │   │
│  │ • Ordering: Capture files in offset order per partition      │   │
│  └──────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────┘
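
To make the partition and consumer group levels concrete, here is a toy in-memory model. It is a sketch only and does not use the Event Hubs SDK: the `PartitionLog` class and its methods are hypothetical, and the list index stands in for the sequence number. It shows that a partition is an append-only log and that each consumer group advances its own read position without affecting the others.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy model of one partition: an ordered, append-only list of events."""
    events: list = field(default_factory=list)
    # Each consumer group keeps its own read position (next index to read)
    positions: dict = field(default_factory=dict)

    def append(self, body: str) -> int:
        """Append an event and return its position in the log."""
        self.events.append(body)
        return len(self.events) - 1

    def read(self, consumer_group: str, max_events: int = 10) -> list:
        """Return the next events for this group and advance only its position."""
        start = self.positions.get(consumer_group, 0)
        batch = self.events[start:start + max_events]
        self.positions[consumer_group] = start + len(batch)
        return batch


log = PartitionLog()
for reading in ["t=70.1", "t=70.4", "t=71.0"]:
    log.append(reading)

print(log.read("analytics", max_events=2))  # first two events
print(log.read("alerting"))                 # all three: its position is independent
print(log.read("analytics"))                # the remaining third event
```

In the real service the read position is persisted as a checkpoint by the consumer, typically in Blob Storage, rather than held by the partition itself.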

Capture Configuration

{
  "properties": {
    "captureDescription": {
      "enabled": true,
      "encoding": "Avro",
      "intervalInSeconds": 300,
      "sizeLimitInBytes": 104857600,
      "skipEmptyArchives": true,
      "destination": {
        "name": "EventHubArchive.AzureBlockBlob",
        "properties": {
          "storageAccountResourceId": "/subscriptions/xxx/resourceGroups/rg/providers/Microsoft.Storage/storageAccounts/stdatalake001",
          "blobContainer": "event-hubs-capture",
          "archiveNameFormat": "{Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}"
        }
      }
    }
  }
}
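
The `archiveNameFormat` tokens determine where each Capture file lands, which matters when you later need to find the file for a given partition and time. The sketch below expands the format string into a blob path. The function name `capture_blob_path` and the sample namespace and hub names are hypothetical, and it assumes zero-padded date parts and an `.avro` extension.

```python
from datetime import datetime, timezone


def capture_blob_path(name_format: str, namespace: str, event_hub: str,
                      partition_id: str, window_start: datetime,
                      extension: str = ".avro") -> str:
    """Expand archiveNameFormat tokens into a blob path (illustrative)."""
    tokens = {
        "{Namespace}": namespace,
        "{EventHub}": event_hub,
        "{PartitionId}": partition_id,
        # Assumption: date parts are zero-padded
        "{Year}": f"{window_start.year:04d}",
        "{Month}": f"{window_start.month:02d}",
        "{Day}": f"{window_start.day:02d}",
        "{Hour}": f"{window_start.hour:02d}",
        "{Minute}": f"{window_start.minute:02d}",
        "{Second}": f"{window_start.second:02d}",
    }
    path = name_format
    for token, value in tokens.items():
        path = path.replace(token, value)
    return path + extension


fmt = "{Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}"
start = datetime(2024, 3, 5, 14, 10, 0, tzinfo=timezone.utc)
print(capture_blob_path(fmt, "mynamespace", "telemetry", "0", start))
# -> mynamespace/telemetry/0/2024/03/05/14/10/00.avro
```

Putting `{PartitionId}` before the date parts keeps each partition's files together, so reading one folder in name order preserves per-partition ordering.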

Partitioning Strategy

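# Key-based partitioning for ordering
import json
import os

from azure.eventhub import EventData, EventHubProducerClient

# Assumes the connection string and hub name are provided via these
# (illustrative) environment variables
producer = EventHubProducerClient.from_connection_string(
    os.environ["EVENTHUB_CONNECTION_STRING"],
    eventhub_name=os.environ["EVENTHUB_NAME"],
)

with producer:
    # Use device_id as partition key for ordering per device
    event = EventData(json.dumps({"device_id": "sensor-001", "temp": 72.5}))
    producer.send_batch([event], partition_key="sensor-001")

    # Round-robin for even distribution (no ordering guarantee)
    event = EventData(json.dumps({"device_id": "sensor-002", "temp": 68.1}))
    producer.send_batch([event])  # No partition key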
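
To see why key choice matters, the following sketch simulates how keys spread across partitions. The hash used here (SHA-256 modulo the partition count) is an assumption for illustration and is not the hash function the service uses; `toy_partition_for` and `partition_load` are hypothetical helpers. The point it demonstrates holds regardless of the hash: the same key always maps to the same partition, and a dominant key creates a hot partition.

```python
import hashlib
from collections import Counter


def toy_partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition with a stable hash (illustrative only)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def partition_load(keys, partition_count: int) -> Counter:
    """Count how many events each partition would receive."""
    return Counter(toy_partition_for(k, partition_count) for k in keys)


# 1,000 distinct devices, one event each
even_keys = [f"sensor-{i:04d}" for i in range(1000)]
# One device produces 90% of the traffic
skewed_keys = ["sensor-0001"] * 900 + even_keys[:100]

for label, keys in [("even", even_keys), ("skewed", skewed_keys)]:
    load = partition_load(keys, partition_count=8)
    busiest_share = max(load.values()) / len(keys)
    print(f"{label}: busiest partition receives {busiest_share:.0%} of events")
```

If one key dominates, consider a finer-grained key (for example, device plus sensor channel) as long as ordering is only needed within that finer scope.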

Throughput Monitoring

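# Monitor Event Hub metrics (emitted at the namespace level)
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient

subscription_id = "xxx"  # placeholder: your subscription ID
resource_group = "rg"    # placeholder: your resource group

monitor_client = MonitorManagementClient(DefaultAzureCredential(), subscription_id)

metrics = monitor_client.metrics.list(
    resource_uri=f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}/providers/Microsoft.EventHub/namespaces/ns",
    metricnames="IncomingBytes,OutgoingBytes,IncomingMessages",
    aggregation="Total",  # these metrics are counters, so read the total
    interval="PT1M",
    filter="EntityName eq 'hub'",  # restrict results to one Event Hub
)

for metric in metrics.value:
    # Skip metrics that returned no data points in the queried window
    if metric.timeseries and metric.timeseries[0].data:
        print(f"{metric.name.value}: {metric.timeseries[0].data[-1].total}")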

ℹ️

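Pro Tip: Monitor throughput against your provisioned capacity: throughput units (TU) on Standard, processing units (PU) on Premium, or capacity units (CU) on Dedicated. If utilization stays above 80%, scale up. Use partition keys that distribute events evenly to avoid hot partitions.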
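
The 80% rule can be checked from per-minute metric totals. The sketch below assumes the Standard tier ingress allowance of 1 MB/s or 1,000 events/s per TU, whichever is reached first, and treats 1 MB as 1,000,000 bytes. The function names and sample figures are hypothetical.

```python
# Standard tier ingress allowance per throughput unit
TU_INGRESS_BYTES_PER_SEC = 1_000_000  # assumption: decimal megabytes
TU_INGRESS_EVENTS_PER_SEC = 1_000


def ingress_utilization(bytes_per_min: float, events_per_min: float,
                        throughput_units: int) -> float:
    """Return the higher of byte and event utilization as a fraction of capacity."""
    byte_util = bytes_per_min / 60 / (TU_INGRESS_BYTES_PER_SEC * throughput_units)
    event_util = events_per_min / 60 / (TU_INGRESS_EVENTS_PER_SEC * throughput_units)
    return max(byte_util, event_util)


def should_scale_up(utilization_samples, threshold: float = 0.8) -> bool:
    """True when every sample in the window is above the threshold."""
    samples = list(utilization_samples)
    return bool(samples) and all(u > threshold for u in samples)


# Three one-minute samples of (IncomingBytes, IncomingMessages) on 2 TUs
window = [(108_000_000, 60_000), (102_000_000, 58_000), (111_000_000, 61_000)]
utilization = [ingress_utilization(b, e, throughput_units=2) for b, e in window]
print([round(u, 2) for u in utilization])
print("scale up:", should_scale_up(utilization))
```

Requiring every sample in the window to exceed the threshold avoids reacting to a single spike; a longer window makes the signal steadier at the cost of slower reaction.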

Interview Questions

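Q1: How do you scale Event Hubs for higher throughput? A: 1) Increase TUs (Standard), PUs (Premium), or CUs (Dedicated); 2) add partitions (the count is fixed at creation on Standard but can be increased on Premium and Dedicated); 3) choose partition keys that spread events evenly; 4) scale out consumers, ideally one active reader per partition per consumer group; 5) move to the Premium or Dedicated tier for isolated resources. Monitor utilization to determine which of these is the bottleneck.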

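Q2: What is the difference between a consumer group and a partition? A: A partition is an ordered, append-only log that stores a subset of an Event Hub's events; ordering is guaranteed only within a partition. A consumer group is a logical view of the whole Event Hub with its own read position per partition. Multiple consumer groups can read the same partition independently.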

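Q3: How do you replay events from a specific point in time? A: Start the consumer at the desired position by specifying an offset, sequence number, or enqueued timestamp. If the consumer uses a checkpoint store, the stored checkpoint takes precedence, so clear or overwrite it first. This works only for events still within the retention period; for older data, read the appropriate Capture files from ADLS.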
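
A minimal sketch of a time-based replay with the `azure-eventhub` Python SDK follows. The helper names `replay_start` and `replay` are hypothetical, and the sketch assumes no checkpoint store is configured, so `starting_position` is always honored. The SDK import sits inside `replay` so the timestamp helper can be used on its own.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def replay_start(hours_back: float, now: Optional[datetime] = None) -> datetime:
    """Return a timezone-aware UTC timestamp to start replaying from."""
    now = now or datetime.now(timezone.utc)
    return now - timedelta(hours=hours_back)


def on_event(partition_context, event):
    # Print the position of each event so the replay can be audited
    print(partition_context.partition_id, event.sequence_number, event.enqueued_time)


def replay(conn_str: str, eventhub_name: str, hours_back: float) -> None:
    """Read events enqueued after the chosen time; blocks until interrupted."""
    # Imported here so replay_start works without the SDK installed
    from azure.eventhub import EventHubConsumerClient

    client = EventHubConsumerClient.from_connection_string(
        conn_str, consumer_group="$Default", eventhub_name=eventhub_name
    )
    with client:
        # A datetime starting position selects events by enqueued time
        client.receive(on_event=on_event, starting_position=replay_start(hours_back))


print(replay_start(2, now=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)))
```

Using a dedicated consumer group for replays, rather than `$Default`, keeps the replay from interfering with the positions of live consumers.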
