Kafka Performance Tuning

Overview

Kafka performance tuning requires balancing throughput and latency while optimizing CPU, memory, disk I/O, and network resources. This guide covers optimization strategies for producers, brokers, consumers, and networking.

Performance Goals

High Throughput: Maximize messages/bytes per second
Low Latency: Minimize end-to-end delay
Resource Efficiency: Optimize CPU, memory, disk usage
Scalability: Handle growth without degradation

Producer Tuning

Batching Configuration

# Producer batch settings
batch.size=32768          # 32KB batch size
linger.ms=5              # Wait 5ms to fill batch
buffer.memory=67108864   # 64MB buffer memory
max.in.flight.requests.per.connection=5  # Allow 5 in-flight requests

Batching Strategy

from kafka import KafkaProducer

# High throughput configuration
producer = KafkaProducer(
    bootstrap_servers=['kafka:9092'],
    batch_size=32768,          # 32KB batches
    linger_ms=5,              # Wait for batch fill
    buffer_memory=67108864,   # 64MB buffer
    max_in_flight_requests_per_connection=5,
    acks='all',
    compression_type='lz4'
)

# Low latency configuration
producer = KafkaProducer(
    bootstrap_servers=['kafka:9092'],
    batch_size=16384,          # 16KB batches
    linger_ms=0,              # Send immediately
    buffer_memory=33554432,   # 32MB buffer
    max_in_flight_requests_per_connection=1,
    acks='all',
    compression_type='none'
)

Compression Options

Algorithm	CPU Usage	Compression Ratio	Latency
none	None	1:1	Lowest
gzip	High	~3:1	High
snappy	Low	~2:1	Low
lz4	Low	~2.5:1	Low
zstd	Medium	~3.5:1	Medium

# Compression configuration
producer = KafkaProducer(
    bootstrap_servers=['kafka:9092'],
    compression_type='lz4',  # Best balance of speed/ratio
    batch_size=65536,        # 64KB for better compression
    linger_ms=10            # Allow larger batches
)

Producer Performance Testing

# kafka-producer-perf-test
kafka-producer-perf-test.sh \
  --topic test-topic \
  --num-records 1000000 \
  --record-size 1024 \
  --throughput -1 \
  --producer-props \
    bootstrap.servers=kafka:9092 \
    batch.size=32768 \
    linger.ms=5 \
    compression.type=lz4 \
    acks=1

Broker Tuning

Thread Configuration

# Server thread settings
num.io.threads=16           # I/O threads (disk operations)
num.network.threads=8       # Network threads (socket handling)
num.replica.fetchers=4      # Replica fetch threads
background.threads=10       # Background tasks

Log Configuration

# Log flush settings
log.flush.interval.messages=10000  # Flush every 10K messages
log.flush.interval.ms=1000         # Flush every second
log.flush.scheduler.interval.ms=500  # Check every 500ms

# Log segment settings
log.segment.bytes=1073741824   # 1GB segments
log.roll.hours=168             # Roll weekly
log.index.size.max.bytes=10485760  # 10MB index
log.index.interval.bytes=4096  # Index interval

Memory Configuration

# Memory allocation
heap.size=8G               # JVM heap size
socket.send.buffer.bytes=1048576   # 1MB send buffer
socket.receive.buffer.bytes=1048576  # 1MB receive buffer
socket.request.max.bytes=104857600   # 100MB max request

Disk I/O Optimization

# Filesystem optimization
mount -o noatime,nodiratime /dev/sda1 /kafka-logs

# I/O scheduler (for SSDs)
echo noop > /sys/block/sda/queue/scheduler

# RAID configuration (RAID 10 for performance)
mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sd{a,b,c,d}1

Consumer Tuning

Fetch Configuration

# Consumer fetch settings
fetch.min.bytes=1048576       # 1MB minimum fetch
fetch.max.wait.ms=500         # Wait up to 500ms
max.partition.fetch.bytes=1048576  # 1MB per partition
max.poll.records=500          # Process 500 records per poll
max.poll.interval.ms=300000   # 5 minutes max processing

Consumer Strategy

from kafka import KafkaConsumer

# High throughput consumer
consumer = KafkaConsumer(
    'orders',
    bootstrap_servers=['kafka:9092'],
    group_id='order-processor',
    fetch_min_bytes=1048576,        # 1MB minimum
    fetch_max_wait_ms=500,          # Wait for batch
    max_partition_fetch_bytes=1048576,
    max_poll_records=500,
    auto_offset_reset='earliest',
    enable_auto_commit=False
)

# Low latency consumer
consumer = KafkaConsumer(
    'orders',
    bootstrap_servers=['kafka:9092'],
    group_id='order-processor',
    fetch_min_bytes=1,              # Get anything available
    fetch_max_wait_ms=100,          # Quick response
    max_partition_fetch_bytes=65536,
    max_poll_records=50,
    auto_offset_reset='earliest',
    enable_auto_commit=False
)

Consumer Performance Testing

# kafka-consumer-perf-test
kafka-consumer-perf-test.sh \
  --topic test-topic \
  --messages 1000000 \
  --group perf-test-group \
  --bootstrap-server kafka:9092 \
  --fetch-size 1048576

Network Tuning

Socket Configuration

# Producer network
socket.send.buffer.bytes=1048576    # 1MB send buffer
socket.request.max.bytes=104857600  # 100MB max request

# Consumer network
socket.receive.buffer.bytes=1048576  # 1MB receive buffer
fetch.min.bytes=1048576              # 1MB minimum fetch

# Broker network
num.network.threads=8
socket.receive.buffer.bytes=1048576
socket.send.buffer.bytes=1048576

OS Tuning

# Increase file descriptor limits
ulimit -n 100000

# Kernel network tuning
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl -w net.core.rmem_default=16777216
sysctl -w net.core.wmem_default=16777216
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
sysctl -w net.core.netdev_max_backlog=5000

Performance Monitoring

Key Metrics

# Performance metrics to track
metrics:
  - name: kafka_server_BrokerTopicMetrics_MessagesInPerSec
    description: "Messages received per second"
    target: "> 100000"
    
  - name: kafka_server_BrokerTopicMetrics_BytesInPerSec
    description: "Bytes received per second"
    target: "> 100MB/s"
    
  - name: kafka_server_ReplicaManager_UnderReplicatedPartitions
    description: "Under-replicated partitions"
    target: "= 0"
    
  - name: kafka_consumer_group_lag
    description: "Consumer group lag"
    target: "< 1000"

Performance Dashboard

{
  "panels": [
    {
      "title": "Throughput (MB/s)",
      "type": "timeseries",
      "targets": [
        {
          "expr": "sum(rate(kafka_server_BrokerTopicMetrics_BytesInPerSec[5m]))",
          "legendFormat": "Inbound"
        },
        {
          "expr": "sum(rate(kafka_server_BrokerTopicMetrics_BytesOutPerSec[5m]))",
          "legendFormat": "Outbound"
        }
      ]
    },
    {
      "title": "Latency (p99)",
      "type": "timeseries",
      "targets": [
        {
          "expr": "histogram_quantile(0.99, kafka_network_RequestMetrics_TotalTimeMs_bucket)",
          "legendFormat": "p99 latency"
        }
      ]
    }
  ]
}

Benchmarking

End-to-End Benchmark

import time
from kafka import KafkaProducer, KafkaConsumer
import statistics

def benchmark_producer(num_messages, message_size, batch_size, linger_ms):
    producer = KafkaProducer(
        bootstrap_servers=['kafka:9092'],
        batch_size=batch_size,
        linger_ms=linger_ms,
        compression_type='lz4',
        acks='all'
    )
    
    start_time = time.time()
    for i in range(num_messages):
        message = b'x' * message_size
        producer.send('benchmark-topic', value=message)
    producer.flush()
    end_time = time.time()
    
    elapsed = end_time - start_time
    throughput = num_messages / elapsed
    mb_throughput = (num_messages * message_size) / elapsed / 1024 / 1024
    
    print(f"Throughput: {throughput:.0f} messages/sec")
    print(f"Bandwidth: {mb_throughput:.1f} MB/sec")
    print(f"Latency avg: {elapsed/num_messages*1000:.2f} ms")

# Run benchmark
benchmark_producer(
    num_messages=100000,
    message_size=1024,
    batch_size=32768,
    linger_ms=5
)

Common Issues and Solutions

High Latency

Symptom	Cause	Solution
High end-to-end latency	Large batches	Reduce `batch.size` and `linger.ms`
Slow consumer processing	Small fetches	Increase `fetch.min.bytes`
Network congestion	Insufficient buffers	Increase socket buffers

Low Throughput

Symptom	Cause	Solution
Low messages/sec	No compression	Enable `compression.type=lz4`
High CPU usage	Compression overhead	Use `snappy` or `lz4`
Disk I/O bottleneck	Too many small writes	Increase `batch.size`

Summary

Kafka performance tuning requires optimizing producer batching, broker threading, consumer fetching, and network buffers. Monitor key metrics and adjust configurations based on your specific throughput and latency requirements.

Kafka Performance Tuning

Kafka Performance Tuning

Overview

Performance Goals

Producer Tuning

Batching Configuration

Batching Strategy

Compression Options

Producer Performance Testing

Broker Tuning

Thread Configuration

Log Configuration

Memory Configuration

Disk I/O Optimization

Consumer Tuning

Fetch Configuration

Consumer Strategy

Consumer Performance Testing

Network Tuning

Socket Configuration

OS Tuning

Performance Monitoring

Key Metrics

Performance Dashboard

Benchmarking

End-to-End Benchmark

Common Issues and Solutions

High Latency

Low Throughput

Summary

Premium Content

Need Expert Kafka Help?