Kafka Performance Tuning
Producer Tuning batch.size linger.ms buffer.memory compression.type Broker Tuning num.io.threads num.network.threads log.flush.interval socket.send.buffer Consumer Tuning fetch.min.bytes max.poll.records fetch.max.wait max.partition.fetch Network Tuning socket.buffer.size socket.request.max connections.max send.buffer.bytes Performance Optimization Layers
Overview
Kafka performance tuning requires balancing throughput and latency while optimizing CPU , memory , disk I/O , and network resources. This guide covers optimization strategies for producers, brokers, consumers, and networking.
Performance Goals
High Throughput : Maximize messages/bytes per second
Low Latency : Minimize end-to-end delay
Resource Efficiency : Optimize CPU, memory, disk usage
Scalability : Handle growth without degradation
Producer Tuning
Batching Configuration
# Producer batch settings
batch.size=32768 # 32KB batch size
linger.ms=5 # Wait 5ms to fill batch
buffer.memory=67108864 # 64MB buffer memory
max.in.flight.requests.per.connection=5 # Allow 5 in-flight requests
Batching Strategy
from kafka import KafkaProducer
# High throughput configuration
producer = KafkaProducer(
bootstrap_servers=['kafka:9092'],
batch_size=32768, # 32KB batches
linger_ms=5, # Wait for batch fill
buffer_memory=67108864, # 64MB buffer
max_in_flight_requests_per_connection=5,
acks='all',
compression_type='lz4'
)
# Low latency configuration
producer = KafkaProducer(
bootstrap_servers=['kafka:9092'],
batch_size=16384, # 16KB batches
linger_ms=0, # Send immediately
buffer_memory=33554432, # 32MB buffer
max_in_flight_requests_per_connection=1,
acks='all',
compression_type='none'
)
Compression Options
Algorithm CPU Usage Compression Ratio Latency none None 1:1 Lowest gzip High ~3:1 High snappy Low ~2:1 Low lz4 Low ~2.5:1 Low zstd Medium ~3.5:1 Medium
# Compression configuration
producer = KafkaProducer(
bootstrap_servers=['kafka:9092'],
compression_type='lz4', # Best balance of speed/ratio
batch_size=65536, # 64KB for better compression
linger_ms=10 # Allow larger batches
)
Producer Performance Testing
# kafka-producer-perf-test
kafka-producer-perf-test.sh \
--topic test-topic \
--num-records 1000000 \
--record-size 1024 \
--throughput -1 \
--producer-props \
bootstrap.servers=kafka:9092 \
batch.size=32768 \
linger.ms=5 \
compression.type=lz4 \
acks=1
Broker Tuning
Thread Configuration
# Server thread settings
num.io.threads=16 # I/O threads (disk operations)
num.network.threads=8 # Network threads (socket handling)
num.replica.fetchers=4 # Replica fetch threads
background.threads=10 # Background tasks
Log Configuration
# Log flush settings
log.flush.interval.messages=10000 # Flush every 10K messages
log.flush.interval.ms=1000 # Flush every second
log.flush.scheduler.interval.ms=500 # Check every 500ms
# Log segment settings
log.segment.bytes=1073741824 # 1GB segments
log.roll.hours=168 # Roll weekly
log.index.size.max.bytes=10485760 # 10MB index
log.index.interval.bytes=4096 # Index interval
Memory Configuration
# Memory allocation
heap.size=8G # JVM heap size
socket.send.buffer.bytes=1048576 # 1MB send buffer
socket.receive.buffer.bytes=1048576 # 1MB receive buffer
socket.request.max.bytes=104857600 # 100MB max request
Disk I/O Optimization
# Filesystem optimization
mount -o noatime,nodiratime /dev/sda1 /kafka-logs
# I/O scheduler (for SSDs)
echo noop > /sys/block/sda/queue/scheduler
# RAID configuration (RAID 10 for performance)
mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sd{a,b,c,d}1
Consumer Tuning
Fetch Configuration
# Consumer fetch settings
fetch.min.bytes=1048576 # 1MB minimum fetch
fetch.max.wait.ms=500 # Wait up to 500ms
max.partition.fetch.bytes=1048576 # 1MB per partition
max.poll.records=500 # Process 500 records per poll
max.poll.interval.ms=300000 # 5 minutes max processing
Consumer Strategy
from kafka import KafkaConsumer
# High throughput consumer
consumer = KafkaConsumer(
'orders',
bootstrap_servers=['kafka:9092'],
group_id='order-processor',
fetch_min_bytes=1048576, # 1MB minimum
fetch_max_wait_ms=500, # Wait for batch
max_partition_fetch_bytes=1048576,
max_poll_records=500,
auto_offset_reset='earliest',
enable_auto_commit=False
)
# Low latency consumer
consumer = KafkaConsumer(
'orders',
bootstrap_servers=['kafka:9092'],
group_id='order-processor',
fetch_min_bytes=1, # Get anything available
fetch_max_wait_ms=100, # Quick response
max_partition_fetch_bytes=65536,
max_poll_records=50,
auto_offset_reset='earliest',
enable_auto_commit=False
)
Consumer Performance Testing
# kafka-consumer-perf-test
kafka-consumer-perf-test.sh \
--topic test-topic \
--messages 1000000 \
--group perf-test-group \
--bootstrap-server kafka:9092 \
--fetch-size 1048576
Network Tuning
Socket Configuration
# Producer network
socket.send.buffer.bytes=1048576 # 1MB send buffer
socket.request.max.bytes=104857600 # 100MB max request
# Consumer network
socket.receive.buffer.bytes=1048576 # 1MB receive buffer
fetch.min.bytes=1048576 # 1MB minimum fetch
# Broker network
num.network.threads=8
socket.receive.buffer.bytes=1048576
socket.send.buffer.bytes=1048576
OS Tuning
# Increase file descriptor limits
ulimit -n 100000
# Kernel network tuning
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl -w net.core.rmem_default=16777216
sysctl -w net.core.wmem_default=16777216
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
sysctl -w net.core.netdev_max_backlog=5000
Performance Monitoring
Key Metrics
# Performance metrics to track
metrics:
- name: kafka_server_BrokerTopicMetrics_MessagesInPerSec
description: "Messages received per second"
target: "> 100000"
- name: kafka_server_BrokerTopicMetrics_BytesInPerSec
description: "Bytes received per second"
target: "> 100MB/s"
- name: kafka_server_ReplicaManager_UnderReplicatedPartitions
description: "Under-replicated partitions"
target: "= 0"
- name: kafka_consumer_group_lag
description: "Consumer group lag"
target: "< 1000"
Performance Dashboard
{
"panels": [
{
"title": "Throughput (MB/s)",
"type": "timeseries",
"targets": [
{
"expr": "sum(rate(kafka_server_BrokerTopicMetrics_BytesInPerSec[5m]))",
"legendFormat": "Inbound"
},
{
"expr": "sum(rate(kafka_server_BrokerTopicMetrics_BytesOutPerSec[5m]))",
"legendFormat": "Outbound"
}
]
},
{
"title": "Latency (p99)",
"type": "timeseries",
"targets": [
{
"expr": "histogram_quantile(0.99, kafka_network_RequestMetrics_TotalTimeMs_bucket)",
"legendFormat": "p99 latency"
}
]
}
]
}
Benchmarking
End-to-End Benchmark
import time
from kafka import KafkaProducer, KafkaConsumer
import statistics
def benchmark_producer(num_messages, message_size, batch_size, linger_ms):
producer = KafkaProducer(
bootstrap_servers=['kafka:9092'],
batch_size=batch_size,
linger_ms=linger_ms,
compression_type='lz4',
acks='all'
)
start_time = time.time()
for i in range(num_messages):
message = b'x' * message_size
producer.send('benchmark-topic', value=message)
producer.flush()
end_time = time.time()
elapsed = end_time - start_time
throughput = num_messages / elapsed
mb_throughput = (num_messages * message_size) / elapsed / 1024 / 1024
print(f"Throughput: {throughput:.0f} messages/sec")
print(f"Bandwidth: {mb_throughput:.1f} MB/sec")
print(f"Latency avg: {elapsed/num_messages*1000:.2f} ms")
# Run benchmark
benchmark_producer(
num_messages=100000,
message_size=1024,
batch_size=32768,
linger_ms=5
)
Common Issues and Solutions
High Latency
Symptom Cause Solution High end-to-end latency Large batches Reduce batch.size and linger.ms Slow consumer processing Small fetches Increase fetch.min.bytes Network congestion Insufficient buffers Increase socket buffers
Low Throughput
Symptom Cause Solution Low messages/sec No compression Enable compression.type=lz4 High CPU usage Compression overhead Use snappy or lz4 Disk I/O bottleneck Too many small writes Increase batch.size
Summary
Kafka performance tuning requires optimizing producer batching , broker threading , consumer fetching , and network buffers . Monitor key metrics and adjust configurations based on your specific throughput and latency requirements.