Kafka on Kubernetes



[Figure: architecture overview. Strimzi Operator (Cluster Operator, Topic Operator, User Operator, reconciliation); Kafka pods (kafka-broker-0, kafka-broker-1, kafka-broker-2) and ZooKeeper ensemble; Storage (PersistentVolumes, StorageClasses, StatefulSets, VolumeClaims); Services (ClusterIP, NodePort, LoadBalancer, Headless); Metrics (Prometheus, Grafana, alerts, dashboards).]

Overview

Running Kafka on Kubernetes lets an operator automate deployment, scaling, and recovery while the brokers share cluster resources with other workloads. This guide covers deploying Kafka with the Strimzi operator, managing persistent storage, and configuring autoscaling and monitoring.

Benefits of Kubernetes

  • Automated Deployment: Declarative configuration
  • Self-Healing: Automatic pod restarts
  • Scaling: Horizontal and vertical scaling
  • Resource Efficiency: Better cluster utilization
  • Rolling Updates: Zero-downtime upgrades

Strimzi Operator Setup

Install Strimzi

# Install Strimzi using Helm
helm repo add strimzi https://strimzi.io/charts/
helm repo update

helm install strimzi-kafka-operator strimzi/strimzi-kafka-operator \
  --namespace kafka \
  --create-namespace \
  --set watchAnyNamespace=true

Kafka Cluster CRD

# kafka-cluster.yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-kafka-cluster
  namespace: kafka
spec:
  kafka:
    version: 3.5.1
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      default.replication.factor: 3
      min.insync.replicas: 2
      inter.broker.protocol.version: "3.5"
    storage:
      type: persistent-claim
      size: 100Gi
      class: fast-ssd
      deleteClaim: false
    resources:
      requests:
        memory: 4Gi
        cpu: 2
      limits:
        memory: 8Gi
        cpu: 4
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 20Gi
      class: fast-ssd
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}

Apply Configuration

# Apply Kafka cluster
kubectl apply -f kafka-cluster.yaml

# Check cluster status
kubectl get kafka -n kafka

# Check pods
kubectl get pods -n kafka -l app.kubernetes.io/name=kafka
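
Pod listings show progress, but the Ready condition on the Kafka resource is the clearer signal that the operator has finished rolling out the cluster. The following is a minimal sketch, assuming the cluster and namespace names used above; `wait_for_kafka` is a hypothetical helper name and the 300s default timeout is an arbitrary choice.

```shell
# wait_for_kafka CLUSTER NAMESPACE [TIMEOUT]
# Blocks until the Kafka custom resource reports condition Ready,
# or fails once the timeout (default 300s, an arbitrary choice) expires.
wait_for_kafka() {
  kubectl wait "kafka/$1" \
    --for=condition=Ready \
    --timeout="${3:-300s}" \
    -n "$2"
}
```

For the cluster in this guide the call would be `wait_for_kafka my-kafka-cluster kafka`.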

Persistent Volumes

StorageClass Configuration

# storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
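# gp3 volumes require the AWS EBS CSI driver; the in-tree kubernetes.io/aws-ebs
# provisioner does not support them.
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"  # gp3 baseline
  encrypted: "true"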
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer

PersistentVolumeClaim

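# pvc.yaml
# For reference only: Strimzi creates one claim per broker from the storage
# settings in the Kafka resource, named data-<cluster>-kafka-<index>.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-my-kafka-cluster-kafka-0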
  namespace: kafka
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 100Gi

StatefulSet Configuration

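# Illustrative only: do not apply this manifest. Strimzi creates and manages the
# broker pods itself (recent releases use StrimziPodSet resources rather than
# StatefulSets); the sketch below shows the underlying concept.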
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-kafka-cluster-kafka
  namespace: kafka
spec:
  serviceName: my-kafka-cluster-kafka
  replicas: 3
  selector:
    matchLabels:
      app.kubernetes.io/name: kafka
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kafka
    spec:
      containers:
        - name: kafka
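          # Strimzi image tags combine the operator and Kafka versions;
          # the operator version shown here is an assumption.
          image: quay.io/strimzi/kafka:0.37.0-kafka-3.5.1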
          ports:
            - containerPort: 9092
              name: plain
            - containerPort: 9093
              name: tls
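          env:
            # Illustrative: expose the pod name (for example my-kafka-cluster-kafka-0).
            # The broker ID must be an integer, so it corresponds to the ordinal
            # suffix of this name, not to the name itself.
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name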
          volumeMounts:
            - name: data
              mountPath: /var/lib/kafka/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 100Gi

Topic Management

Topic CRD

# orders-topic.yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: orders
  namespace: kafka
  labels:
    strimzi.io/cluster: my-kafka-cluster
spec:
  partitions: 6
  replicas: 3
  config:
    retention.ms: 604800000  # 7 days
    cleanup.policy: delete
    compression.type: lz4
    min.insync.replicas: 2
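
The `retention.ms` value above is 7 days expressed in milliseconds. As a small sketch for deriving the value for other retention periods (`retention_ms` is a hypothetical helper name, and it assumes a whole number of days):

```shell
# retention_ms DAYS
# Prints the retention.ms value for a whole number of days.
retention_ms() {
  echo $(( $1 * 24 * 60 * 60 * 1000 ))
}

retention_ms 7   # prints 604800000
```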

Topic Operations

# Create topic
kubectl apply -f orders-topic.yaml

# List topics
kubectl get kafkatopics -n kafka

# Update topic (the partition count can only be increased, never reduced)
kubectl patch kafkatopic orders -n kafka --type merge -p '{"spec":{"partitions":12}}'

# Delete topic
kubectl delete kafkatopic orders -n kafka
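
Kafka only allows a topic's partition count to grow, so it can help to compare the requested count with the current one before patching. The following is a sketch of such a guard; `can_set_partitions` is a hypothetical helper, and the current count is assumed to come from `kubectl get kafkatopic orders -n kafka -o jsonpath='{.spec.partitions}'`.

```shell
# can_set_partitions CURRENT REQUESTED
# Succeeds (exit status 0) only when REQUESTED is strictly greater than CURRENT.
can_set_partitions() {
  [ "$2" -gt "$1" ]
}

if can_set_partitions 6 12; then
  echo "ok to patch: 6 -> 12"
fi
```

Adding partitions also changes which partition a given key maps to, so consumers that rely on per-key ordering may need to be considered before the change.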

User Management

User CRD

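# app-user.yaml
# Assumes the Kafka resource enables `authorization: type: simple` and that the
# listener clients connect to sets `authentication: type: tls`.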
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: app-user
  namespace: kafka
  labels:
    strimzi.io/cluster: my-kafka-cluster
spec:
  authentication:
    type: tls
  authorization:
    type: simple
    acls:
      - resource:
          type: topic
          name: orders
        operations:
          - Read
          - Describe
        host: "*"
      - resource:
          type: topic
          name: orders
        operations:
          - Write
        host: "*"
      - resource:
          type: group
          name: order-processor
        operations:
          - Read
        host: "*"
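
For a TLS user, the User Operator stores the generated credentials in a Secret with the same name as the KafkaUser, with keys such as `ca.crt`, `user.crt`, and `user.key`. The following is a sketch for reading one of them; `user_secret_field` is a hypothetical helper name, and `base64 -d` assumes a GNU-compatible `base64`.

```shell
# user_secret_field USER NAMESPACE FIELD
# Prints one decoded field (for example user.crt or user.key) of the Secret
# that the User Operator creates with the same name as the KafkaUser.
user_secret_field() {
  # Dots in the key must be escaped inside a jsonpath expression.
  key="$(printf '%s' "$3" | sed 's/\./\\./g')"
  kubectl get secret "$1" -n "$2" -o "jsonpath={.data.${key}}" | base64 -d
}
```

For example, `user_secret_field app-user kafka user.crt > user.crt` writes the client certificate to a local file.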

Horizontal Pod Autoscaling

HPA Configuration

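# kafka-hpa.yaml
# Caution: the Cluster Operator sets the broker count from spec.kafka.replicas and
# reverts outside changes, and new brokers hold no partitions until replicas are
# reassigned (for example with Cruise Control). Treat this manifest as illustrative.
# The Pods metric below also requires a custom metrics adapter such as Prometheus Adapter.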
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: kafka-broker-hpa
  namespace: kafka
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: my-kafka-cluster-kafka
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    - type: Pods
      pods:
        metric:
          name: kafka_server_BrokerTopicMetrics_MessagesInPerSec
        target:
          type: AverageValue
          averageValue: "100000"

Custom Metrics

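# custom-metrics.yaml
# Scales a consumer Deployment on consumer group lag. External metrics require an
# adapter (for example Prometheus Adapter or KEDA) that exposes the metric to the HPA.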
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: kafka-consumer-hpa
  namespace: kafka
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-consumer
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: kafka_consumer_group_lag
          selector:
            matchLabels:
              group: order-processor
        target:
          type: AverageValue
          averageValue: "1000"
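
With an External metric and an AverageValue target, the autoscaler in effect divides the total metric value by the target per pod and rounds up, then clamps the result to the configured bounds. The following is a simplified sketch of that arithmetic; `desired_consumers` is a hypothetical helper, and it ignores the HPA's tolerance and stabilization behaviour.

```shell
# desired_consumers TOTAL_LAG TARGET_PER_POD MIN MAX
# Prints ceil(TOTAL_LAG / TARGET_PER_POD), clamped to the range [MIN, MAX].
desired_consumers() {
  want=$(( ($1 + $2 - 1) / $2 ))   # integer ceiling division
  if [ "$want" -lt "$3" ]; then want="$3"; fi
  if [ "$want" -gt "$4" ]; then want="$4"; fi
  echo "$want"
}

desired_consumers 4500 1000 2 20   # prints 5
```

A consumer group reading a single topic cannot use more members than the topic has partitions, so with the 6-partition `orders` topic any replicas beyond 6 would sit idle unless the partition count is raised.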

Rolling Updates

Zero-Downtime Updates

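# Fragment to merge into kafka-cluster.yaml. With replication factor 3 and
# min.insync.replicas 2, restarting one broker at a time keeps every partition
# writable; rack awareness spreads the replicas across zones.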
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-kafka-cluster
spec:
  kafka:
    replicas: 3
    rack:
      topologyKey: topology.kubernetes.io/zone
    config:
      default.replication.factor: 3
      min.insync.replicas: 2
    storage:
      type: persistent-claim
      size: 100Gi
  cruiseControl: {}

Update Strategies

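# Trigger rolling update (the operator restarts the pods at its next reconciliation).
# On older Strimzi releases that still use StatefulSets, annotate the StatefulSet instead.
kubectl annotate strimzipodset my-kafka-cluster-kafka -n kafka \
  strimzi.io/manual-rolling-update=true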

# Monitor rolling update
kubectl get pods -n kafka -l app.kubernetes.io/name=kafka -w

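# Check rollout status (on older StatefulSet-based releases, use
# `kubectl rollout status statefulset/my-kafka-cluster-kafka -n kafka` instead)
kubectl get strimzipodset my-kafka-cluster-kafka -n kafka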
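
A rolling restart is only safe while partitions stay fully replicated, so it is worth checking for under-replicated partitions before and after a roll. The following is a sketch, assuming the plain listener on port 9092 from the cluster definition and the Kafka scripts under `/opt/kafka/bin` in the broker image; `under_replicated` is a hypothetical helper name.

```shell
# under_replicated CLUSTER NAMESPACE
# Prints partitions whose in-sync replica set is smaller than the replica set.
# Empty output means every partition is fully replicated.
under_replicated() {
  kubectl exec -n "$2" "$1-kafka-0" -- \
    /opt/kafka/bin/kafka-topics.sh \
    --bootstrap-server "$1-kafka-bootstrap:9092" \
    --describe --under-replicated-partitions
}
```

For the cluster in this guide the call would be `under_replicated my-kafka-cluster kafka`.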

Monitoring with Prometheus

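PodMonitor Configuration

# podmonitor.yaml
# Strimzi exposes broker metrics on a pod port (tcp-prometheus) rather than through
# a Service, so a PodMonitor is used here instead of a ServiceMonitor. Metrics must
# be enabled with metricsConfig in the Kafka resource.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: kafka-metrics
  namespace: kafka
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      strimzi.io/cluster: my-kafka-cluster
  namespaceSelector:
    matchNames:
      - kafka
  podMetricsEndpoints:
    - port: tcp-prometheus
      interval: 15s
      path: /metrics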

PrometheusRule

# prometheusrule.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kafka-alerts
  namespace: kafka
spec:
  groups:
    - name: kafka
      rules:
        - alert: KafkaUnderReplicatedPartitions
          expr: kafka_server_replicamanager_underreplicatedpartitions > 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Under-replicated partitions detected"
            description: "{{ $value }} partitions are under-replicated"
        
        - alert: KafkaConsumerLagHigh
          expr: kafka_consumer_group_lag > 10000
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Consumer lag is high"
            description: "Consumer group {{ $labels.group }} has lag of {{ $value }}"

Grafana Dashboard

{
  "dashboard": {
    "title": "Kafka on Kubernetes",
    "panels": [
      {
        "title": "Kafka Brokers",
        "type": "stat",
        "targets": [
          {
            "expr": "count(kafka_server_BrokerTopicMetrics_MessagesInPerSec)",
            "legendFormat": "Brokers"
          }
        ]
      },
      {
        "title": "Messages In Rate",
        "type": "timeseries",
        "targets": [
          {
            "expr": "sum(rate(kafka_server_BrokerTopicMetrics_MessagesInPerSec[5m]))",
            "legendFormat": "Messages/sec"
          }
        ]
      }
    ]
  }
}

Best Practices

Resource Management

# Resource recommendations (set under spec.kafka.resources and
# spec.zookeeper.resources in the Kafka resource)
resources:
  kafka:
    requests:
      memory: 4Gi
      cpu: 2
    limits:
      memory: 8Gi
      cpu: 4
  zookeeper:
    requests:
      memory: 2Gi
      cpu: 1
    limits:
      memory: 4Gi
      cpu: 2

Backup Strategy

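#!/bin/bash
# backup_kafka.sh
# Exports cluster definitions. Kafka has no built-in command that snapshots log
# data; protect the data itself with storage-level volume snapshots or with
# cross-cluster replication (see Disaster Recovery).
set -euo pipefail

# Record the topic layout (partitions, replicas, leaders) as seen by the brokers
kubectl exec -n kafka my-kafka-cluster-kafka-0 -- \
  /opt/kafka/bin/kafka-topics.sh \
  --bootstrap-server my-kafka-cluster-kafka-bootstrap:9092 --describe \
  > kafka-topics-describe.txt

# Record the PersistentVolumeClaims that hold broker data
kubectl get pvc -n kafka -l strimzi.io/cluster=my-kafka-cluster -o yaml \
  > kafka-pvc-backup.yaml

# Export topic configurations
kubectl get kafkatopics -n kafka -o yaml > kafka-topics-backup.yaml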

Disaster Recovery

# MirrorMaker 2 for cross-cluster replication
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaMirrorMaker2
metadata:
  name: my-mirror-maker-2
  namespace: kafka
spec:
  version: 3.5.1
  replicas: 2
  connectCluster: "target-cluster"
  clusters:
    - alias: "source-cluster"
      bootstrapServers: kafka-source:9092
    - alias: "target-cluster"
      bootstrapServers: kafka-target:9092
  mirrors:
    - sourceCluster: "source-cluster"
      targetCluster: "target-cluster"
      sourceConnector: {}  # enables the MirrorSourceConnector for this flow
      topicsPattern: "orders|payments|users"
      topicsExcludePattern: ".*internal"

Summary

Running Kafka on Kubernetes with Strimzi gives you declarative management of clusters, topics, and users, along with rolling updates and persistent storage. For production deployments, add monitoring and alerting, a backup and disaster recovery plan, and autoscaling for consumers.

