⚠️ Alert thresholds depend on the nature of your applications. Some queries on this page use arbitrary tolerance thresholds, so adjust them to your workload. Building an efficient monitoring platform takes time. 😉

Kafka Prometheus Alert Rules

4 Prometheus alerting rules for Kafka, based on metrics exported by danielqsj/kafka_exporter and linkedin/Burrow. These rules cover critical and warning conditions. Copy and paste the YAML into a rule file loaded by your Prometheus configuration.
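
Each rule on this page is a single list item, so it needs to sit inside a rule group before Prometheus can load it. The following is a minimal sketch of such a wrapper, not the project's official layout: the file name `kafka-alerts.yml` and the group name `kafka-exporter` are hypothetical, and the file is assumed to be listed under `rule_files:` in `prometheus.yml`.

```yaml
# kafka-alerts.yml (hypothetical file name), referenced from `rule_files:` in prometheus.yml
groups:
  - name: kafka-exporter   # hypothetical group name
    rules:
      # Paste rules from this page here, indented as items of `rules:`.
      - alert: KafkaTopicsReplicas
        expr: min(kafka_topic_partition_in_sync_replica) by (topic) < 3
        for: 0m
        labels:
          severity: critical
        annotations:
          description: "Kafka topic {{ $labels.topic }} has fewer than 3 in-sync replicas ({{ $value }}), data durability is at risk."
```

Running `promtool check rules kafka-alerts.yml` is one way to validate the file before reloading Prometheus.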

3.3.1. danielqsj/kafka_exporter (2 rules)

wget https://raw.githubusercontent.com/samber/awesome-prometheus-alerts/refs/heads/master/dist/rules/kafka/danielqsj-kafka-exporter.yml
critical

3.3.1.1. Kafka topics replicas

Kafka topic {{ $labels.topic }} has fewer than 3 in-sync replicas ({{ $value }}), data durability is at risk.

- alert: KafkaTopicsReplicas
  expr: min(kafka_topic_partition_in_sync_replica) by (topic) < 3
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Kafka topics replicas (topic {{ $labels.topic }})
    description: "Kafka topic {{ $labels.topic }} has fewer than 3 in-sync replicas ({{ $value }}), data durability is at risk.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
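
The fixed threshold of 3 only makes sense when every topic has a replication factor of at least 3. A topic created with fewer replicas would keep this alert firing permanently. One possible variant, sketched here under the assumption that the exporter also exposes `kafka_topic_partition_replicas` with the same labels, compares each partition's in-sync replica count with its assigned replica count instead. The alert name, the `for` duration, and the severity are illustrative choices.

```yaml
# Sketch only: assumes kafka_topic_partition_replicas is exported alongside
# kafka_topic_partition_in_sync_replica with matching topic/partition labels.
- alert: KafkaTopicPartitionUnderReplicated   # hypothetical alert name
  expr: kafka_topic_partition_in_sync_replica < kafka_topic_partition_replicas
  for: 5m   # arbitrary tolerance, tune to your environment
  labels:
    severity: warning
  annotations:
    summary: Kafka partition under-replicated (topic {{ $labels.topic }})
    description: "Partition {{ $labels.partition }} of topic {{ $labels.topic }} has {{ $value }} in-sync replicas, fewer than its assigned replica count."
```

Because the comparison keeps the left-hand sample, `{{ $value }}` here is the in-sync replica count of the affected partition.
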
warning

3.3.1.2. Kafka consumer group lag

Kafka consumer group {{ $labels.consumergroup }} is lagging behind ({{ $value }} messages)

- alert: KafkaConsumerGroupLag
  expr: sum(kafka_consumergroup_lag) by (consumergroup) > 10000
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: Kafka consumer group lag (consumer group {{ $labels.consumergroup }})
    description: "Kafka consumer group {{ $labels.consumergroup }} is lagging behind ({{ $value }} messages)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
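
Summing by `consumergroup` alone reports one lag figure per group, which can hide which topic is falling behind. A possible variant, assuming `kafka_consumergroup_lag` carries a `topic` label in your exporter version, keeps that label in the aggregation. The alert name and the longer `for` duration are illustrative, and the 10000 threshold is as arbitrary as in the rule above.

```yaml
# Sketch only: assumes kafka_consumergroup_lag has a `topic` label.
- alert: KafkaConsumerGroupTopicLag   # hypothetical alert name
  expr: sum(kafka_consumergroup_lag) by (consumergroup, topic) > 10000
  for: 5m   # arbitrary tolerance, tune to your environment
  labels:
    severity: warning
  annotations:
    summary: Kafka consumer group lag (consumer group {{ $labels.consumergroup }}, topic {{ $labels.topic }})
    description: "Consumer group {{ $labels.consumergroup }} is lagging behind on topic {{ $labels.topic }} ({{ $value }} messages)"
```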

3.3.2. linkedin/Burrow (2 rules)

wget https://raw.githubusercontent.com/samber/awesome-prometheus-alerts/refs/heads/master/dist/rules/kafka/linkedin-kafka-exporter.yml
warning

3.3.2.1. Kafka topic offset decreased

Kafka topic offset has decreased

- alert: KafkaTopicOffsetDecreased
  expr: delta(kafka_burrow_partition_current_offset[1m]) < 0
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: Kafka topic offset decreased (instance {{ $labels.instance }})
    description: "Kafka topic offset has decreased\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

3.3.2.2. Kafka consumer lag

Kafka consumer has had a positive, non-decreasing lag for 30 minutes

- alert: KafkaConsumerLag
  expr: kafka_burrow_topic_partition_offset - on(partition, cluster, topic) group_right() kafka_burrow_partition_current_offset >= (kafka_burrow_topic_partition_offset offset 15m - on(partition, cluster, topic) group_right() kafka_burrow_partition_current_offset offset 15m) AND kafka_burrow_topic_partition_offset - on(partition, cluster, topic) group_right() kafka_burrow_partition_current_offset > 0
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: Kafka consumer lag (instance {{ $labels.instance }})
    description: "Kafka consumer has had a positive, non-decreasing lag for 30 minutes\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
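
The single-line expression above is hard to read, so here is the same logic laid out with a YAML block scalar and PromQL `#` comments. This is intended as an equivalent reformatting for explanation, not a different rule: the lag (topic end offset minus consumer offset) must be at least what it was 15 minutes earlier and greater than zero, and that condition must hold for a further 15 minutes. The shortened description text is illustrative.

```yaml
- alert: KafkaConsumerLag
  expr: |
    # Current lag: topic end offset minus the consumer's current offset.
    (
      kafka_burrow_topic_partition_offset
        - on(partition, cluster, topic) group_right()
      kafka_burrow_partition_current_offset
    )
    >=
    # The same lag as it was 15 minutes ago.
    (
      kafka_burrow_topic_partition_offset offset 15m
        - on(partition, cluster, topic) group_right()
      kafka_burrow_partition_current_offset offset 15m
    )
    and
    # Skip consumers that have fully caught up.
    (
      kafka_burrow_topic_partition_offset
        - on(partition, cluster, topic) group_right()
      kafka_burrow_partition_current_offset
    ) > 0
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: Kafka consumer lag (instance {{ $labels.instance }})
    description: "Kafka consumer lag is positive and has not decreased over the last 30 minutes"
```

The 15-minute lookback plus the 15-minute `for` clause is where the 30-minute figure comes from.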