⚠️

Alert thresholds depend on the nature of your applications. Some queries may have arbitrary tolerance thresholds. Building an efficient monitoring platform takes time. 😉

Cassandra Prometheus Alert Rules

30 Prometheus alerting rules for Cassandra, based on metrics exported by instaclustr/cassandra-exporter and criteo/cassandra_exporter. The rules cover critical and warning conditions; copy the YAML into a rule file loaded by your Prometheus server.

2.13.1. instaclustr/cassandra-exporter (12 rules)

wget https://raw.githubusercontent.com/samber/awesome-prometheus-alerts/refs/heads/master/dist/rules/cassandra/instaclustr-cassandra-exporter.yml
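Each snippet on this page is a single list item meant to sit under the rules: key of a rule group, so pasting it on its own does not produce a valid rule file. The sketch below shows a minimal wrapper around the first rule. The group name cassandra-instaclustr and the file path in the comment are hypothetical; only the rule body comes from this page.

```yaml
# Hypothetical rule file, e.g. /etc/prometheus/rules/cassandra/instaclustr.yml (path is an assumption).
groups:
  - name: cassandra-instaclustr   # hypothetical group name
    rules:
      # 1m delay allows a restart without triggering an alert.
      - alert: CassandraNodeIsUnavailable
        expr: cassandra_endpoint_active < 1
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: Cassandra Node is unavailable (instance {{ $labels.instance }})
          description: "Cassandra Node is unavailable - {{ $labels.cassandra_cluster }} {{ $labels.exported_endpoint }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
```

Further rules from this section are appended as additional items under the same rules: key. Running promtool check rules against the file before reloading Prometheus catches indentation and PromQL syntax errors early.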
critical

2.13.1.1. Cassandra Node is unavailable

Cassandra Node is unavailable - {{ $labels.cassandra_cluster }} {{ $labels.exported_endpoint }}

  # 1m delay allows a restart without triggering an alert.
- alert: CassandraNodeIsUnavailable
  expr: cassandra_endpoint_active < 1
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: Cassandra Node is unavailable (instance {{ $labels.instance }})
    description: "Cassandra Node is unavailable - {{ $labels.cassandra_cluster }} {{ $labels.exported_endpoint }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

2.13.1.2. Cassandra many compaction tasks are pending

Many Cassandra compaction tasks are pending - {{ $labels.cassandra_cluster }}

- alert: CassandraManyCompactionTasksArePending
  expr: cassandra_table_estimated_pending_compactions > 100
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: Cassandra many compaction tasks are pending (instance {{ $labels.instance }})
    description: "Many Cassandra compaction tasks are pending - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
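As the note at the top of the page says, thresholds depend on the workload. If one cluster legitimately runs with a larger compaction backlog, a single rule can carry a different threshold per cluster by combining two filtered selectors with the PromQL or operator. In this sketch the cluster name analytics and the threshold 500 are hypothetical placeholders to be replaced with values from your own baseline.

```yaml
# Sketch: per-cluster thresholds in one rule. "analytics" and 500 are hypothetical values.
- alert: CassandraManyCompactionTasksArePending
  expr: |
    cassandra_table_estimated_pending_compactions{cassandra_cluster!="analytics"} > 100
    or
    cassandra_table_estimated_pending_compactions{cassandra_cluster="analytics"} > 500
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: Cassandra many compaction tasks are pending (instance {{ $labels.instance }})
    description: "Many Cassandra compaction tasks are pending - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
```

Because the two selectors match disjoint sets of series, each series is compared against exactly one threshold, and the rule keeps a single alert name.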
warning

2.13.1.3. Cassandra commitlog pending tasks (Instaclustr)

Cassandra commitlog pending tasks - {{ $labels.cassandra_cluster }}

- alert: CassandraCommitlogPendingTasks(Instaclustr)
  expr: cassandra_commit_log_pending_tasks > 15
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra commitlog pending tasks (Instaclustr) (instance {{ $labels.instance }})
    description: "Cassandra commitlog pending tasks - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

2.13.1.4. Cassandra compaction executor blocked tasks (Instaclustr)

Some Cassandra compaction executor tasks are blocked - {{ $labels.cassandra_cluster }}

- alert: CassandraCompactionExecutorBlockedTasks(Instaclustr)
  expr: cassandra_thread_pool_blocked_tasks{pool="CompactionExecutor"} > 15
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra compaction executor blocked tasks (Instaclustr) (instance {{ $labels.instance }})
    description: "Some Cassandra compaction executor tasks are blocked - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

2.13.1.5. Cassandra flush writer blocked tasks (Instaclustr)

Some Cassandra flush writer tasks are blocked - {{ $labels.cassandra_cluster }}

- alert: CassandraFlushWriterBlockedTasks(Instaclustr)
  expr: cassandra_thread_pool_blocked_tasks{pool="MemtableFlushWriter"} > 15
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra flush writer blocked tasks (Instaclustr) (instance {{ $labels.instance }})
    description: "Some Cassandra flush writer tasks are blocked - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

2.13.1.6. Cassandra connection timeouts total (Instaclustr)

Some connections between nodes are timing out - {{ $labels.cassandra_cluster }}

- alert: CassandraConnectionTimeoutsTotal(Instaclustr)
  expr: sum by (cassandra_cluster,instance) (rate(cassandra_client_request_timeouts_total[5m])) > 5
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: Cassandra connection timeouts total (Instaclustr) (instance {{ $labels.instance }})
    description: "Some connections between nodes are timing out - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

2.13.1.7. Cassandra storage exceptions (Instaclustr)

Cassandra is reporting storage exceptions - {{ $labels.cassandra_cluster }}

- alert: CassandraStorageExceptions(Instaclustr)
  expr: changes(cassandra_storage_exceptions_total[1m]) > 1
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Cassandra storage exceptions (Instaclustr) (instance {{ $labels.instance }})
    description: "Cassandra is reporting storage exceptions - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
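The changes() function counts how many times a series' value changed inside the window, not by how much. With changes(...[1m]) > 1, a single scrape-to-scrape jump from 0 to 50 exceptions counts as one change and does not fire. If every new storage exception should page, a hedged variant could use increase(), which measures the amount of growth and also handles counter resets after a node restart. The 5m window is an assumption, not a recommendation from the exporter.

```yaml
# Variant sketch: fire on any storage exception recorded in the last 5 minutes (window is an assumption).
- alert: CassandraStorageExceptions(Instaclustr)
  expr: increase(cassandra_storage_exceptions_total[5m]) > 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Cassandra storage exceptions (Instaclustr) (instance {{ $labels.instance }})
    description: "Cassandra storage exceptions were recorded in the last 5 minutes - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
```

increase() extrapolates to the window boundaries, so VALUE may be a non-integer; the > 0 comparison is unaffected.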
critical

2.13.1.8. Cassandra tombstone dump (Instaclustr)

Reads are scanning too many tombstones - {{ $labels.cassandra_cluster }}

- alert: CassandraTombstoneDump(Instaclustr)
  expr: avg(cassandra_table_tombstones_scanned{quantile="0.99"}) by (instance,cassandra_cluster,keyspace) > 100
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: Cassandra tombstone dump (Instaclustr) (instance {{ $labels.instance }})
    description: "Reads are scanning too many tombstones - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

2.13.1.9. Cassandra client request unavailable write (Instaclustr)

Some Cassandra write requests failed because not enough replicas were available - {{ $labels.cassandra_cluster }}

- alert: CassandraClientRequestUnavailableWrite(Instaclustr)
  expr: changes(cassandra_client_request_unavailable_exceptions_total{operation="write"}[1m]) > 0
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: Cassandra client request unavailable write (Instaclustr) (instance {{ $labels.instance }})
    description: "Some Cassandra write requests failed because not enough replicas were available - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

2.13.1.10. Cassandra client request unavailable read (Instaclustr)

Some Cassandra read requests failed because not enough replicas were available - {{ $labels.cassandra_cluster }}

- alert: CassandraClientRequestUnavailableRead(Instaclustr)
  expr: changes(cassandra_client_request_unavailable_exceptions_total{operation="read"}[1m]) > 0
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: Cassandra client request unavailable read (Instaclustr) (instance {{ $labels.instance }})
    description: "Some Cassandra read requests failed because not enough replicas were available - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
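When a node goes down, the Instaclustr unavailable read and write alerts tend to fire together with CassandraNodeIsUnavailable. If these alerts are routed through Alertmanager, an inhibition rule along the following lines could mute the request-level alerts while the node alert is firing for the same cluster. This is a sketch that assumes Alertmanager 0.22 or later (for source_matchers and target_matchers) and relies on both alerts carrying the cassandra_cluster label, which the Instaclustr expressions on this page preserve.

```yaml
# alertmanager.yml (excerpt); a sketch, not a drop-in configuration.
inhibit_rules:
  - source_matchers:
      - alertname="CassandraNodeIsUnavailable"
    target_matchers:
      # Matches e.g. CassandraClientRequestUnavailableWrite(Instaclustr); regex matchers are fully anchored.
      - alertname=~"CassandraClientRequestUnavailable(Read|Write).*"
    # Only mute targets that belong to the same Cassandra cluster as the source alert.
    equal: ["cassandra_cluster"]
```

The Criteo variants lack the cassandra_cluster label, so the equal clause keeps this rule from muting them.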
critical

2.13.1.11. Cassandra client request write failure (Instaclustr)

Write failures have occurred, ensure there are not too many unavailable nodes - {{ $labels.cassandra_cluster }}

- alert: CassandraClientRequestWriteFailure(Instaclustr)
  expr: increase(cassandra_client_request_failures_total{operation="write"}[1m]) > 5
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: Cassandra client request write failure (Instaclustr) (instance {{ $labels.instance }})
    description: "Write failures have occurred, ensure there are not too many unavailable nodes - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

2.13.1.12. Cassandra client request read failure (Instaclustr)

Read failures have occurred, ensure there are not too many unavailable nodes - {{ $labels.cassandra_cluster }}

- alert: CassandraClientRequestReadFailure(Instaclustr)
  expr: increase(cassandra_client_request_failures_total{operation="read"}[1m]) > 5
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: Cassandra client request read failure (Instaclustr) (instance {{ $labels.instance }})
    description: "Read failures have occurred, ensure there are not too many unavailable nodes - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

2.13.2. criteo/cassandra_exporter (18 rules)

wget https://raw.githubusercontent.com/samber/awesome-prometheus-alerts/refs/heads/master/dist/rules/cassandra/criteo-cassandra-exporter.yml
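Whichever file you download, Prometheus only evaluates it once it is listed under rule_files in prometheus.yml, and that setting accepts glob patterns. A minimal excerpt, assuming the files are saved to a hypothetical /etc/prometheus/rules/cassandra/ directory:

```yaml
# prometheus.yml (excerpt); the directory path is an assumption.
global:
  evaluation_interval: 1m   # Prometheus default; rules (and their for: timers) are evaluated at this interval
rule_files:
  - /etc/prometheus/rules/cassandra/*.yml
```

Prometheus reads rule files at startup and on a configuration reload (SIGHUP, or POST /-/reload when --web.enable-lifecycle is set). Loading only the file that matches your exporter avoids alerts on metrics that never exist.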
critical

2.13.2.1. Cassandra hints count

Cassandra hints count has changed on {{ $labels.instance }}; some nodes may be down

- alert: CassandraHintsCount
  expr: changes(cassandra_stats{name="org:apache:cassandra:metrics:storage:totalhints:count"}[1m]) > 3
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Cassandra hints count (instance {{ $labels.instance }})
    description: "Cassandra hints count has changed on {{ $labels.instance }}; some nodes may be down\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

2.13.2.2. Cassandra compaction task pending

Many Cassandra compaction tasks are pending. You might need to increase I/O capacity by adding nodes to the cluster.

- alert: CassandraCompactionTaskPending
  expr: cassandra_stats{name="org:apache:cassandra:metrics:compaction:pendingtasks:value"} > 100
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra compaction task pending (instance {{ $labels.instance }})
    description: "Many Cassandra compaction tasks are pending. You might need to increase I/O capacity by adding nodes to the cluster.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

2.13.2.3. Cassandra viewwrite latency

High view write latency on Cassandra node {{ $labels.instance }}

- alert: CassandraViewwriteLatency
  expr: cassandra_stats{name="org:apache:cassandra:metrics:clientrequest:viewwrite:viewwritelatency:99thpercentile"} > 100000
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra viewwrite latency (instance {{ $labels.instance }})
    description: "High view write latency on Cassandra node {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

2.13.2.4. Cassandra authentication failures

Cassandra authentication failures are increasing

- alert: CassandraAuthenticationFailures
  expr: increase(cassandra_stats{name="org:apache:cassandra:metrics:client:authfailure:count"}[1m]) > 5
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra authentication failures (instance {{ $labels.instance }})
    description: "Cassandra authentication failures are increasing\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

2.13.2.5. Cassandra node down

Cassandra node down

  # 1m delay allows a restart without triggering an alert.
- alert: CassandraNodeDown
  expr: sum(cassandra_stats{name="org:apache:cassandra:net:failuredetector:downendpointcount"}) by (service,group,cluster,env) > 0
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: Cassandra node down (cluster {{ $labels.cluster }})
    description: "Cassandra node down\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

2.13.2.6. Cassandra commitlog pending tasks (Criteo)

Unexpected number of Cassandra commitlog pending tasks

- alert: CassandraCommitlogPendingTasks(Criteo)
  expr: cassandra_stats{name="org:apache:cassandra:metrics:commitlog:pendingtasks:value"} > 15
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra commitlog pending tasks (Criteo) (instance {{ $labels.instance }})
    description: "Unexpected number of Cassandra commitlog pending tasks\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

2.13.2.7. Cassandra compaction executor blocked tasks (Criteo)

Some Cassandra compaction executor tasks are blocked

- alert: CassandraCompactionExecutorBlockedTasks(Criteo)
  expr: cassandra_stats{name="org:apache:cassandra:metrics:threadpools:internal:compactionexecutor:currentlyblockedtasks:count"} > 0
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra compaction executor blocked tasks (Criteo) (instance {{ $labels.instance }})
    description: "Some Cassandra compaction executor tasks are blocked\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

2.13.2.8. Cassandra flush writer blocked tasks (Criteo)

Some Cassandra flush writer tasks are blocked

- alert: CassandraFlushWriterBlockedTasks(Criteo)
  expr: cassandra_stats{name="org:apache:cassandra:metrics:threadpools:internal:memtableflushwriter:currentlyblockedtasks:count"} > 0
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra flush writer blocked tasks (Criteo) (instance {{ $labels.instance }})
    description: "Some Cassandra flush writer tasks are blocked\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

2.13.2.9. Cassandra repair pending tasks

Some Cassandra repair tasks are pending

- alert: CassandraRepairPendingTasks
  expr: cassandra_stats{name="org:apache:cassandra:metrics:threadpools:internal:antientropystage:pendingtasks:value"} > 2
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra repair pending tasks (instance {{ $labels.instance }})
    description: "Some Cassandra repair tasks are pending\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

2.13.2.10. Cassandra repair blocked tasks

Some Cassandra repair tasks are blocked

- alert: CassandraRepairBlockedTasks
  expr: cassandra_stats{name="org:apache:cassandra:metrics:threadpools:internal:antientropystage:currentlyblockedtasks:count"} > 0
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra repair blocked tasks (instance {{ $labels.instance }})
    description: "Some Cassandra repair tasks are blocked\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

2.13.2.11. Cassandra connection timeouts total (Criteo)

Some connections between nodes are timing out

- alert: CassandraConnectionTimeoutsTotal(Criteo)
  expr: increase(cassandra_stats{name="org:apache:cassandra:metrics:connection:totaltimeouts:count"}[1m]) > 5
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: Cassandra connection timeouts total (Criteo) (instance {{ $labels.instance }})
    description: "Some connections between nodes are timing out\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

2.13.2.12. Cassandra storage exceptions (Criteo)

Cassandra is reporting storage exceptions

- alert: CassandraStorageExceptions(Criteo)
  expr: changes(cassandra_stats{name="org:apache:cassandra:metrics:storage:exceptions:count"}[1m]) > 1
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Cassandra storage exceptions (Criteo) (instance {{ $labels.instance }})
    description: "Cassandra is reporting storage exceptions\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

2.13.2.13. Cassandra tombstone dump (Criteo)

Too many tombstones scanned in queries

- alert: CassandraTombstoneDump(Criteo)
  expr: cassandra_stats{name="org:apache:cassandra:metrics:table:tombstonescannedhistogram:99thpercentile"} > 1000
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Cassandra tombstone dump (Criteo) (instance {{ $labels.instance }})
    description: "Too many tombstones scanned in queries\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

2.13.2.14. Cassandra client request unavailable write (Criteo)

Write failures have occurred because too many nodes are unavailable

- alert: CassandraClientRequestUnavailableWrite(Criteo)
  expr: changes(cassandra_stats{name="org:apache:cassandra:metrics:clientrequest:write:unavailables:count"}[1m]) > 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Cassandra client request unavailable write (Criteo) (instance {{ $labels.instance }})
    description: "Write failures have occurred because too many nodes are unavailable\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

2.13.2.15. Cassandra client request unavailable read (Criteo)

Read failures have occurred because too many nodes are unavailable

- alert: CassandraClientRequestUnavailableRead(Criteo)
  expr: changes(cassandra_stats{name="org:apache:cassandra:metrics:clientrequest:read:unavailables:count"}[1m]) > 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Cassandra client request unavailable read (Criteo) (instance {{ $labels.instance }})
    description: "Read failures have occurred because too many nodes are unavailable\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

2.13.2.16. Cassandra client request write failure (Criteo)

Many write failures encountered. A write failure is a non-timeout exception encountered during a write request. Examine the reason map to find the root cause. The most common cause of this type of error is an oversized batch.

- alert: CassandraClientRequestWriteFailure(Criteo)
  expr: cassandra_stats{name="org:apache:cassandra:metrics:clientrequest:write:failures:oneminuterate"} > 0.05
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Cassandra client request write failure (Criteo) (instance {{ $labels.instance }})
    description: "Many write failures encountered. A write failure is a non-timeout exception encountered during a write request. Examine the reason map to find the root cause. The most common cause of this type of error is an oversized batch.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

2.13.2.17. Cassandra client request read failure (Criteo)

Many read failures encountered. A read failure is a non-timeout exception encountered during a read request. Examine the reason map to find the root cause; a common cause is a query that scans more tombstones than tombstone_failure_threshold allows.

- alert: CassandraClientRequestReadFailure(Criteo)
  expr: cassandra_stats{name="org:apache:cassandra:metrics:clientrequest:read:failures:oneminuterate"} > 0.05
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Cassandra client request read failure (Criteo) (instance {{ $labels.instance }})
    description: "Many read failures encountered. A read failure is a non-timeout exception encountered during a read request. Examine the reason map to find the root cause; a common cause is a query that scans more tombstones than tombstone_failure_threshold allows.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

2.13.2.18. Cassandra cache hit rate key cache

Key cache hit rate is below 85%

  # A low key cache hit rate increases disk I/O. Threshold is workload-dependent — adjust based on your data access patterns.
- alert: CassandraCacheHitRateKeyCache
  expr: cassandra_stats{name="org:apache:cassandra:metrics:cache:keycache:hitrate:value"} < .85
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra cache hit rate key cache (instance {{ $labels.instance }})
    description: "Key cache hit rate is below 85%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"