What is the Prometheus alert rule for "Cassandra Node is unavailable"?

Cassandra Node is unavailable - {{ $labels.cassandra_cluster }} {{ $labels.exported_endpoint }}. PromQL expression: cassandra_endpoint_active < 1. Severity: critical. Duration: 1m.

What is the Prometheus alert rule for "Cassandra many compaction tasks are pending"?

Many Cassandra compaction tasks are pending - {{ $labels.cassandra_cluster }}. PromQL expression: cassandra_table_estimated_pending_compactions > 100. Severity: warning. Duration: 0m.

What is the Prometheus alert rule for "Cassandra commitlog pending tasks (Instaclustr)"?

Cassandra commitlog pending tasks - {{ $labels.cassandra_cluster }}. PromQL expression: cassandra_commit_log_pending_tasks > 15. Severity: warning. Duration: 2m.

What is the Prometheus alert rule for "Cassandra compaction executor blocked tasks (Instaclustr)"?

Some Cassandra compaction executor tasks are blocked - {{ $labels.cassandra_cluster }}. PromQL expression: cassandra_thread_pool_blocked_tasks{pool="CompactionExecutor"} > 15. Severity: warning. Duration: 2m.

What is the Prometheus alert rule for "Cassandra flush writer blocked tasks (Instaclustr)"?

Some Cassandra flush writer tasks are blocked - {{ $labels.cassandra_cluster }}. PromQL expression: cassandra_thread_pool_blocked_tasks{pool="MemtableFlushWriter"} > 15. Severity: warning. Duration: 2m.

What is the Prometheus alert rule for "Cassandra connection timeouts total (Instaclustr)"?

Some connections between nodes are ending in timeout - {{ $labels.cassandra_cluster }}. PromQL expression: sum by (cassandra_cluster,instance) (rate(cassandra_client_request_timeouts_total[5m])) > 5. Severity: critical. Duration: 2m.

What is the Prometheus alert rule for "Cassandra storage exceptions (Instaclustr)"?

Something is going wrong with cassandra storage - {{ $labels.cassandra_cluster }}. PromQL expression: changes(cassandra_storage_exceptions_total[1m]) > 1. Severity: critical. Duration: 0m.

What is the Prometheus alert rule for "Cassandra client request unavailable write (Instaclustr)"?

Some Cassandra client requests are unavailable to write - {{ $labels.cassandra_cluster }}. PromQL expression: changes(cassandra_client_request_unavailable_exceptions_total{operation="write"}[1m]) > 0. Severity: critical. Duration: 2m.

What is the Prometheus alert rule for "Cassandra client request unavailable read (Instaclustr)"?

Some Cassandra client requests are unavailable to read - {{ $labels.cassandra_cluster }}. PromQL expression: changes(cassandra_client_request_unavailable_exceptions_total{operation="read"}[1m]) > 0. Severity: critical. Duration: 2m.

What is the Prometheus alert rule for "Cassandra client request write failure (Instaclustr)"?

Write failures have occurred, ensure there are not too many unavailable nodes - {{ $labels.cassandra_cluster }}. PromQL expression: increase(cassandra_client_request_failures_total{operation="write"}[1m]) > 5. Severity: critical. Duration: 2m.

What is the Prometheus alert rule for "Cassandra client request read failure (Instaclustr)"?

Read failures have occurred, ensure there are not too many unavailable nodes - {{ $labels.cassandra_cluster }}. PromQL expression: increase(cassandra_client_request_failures_total{operation="read"}[1m]) > 5. Severity: critical. Duration: 2m.

What is the Prometheus alert rule for "Cassandra hints count"?

Cassandra hints count has changed on {{ $labels.instance }}; some nodes may go down. PromQL expression: changes(cassandra_stats{name="org:apache:cassandra:metrics:storage:totalhints:count"}[1m]) > 3. Severity: critical. Duration: 0m.

What is the Prometheus alert rule for "Cassandra compaction task pending"?

Many Cassandra compaction tasks are pending. You might need to increase I/O capacity by adding nodes to the cluster. PromQL expression: cassandra_stats{name="org:apache:cassandra:metrics:compaction:pendingtasks:value"} > 100. Severity: warning. Duration: 2m.

What is the Prometheus alert rule for "Cassandra commitlog pending tasks (Criteo)"?

Unexpected number of Cassandra commitlog pending tasks. PromQL expression: cassandra_stats{name="org:apache:cassandra:metrics:commitlog:pendingtasks:value"} > 15. Severity: warning. Duration: 2m.

What is the Prometheus alert rule for "Cassandra storage exceptions (Criteo)"?

Something is going wrong with cassandra storage. PromQL expression: changes(cassandra_stats{name="org:apache:cassandra:metrics:storage:exceptions:count"}[1m]) > 1. Severity: critical. Duration: 0m.

What is the Prometheus alert rule for "Cassandra client request unavailable write (Criteo)"?

Write failures have occurred because too many nodes are unavailable. PromQL expression: changes(cassandra_stats{name="org:apache:cassandra:metrics:clientrequest:write:unavailables:count"}[1m]) > 0. Severity: critical. Duration: 0m.

What is the Prometheus alert rule for "Cassandra client request unavailable read (Criteo)"?

Read failures have occurred because too many nodes are unavailable. PromQL expression: changes(cassandra_stats{name="org:apache:cassandra:metrics:clientrequest:read:unavailables:count"}[1m]) > 0. Severity: critical. Duration: 0m.

What is the Prometheus alert rule for "Cassandra tombstone dump (Instaclustr)"?

Cassandra tombstone dump - {{ $labels.cassandra_cluster }}. PromQL expression: avg(cassandra_table_tombstones_scanned{quantile="0.99"}) by (instance,cassandra_cluster,keyspace) > 100. Severity: critical. Duration: 2m.

Cassandra Prometheus Alert Rules

30 Prometheus alerting rules for Cassandra, based on metrics exported via instaclustr/cassandra-exporter (12 rules) and criteo/cassandra_exporter (18 rules). These rules cover critical and warning conditions. Copy and paste the YAML into a rule file loaded by your Prometheus configuration.
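
As a minimal sketch of how the rule files might be loaded, the fragment below uses the `rule_files` key of `prometheus.yml`. The directory layout is an assumption made for illustration; only the two file names come from the download links on this page. Running `promtool check rules <file>` on each file before reloading Prometheus is a reasonable way to catch syntax errors.

```yaml
# prometheus.yml (fragment)
# The rules/cassandra/ directory is a hypothetical location; adjust the
# paths to wherever the rule files are stored on your Prometheus host.
rule_files:
  - rules/cassandra/instaclustr-cassandra-exporter.yml
  - rules/cassandra/criteo-cassandra-exporter.yml
```

Only the file matching the exporter you actually run needs to be listed.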

⚠️ Alert thresholds depend on the nature of your applications. Some queries may have arbitrary tolerance thresholds. Building an efficient monitoring platform takes time. 😉
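
To illustrate what adjusting a threshold can look like, here is a sketch of the pending-compactions rule from the Instaclustr group with a higher limit and a longer `for` duration. The group name and the values 200 and 10m are assumptions chosen for the example, not recommendations; suitable values depend on your workload.

```yaml
groups:
- name: CassandraTunedExample   # hypothetical group name
  rules:
    # Illustrative values only: 200 pending compactions sustained for 10m.
    - alert: CassandraManyCompactionTasksArePending
      expr: cassandra_table_estimated_pending_compactions > 200
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: Cassandra many compaction tasks are pending (instance {{ $labels.instance }})
        description: "Many Cassandra compaction tasks are pending - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
```

A longer `for` duration trades detection speed for fewer notifications during short compaction bursts.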

groups:
- name: InstaclustrCassandraExporter
  rules:
    # 1m delay allows a restart without triggering an alert.
    - alert: CassandraNodeIsUnavailable
      expr: cassandra_endpoint_active < 1
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: Cassandra Node is unavailable (instance {{ $labels.instance }})
        description: "Cassandra Node is unavailable - {{ $labels.cassandra_cluster }} {{ $labels.exported_endpoint }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CassandraManyCompactionTasksArePending
      expr: cassandra_table_estimated_pending_compactions > 100
      for: 0m
      labels:
        severity: warning
      annotations:
        summary: Cassandra many compaction tasks are pending (instance {{ $labels.instance }})
        description: "Many Cassandra compaction tasks are pending - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CassandraCommitlogPendingTasks(Instaclustr)
      expr: cassandra_commit_log_pending_tasks > 15
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: Cassandra commitlog pending tasks (Instaclustr) (instance {{ $labels.instance }})
        description: "Cassandra commitlog pending tasks - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CassandraCompactionExecutorBlockedTasks(Instaclustr)
      expr: cassandra_thread_pool_blocked_tasks{pool="CompactionExecutor"} > 15
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: Cassandra compaction executor blocked tasks (Instaclustr) (instance {{ $labels.instance }})
        description: "Some Cassandra compaction executor tasks are blocked - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CassandraFlushWriterBlockedTasks(Instaclustr)
      expr: cassandra_thread_pool_blocked_tasks{pool="MemtableFlushWriter"} > 15
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: Cassandra flush writer blocked tasks (Instaclustr) (instance {{ $labels.instance }})
        description: "Some Cassandra flush writer tasks are blocked - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CassandraConnectionTimeoutsTotal(Instaclustr)
      expr: sum by (cassandra_cluster,instance) (rate(cassandra_client_request_timeouts_total[5m])) > 5
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: Cassandra connection timeouts total (Instaclustr) (instance {{ $labels.instance }})
        description: "Some connections between nodes are ending in timeout - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CassandraStorageExceptions(Instaclustr)
      expr: changes(cassandra_storage_exceptions_total[1m]) > 1
      for: 0m
      labels:
        severity: critical
      annotations:
        summary: Cassandra storage exceptions (Instaclustr) (instance {{ $labels.instance }})
        description: "Something is going wrong with cassandra storage - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CassandraTombstoneDump(Instaclustr)
      expr: avg(cassandra_table_tombstones_scanned{quantile="0.99"}) by (instance,cassandra_cluster,keyspace) > 100
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: Cassandra tombstone dump (Instaclustr) (instance {{ $labels.instance }})
        description: "Cassandra tombstone dump - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CassandraClientRequestUnavailableWrite(Instaclustr)
      expr: changes(cassandra_client_request_unavailable_exceptions_total{operation="write"}[1m]) > 0
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: Cassandra client request unavailable write (Instaclustr) (instance {{ $labels.instance }})
        description: "Some Cassandra client requests are unavailable to write - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CassandraClientRequestUnavailableRead(Instaclustr)
      expr: changes(cassandra_client_request_unavailable_exceptions_total{operation="read"}[1m]) > 0
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: Cassandra client request unavailable read (Instaclustr) (instance {{ $labels.instance }})
        description: "Some Cassandra client requests are unavailable to read - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CassandraClientRequestWriteFailure(Instaclustr)
      expr: increase(cassandra_client_request_failures_total{operation="write"}[1m]) > 5
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: Cassandra client request write failure (Instaclustr) (instance {{ $labels.instance }})
        description: "Write failures have occurred, ensure there are not too many unavailable nodes - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CassandraClientRequestReadFailure(Instaclustr)
      expr: increase(cassandra_client_request_failures_total{operation="read"}[1m]) > 5
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: Cassandra client request read failure (Instaclustr) (instance {{ $labels.instance }})
        description: "Read failures have occurred, ensure there are not too many unavailable nodes - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

2.13.1. instaclustr/cassandra-exporter (12 rules)

wget https://raw.githubusercontent.com/samber/awesome-prometheus-alerts/refs/heads/master/dist/rules/cassandra/instaclustr-cassandra-exporter.yml

critical

2.13.1.1. Cassandra Node is unavailable

Cassandra Node is unavailable - {{ $labels.cassandra_cluster }} {{ $labels.exported_endpoint }}

# 1m delay allows a restart without triggering an alert.
- alert: CassandraNodeIsUnavailable
  expr: cassandra_endpoint_active < 1
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: Cassandra Node is unavailable (instance {{ $labels.instance }})
    description: "Cassandra Node is unavailable - {{ $labels.cassandra_cluster }} {{ $labels.exported_endpoint }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
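
The effect of the 1m delay can be checked with a `promtool test rules` unit test. The sketch below assumes the rule file fetched with the wget command above sits next to the test file; the test file name and all label values are made up for illustration. It simulates a node that reports inactive for a single sample and expects that no alert is firing.

```yaml
# cassandra-node-test.yml (hypothetical file name)
# Run with: promtool test rules cassandra-node-test.yml
rule_files:
  - instaclustr-cassandra-exporter.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Label values are invented for this example.
      - series: 'cassandra_endpoint_active{instance="node1:9500", cassandra_cluster="demo", exported_endpoint="10.0.0.1"}'
        # Active, one inactive sample (a short restart), then active again.
        values: '1 0 1 1 1'
    alert_rule_test:
      # At 1m the condition has just become true: the alert is pending, not firing.
      - eval_time: 1m
        alertname: CassandraNodeIsUnavailable
        exp_alerts: []
      # At 2m the node is active again, so the alert never fired.
      - eval_time: 2m
        alertname: CassandraNodeIsUnavailable
        exp_alerts: []
```

With `for: 0m` instead, the same input would be expected to fire at the 1m evaluation.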

warning

2.13.1.2. Cassandra many compaction tasks are pending

Many Cassandra compaction tasks are pending - {{ $labels.cassandra_cluster }}

- alert: CassandraManyCompactionTasksArePending
  expr: cassandra_table_estimated_pending_compactions > 100
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: Cassandra many compaction tasks are pending (instance {{ $labels.instance }})
    description: "Many Cassandra compaction tasks are pending - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

warning

2.13.1.3. Cassandra commitlog pending tasks (Instaclustr)

Cassandra commitlog pending tasks - {{ $labels.cassandra_cluster }}

- alert: CassandraCommitlogPendingTasks(Instaclustr)
  expr: cassandra_commit_log_pending_tasks > 15
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra commitlog pending tasks (Instaclustr) (instance {{ $labels.instance }})
    description: "Cassandra commitlog pending tasks - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

warning

2.13.1.4. Cassandra compaction executor blocked tasks (Instaclustr)

Some Cassandra compaction executor tasks are blocked - {{ $labels.cassandra_cluster }}

- alert: CassandraCompactionExecutorBlockedTasks(Instaclustr)
  expr: cassandra_thread_pool_blocked_tasks{pool="CompactionExecutor"} > 15
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra compaction executor blocked tasks (Instaclustr) (instance {{ $labels.instance }})
    description: "Some Cassandra compaction executor tasks are blocked - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

warning

2.13.1.5. Cassandra flush writer blocked tasks (Instaclustr)

Some Cassandra flush writer tasks are blocked - {{ $labels.cassandra_cluster }}

- alert: CassandraFlushWriterBlockedTasks(Instaclustr)
  expr: cassandra_thread_pool_blocked_tasks{pool="MemtableFlushWriter"} > 15
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra flush writer blocked tasks (Instaclustr) (instance {{ $labels.instance }})
    description: "Some Cassandra flush writer tasks are blocked - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

critical

2.13.1.6. Cassandra connection timeouts total (Instaclustr)

Some connections between nodes are ending in timeout - {{ $labels.cassandra_cluster }}

- alert: CassandraConnectionTimeoutsTotal(Instaclustr)
  expr: sum by (cassandra_cluster,instance) (rate(cassandra_client_request_timeouts_total[5m])) > 5
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: Cassandra connection timeouts total (Instaclustr) (instance {{ $labels.instance }})
    description: "Some connections between nodes are ending in timeout - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

critical

2.13.1.7. Cassandra storage exceptions (Instaclustr)

Something is going wrong with cassandra storage - {{ $labels.cassandra_cluster }}

- alert: CassandraStorageExceptions(Instaclustr)
  expr: changes(cassandra_storage_exceptions_total[1m]) > 1
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Cassandra storage exceptions (Instaclustr) (instance {{ $labels.instance }})
    description: "Something is going wrong with cassandra storage - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

critical

2.13.1.8. Cassandra tombstone dump (Instaclustr)

Cassandra tombstone dump - {{ $labels.cassandra_cluster }}

- alert: CassandraTombstoneDump(Instaclustr)
  expr: avg(cassandra_table_tombstones_scanned{quantile="0.99"}) by (instance,cassandra_cluster,keyspace) > 100
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: Cassandra tombstone dump (Instaclustr) (instance {{ $labels.instance }})
    description: "Cassandra tombstone dump - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

critical

2.13.1.9. Cassandra client request unavailable write (Instaclustr)

Some Cassandra client requests are unavailable to write - {{ $labels.cassandra_cluster }}

- alert: CassandraClientRequestUnavailableWrite(Instaclustr)
  expr: changes(cassandra_client_request_unavailable_exceptions_total{operation="write"}[1m]) > 0
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: Cassandra client request unavailable write (Instaclustr) (instance {{ $labels.instance }})
    description: "Some Cassandra client requests are unavailable to write - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

critical

2.13.1.10. Cassandra client request unavailable read (Instaclustr)

Some Cassandra client requests are unavailable to read - {{ $labels.cassandra_cluster }}

- alert: CassandraClientRequestUnavailableRead(Instaclustr)
  expr: changes(cassandra_client_request_unavailable_exceptions_total{operation="read"}[1m]) > 0
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: Cassandra client request unavailable read (Instaclustr) (instance {{ $labels.instance }})
    description: "Some Cassandra client requests are unavailable to read - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

critical

2.13.1.11. Cassandra client request write failure (Instaclustr)

Write failures have occurred, ensure there are not too many unavailable nodes - {{ $labels.cassandra_cluster }}

- alert: CassandraClientRequestWriteFailure(Instaclustr)
  expr: increase(cassandra_client_request_failures_total{operation="write"}[1m]) > 5
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: Cassandra client request write failure (Instaclustr) (instance {{ $labels.instance }})
    description: "Write failures have occurred, ensure there are not too many unavailable nodes - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

critical

2.13.1.12. Cassandra client request read failure (Instaclustr)

Read failures have occurred, ensure there are not too many unavailable nodes - {{ $labels.cassandra_cluster }}

- alert: CassandraClientRequestReadFailure(Instaclustr)
  expr: increase(cassandra_client_request_failures_total{operation="read"}[1m]) > 5
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: Cassandra client request read failure (Instaclustr) (instance {{ $labels.instance }})
    description: "Read failures have occurred, ensure there are not too many unavailable nodes - {{ $labels.cassandra_cluster }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

groups:
- name: CriteoCassandraExporter
  rules:
    - alert: CassandraHintsCount
      expr: changes(cassandra_stats{name="org:apache:cassandra:metrics:storage:totalhints:count"}[1m]) > 3
      for: 0m
      labels:
        severity: critical
      annotations:
        summary: Cassandra hints count (instance {{ $labels.instance }})
        description: "Cassandra hints count has changed on {{ $labels.instance }}; some nodes may go down\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CassandraCompactionTaskPending
      expr: cassandra_stats{name="org:apache:cassandra:metrics:compaction:pendingtasks:value"} > 100
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: Cassandra compaction task pending (instance {{ $labels.instance }})
        description: "Many Cassandra compaction tasks are pending. You might need to increase I/O capacity by adding nodes to the cluster.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CassandraViewwriteLatency
      expr: cassandra_stats{name="org:apache:cassandra:metrics:clientrequest:viewwrite:viewwritelatency:99thpercentile"} > 100000
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: Cassandra viewwrite latency (instance {{ $labels.instance }})
        description: "High viewwrite latency on {{ $labels.instance }} cassandra node\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CassandraAuthenticationFailures
      expr: delta(cassandra_stats{name="org:apache:cassandra:metrics:client:authfailure:count"}[1m]) > 5
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: Cassandra authentication failures (instance {{ $labels.instance }})
        description: "Increase of Cassandra authentication failures\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    # 1m delay allows a restart without triggering an alert.
    - alert: CassandraNodeDown
      expr: sum(cassandra_stats{name="org:apache:cassandra:net:failuredetector:downendpointcount"}) by (service,group,cluster,env) > 0
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: Cassandra node down (cluster {{ $labels.cluster }})
        description: "Cassandra node down\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CassandraCommitlogPendingTasks(Criteo)
      expr: cassandra_stats{name="org:apache:cassandra:metrics:commitlog:pendingtasks:value"} > 15
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: Cassandra commitlog pending tasks (Criteo) (instance {{ $labels.instance }})
        description: "Unexpected number of Cassandra commitlog pending tasks\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CassandraCompactionExecutorBlockedTasks(Criteo)
      expr: cassandra_stats{name="org:apache:cassandra:metrics:threadpools:internal:compactionexecutor:currentlyblockedtasks:count"} > 0
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: Cassandra compaction executor blocked tasks (Criteo) (instance {{ $labels.instance }})
        description: "Some Cassandra compaction executor tasks are blocked\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CassandraFlushWriterBlockedTasks(Criteo)
      expr: cassandra_stats{name="org:apache:cassandra:metrics:threadpools:internal:memtableflushwriter:currentlyblockedtasks:count"} > 0
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: Cassandra flush writer blocked tasks (Criteo) (instance {{ $labels.instance }})
        description: "Some Cassandra flush writer tasks are blocked\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CassandraRepairPendingTasks
      expr: cassandra_stats{name="org:apache:cassandra:metrics:threadpools:internal:antientropystage:pendingtasks:value"} > 2
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: Cassandra repair pending tasks (instance {{ $labels.instance }})
        description: "Some Cassandra repair tasks are pending\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CassandraRepairBlockedTasks
      expr: cassandra_stats{name="org:apache:cassandra:metrics:threadpools:internal:antientropystage:currentlyblockedtasks:count"} > 0
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: Cassandra repair blocked tasks (instance {{ $labels.instance }})
        description: "Some Cassandra repair tasks are blocked\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CassandraConnectionTimeoutsTotal(Criteo)
      expr: delta(cassandra_stats{name="org:apache:cassandra:metrics:connection:totaltimeouts:count"}[1m]) > 5
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: Cassandra connection timeouts total (Criteo) (instance {{ $labels.instance }})
        description: "Some connections between nodes are ending in timeout\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CassandraStorageExceptions(Criteo)
      expr: changes(cassandra_stats{name="org:apache:cassandra:metrics:storage:exceptions:count"}[1m]) > 1
      for: 0m
      labels:
        severity: critical
      annotations:
        summary: Cassandra storage exceptions (Criteo) (instance {{ $labels.instance }})
        description: "Something is going wrong with cassandra storage\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CassandraTombstoneDump(Criteo)
      expr: cassandra_stats{name="org:apache:cassandra:metrics:table:tombstonescannedhistogram:99thpercentile"} > 1000
      for: 0m
      labels:
        severity: critical
      annotations:
        summary: Cassandra tombstone dump (Criteo) (instance {{ $labels.instance }})
        description: "Too many tombstones scanned in queries\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CassandraClientRequestUnavailableWrite(Criteo)
      expr: changes(cassandra_stats{name="org:apache:cassandra:metrics:clientrequest:write:unavailables:count"}[1m]) > 0
      for: 0m
      labels:
        severity: critical
      annotations:
        summary: Cassandra client request unavailable write (Criteo) (instance {{ $labels.instance }})
        description: "Write failures have occurred because too many nodes are unavailable\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CassandraClientRequestUnavailableRead(Criteo)
      expr: changes(cassandra_stats{name="org:apache:cassandra:metrics:clientrequest:read:unavailables:count"}[1m]) > 0
      for: 0m
      labels:
        severity: critical
      annotations:
        summary: Cassandra client request unavailable read (Criteo) (instance {{ $labels.instance }})
        description: "Read failures have occurred because too many nodes are unavailable\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CassandraClientRequestWriteFailure(Criteo)
      expr: cassandra_stats{name="org:apache:cassandra:metrics:clientrequest:write:failures:oneminuterate"} > 0.05
      for: 0m
      labels:
        severity: critical
      annotations:
        summary: Cassandra client request write failure (Criteo) (instance {{ $labels.instance }})
        description: "A lot of write failures encountered. A write failure is a non-timeout exception encountered during a write request. Examine the reason map to find the root cause. The most common cause for this type of error is when batch sizes are too large.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CassandraClientRequestReadFailure(Criteo)
      expr: cassandra_stats{name="org:apache:cassandra:metrics:clientrequest:read:failures:oneminuterate"} > 0.05
      for: 0m
      labels:
        severity: critical
      annotations:
        summary: Cassandra client request read failure (Criteo) (instance {{ $labels.instance }})
        description: "A lot of read failures encountered. A read failure is a non-timeout exception encountered during a read request. Examine the reason map to find the root cause. The most common cause for this type of error is when batch sizes are too large.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    # A low key cache hit rate increases disk I/O. Threshold is workload-dependent; adjust based on your data access patterns.
    - alert: CassandraCacheHitRateKeyCache
      expr: cassandra_stats{name="org:apache:cassandra:metrics:cache:keycache:hitrate:value"} < .85
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: Cassandra cache hit rate key cache (instance {{ $labels.instance }})
        description: "Key cache hit rate is below 85%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

2.13.2. criteo/cassandra_exporter (18 rules)

wget https://raw.githubusercontent.com/samber/awesome-prometheus-alerts/refs/heads/master/dist/rules/cassandra/criteo-cassandra-exporter.yml

critical

2.13.2.1. Cassandra hints count

Cassandra hints count has changed on {{ $labels.instance }}; some nodes may go down

- alert: CassandraHintsCount
  expr: changes(cassandra_stats{name="org:apache:cassandra:metrics:storage:totalhints:count"}[1m]) > 3
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Cassandra hints count (instance {{ $labels.instance }})
    description: "Cassandra hints count has changed on {{ $labels.instance }}; some nodes may go down\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

warning

2.13.2.2. Cassandra compaction task pending

Many Cassandra compaction tasks are pending. You might need to increase I/O capacity by adding nodes to the cluster.

- alert: CassandraCompactionTaskPending
  expr: cassandra_stats{name="org:apache:cassandra:metrics:compaction:pendingtasks:value"} > 100
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra compaction task pending (instance {{ $labels.instance }})
    description: "Many Cassandra compaction tasks are pending. You might need to increase I/O capacity by adding nodes to the cluster.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

warning

2.13.2.3. Cassandra viewwrite latency

High viewwrite latency on {{ $labels.instance }} cassandra node

- alert: CassandraViewwriteLatency
  expr: cassandra_stats{name="org:apache:cassandra:metrics:clientrequest:viewwrite:viewwritelatency:99thpercentile"} > 100000
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra viewwrite latency (instance {{ $labels.instance }})
    description: "High viewwrite latency on {{ $labels.instance }} cassandra node\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

warning

2.13.2.4. Cassandra authentication failures

Increase of Cassandra authentication failures

- alert: CassandraAuthenticationFailures
  expr: delta(cassandra_stats{name="org:apache:cassandra:metrics:client:authfailure:count"}[1m]) > 5
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra authentication failures (instance {{ $labels.instance }})
    description: "Increase of Cassandra authentication failures\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
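
The Prometheus documentation describes `delta()` as intended for gauges and `increase()` as intended for counters. If the `authfailure:count` series only ever goes up between Cassandra restarts (an assumption about how the exporter reports it, not something stated on this page), a variant based on `increase()` may handle restarts more gracefully, since it treats a drop in value as a counter reset. A sketch of that variant:

```yaml
# Variant of the rule above using increase() instead of delta().
# Assumes the count is monotonically increasing between restarts.
- alert: CassandraAuthenticationFailures
  expr: increase(cassandra_stats{name="org:apache:cassandra:metrics:client:authfailure:count"}[1m]) > 5
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra authentication failures (instance {{ $labels.instance }})
    description: "Increase of Cassandra authentication failures\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
```

Both functions need at least two samples inside the 1m window, so the range may have to be widened when the scrape interval is 30s or longer.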

critical

2.13.2.5. Cassandra node down

Cassandra node down

# 1m delay allows a restart without triggering an alert.
- alert: CassandraNodeDown
  expr: sum(cassandra_stats{name="org:apache:cassandra:net:failuredetector:downendpointcount"}) by (service,group,cluster,env) > 0
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: Cassandra node down (cluster {{ $labels.cluster }})
    description: "Cassandra node down\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

warning

2.13.2.6. Cassandra commitlog pending tasks (Criteo)

Unexpected number of Cassandra commitlog pending tasks

- alert: CassandraCommitlogPendingTasks(Criteo)
  expr: cassandra_stats{name="org:apache:cassandra:metrics:commitlog:pendingtasks:value"} > 15
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra commitlog pending tasks (Criteo) (instance {{ $labels.instance }})
    description: "Unexpected number of Cassandra commitlog pending tasks\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

warning

2.13.2.7. Cassandra compaction executor blocked tasks (Criteo)

Some Cassandra compaction executor tasks are blocked

- alert: CassandraCompactionExecutorBlockedTasks(Criteo)
  expr: cassandra_stats{name="org:apache:cassandra:metrics:threadpools:internal:compactionexecutor:currentlyblockedtasks:count"} > 0
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra compaction executor blocked tasks (Criteo) (instance {{ $labels.instance }})
    description: "Some Cassandra compaction executor tasks are blocked\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

warning

2.13.2.8. Cassandra flush writer blocked tasks (Criteo)

Some Cassandra flush writer tasks are blocked

- alert: CassandraFlushWriterBlockedTasks(Criteo)
  expr: cassandra_stats{name="org:apache:cassandra:metrics:threadpools:internal:memtableflushwriter:currentlyblockedtasks:count"} > 0
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra flush writer blocked tasks (Criteo) (instance {{ $labels.instance }})
    description: "Some Cassandra flush writer tasks are blocked\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

warning

2.13.2.9. Cassandra repair pending tasks

Some Cassandra repair tasks are pending

- alert: CassandraRepairPendingTasks
  expr: cassandra_stats{name="org:apache:cassandra:metrics:threadpools:internal:antientropystage:pendingtasks:value"} > 2
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra repair pending tasks (instance {{ $labels.instance }})
    description: "Some Cassandra repair tasks are pending\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

warning

2.13.2.10. Cassandra repair blocked tasks

Some Cassandra repair tasks are blocked

- alert: CassandraRepairBlockedTasks
  expr: cassandra_stats{name="org:apache:cassandra:metrics:threadpools:internal:antientropystage:currentlyblockedtasks:count"} > 0
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra repair blocked tasks (instance {{ $labels.instance }})
    description: "Some Cassandra repair tasks are blocked\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

critical

2.13.2.11. Cassandra connection timeouts total (Criteo)

Some connections between nodes are ending in timeout

- alert: CassandraConnectionTimeoutsTotal(Criteo)
  expr: delta(cassandra_stats{name="org:apache:cassandra:metrics:connection:totaltimeouts:count"}[1m]) > 5
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: Cassandra connection timeouts total (Criteo) (instance {{ $labels.instance }})
    description: "Some connections between nodes are ending in timeout\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

critical

2.13.2.12. Cassandra storage exceptions (Criteo)

Something is going wrong with Cassandra storage

- alert: CassandraStorageExceptions(Criteo)
  expr: changes(cassandra_stats{name="org:apache:cassandra:metrics:storage:exceptions:count"}[1m]) > 1
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Cassandra storage exceptions (Criteo) (instance {{ $labels.instance }})
    description: "Something is going wrong with Cassandra storage\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

critical

2.13.2.13. Cassandra tombstone dump (Criteo)

Too many tombstones scanned in queries

- alert: CassandraTombstoneDump(Criteo)
  expr: cassandra_stats{name="org:apache:cassandra:metrics:table:tombstonescannedhistogram:99thpercentile"} > 1000
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Cassandra tombstone dump (Criteo) (instance {{ $labels.instance }})
    description: "Too many tombstones scanned in queries\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

critical

2.13.2.14. Cassandra client request unavailable write (Criteo)

Write failures have occurred because too many nodes are unavailable

- alert: CassandraClientRequestUnavailableWrite(Criteo)
  expr: changes(cassandra_stats{name="org:apache:cassandra:metrics:clientrequest:write:unavailables:count"}[1m]) > 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Cassandra client request unavailable write (Criteo) (instance {{ $labels.instance }})
    description: "Write failures have occurred because too many nodes are unavailable\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

critical

2.13.2.15. Cassandra client request unavailable read (Criteo)

Read failures have occurred because too many nodes are unavailable

- alert: CassandraClientRequestUnavailableRead(Criteo)
  expr: changes(cassandra_stats{name="org:apache:cassandra:metrics:clientrequest:read:unavailables:count"}[1m]) > 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Cassandra client request unavailable read (Criteo) (instance {{ $labels.instance }})
    description: "Read failures have occurred because too many nodes are unavailable\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

critical

2.13.2.16. Cassandra client request write failure (Criteo)

Many write failures have been encountered. A write failure is a non-timeout exception encountered during a write request. Examine the reason map to find the root cause. The most common cause of this type of error is batch sizes that are too large.

- alert: CassandraClientRequestWriteFailure(Criteo)
  expr: cassandra_stats{name="org:apache:cassandra:metrics:clientrequest:write:failures:oneminuterate"} > 0.05
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Cassandra client request write failure (Criteo) (instance {{ $labels.instance }})
    description: "Many write failures have been encountered. A write failure is a non-timeout exception encountered during a write request. Examine the reason map to find the root cause. The most common cause of this type of error is batch sizes that are too large.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

critical

2.13.2.17. Cassandra client request read failure (Criteo)

Many read failures have been encountered. A read failure is a non-timeout exception encountered during a read request. Examine the reason map to find the root cause. The most common cause of this type of error is batch sizes that are too large.

- alert: CassandraClientRequestReadFailure(Criteo)
  expr: cassandra_stats{name="org:apache:cassandra:metrics:clientrequest:read:failures:oneminuterate"} > 0.05
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Cassandra client request read failure (Criteo) (instance {{ $labels.instance }})
    description: "Many read failures have been encountered. A read failure is a non-timeout exception encountered during a read request. Examine the reason map to find the root cause. The most common cause of this type of error is batch sizes that are too large.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

warning

2.13.2.18. Cassandra cache hit rate key cache

Key cache hit rate is below 85%

  # A low key cache hit rate increases disk I/O. The threshold is workload-dependent; adjust it based on your data access patterns.
- alert: CassandraCacheHitRateKeyCache
  expr: cassandra_stats{name="org:apache:cassandra:metrics:cache:keycache:hitrate:value"} < .85
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra cache hit rate key cache (instance {{ $labels.instance }})
    description: "Key cache hit rate is below 85%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
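
Because the right threshold depends on the workload, one possible way to adapt the rule is to average the hit rate over a window before comparing it, so that short dips (for example right after a restart, when the cache is cold) do not trigger the alert. The variant below is only a sketch: the alert name, the 10m window, the 0.70 threshold and the 5m duration are illustrative assumptions, not recommended values.

```yaml
- alert: CassandraCacheHitRateKeyCacheTuned   # hypothetical name for the tuned variant
  # avg_over_time smooths the gauge over 10 minutes before applying the threshold.
  expr: avg_over_time(cassandra_stats{name="org:apache:cassandra:metrics:cache:keycache:hitrate:value"}[10m]) < 0.70
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Cassandra cache hit rate key cache (instance {{ $labels.instance }})
    description: "Average key cache hit rate over 10m is below 70%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
```

Graphing the same `avg_over_time(...)` expression for a few days of normal traffic is a reasonable way to pick a threshold that sits below the usual range for a given cluster.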
