2.13.2.1. Cassandra hints count
Cassandra hints count has changed on {{ $labels.instance }}; some nodes may be down.
- alert: CassandraHintsCount
  expr: changes(cassandra_stats{name="org:apache:cassandra:metrics:storage:totalhints:count"}[1m]) > 3
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Cassandra hints count (instance {{ $labels.instance }})
    description: "Cassandra hints count has changed on {{ $labels.instance }}; some nodes may be down.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
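Each rule in this section is a list item that belongs under a rule group. A minimal sketch of the surrounding rule file follows; the file name `cassandra.rules.yml` and the group name `cassandra` are hypothetical and not taken from the source.

```yaml
# cassandra.rules.yml (hypothetical file name)
groups:
  - name: cassandra  # hypothetical group name
    rules:
      # Paste the rule fragments from this section here, indented under "rules:".
      - alert: CassandraHintsCount
        expr: changes(cassandra_stats{name="org:apache:cassandra:metrics:storage:totalhints:count"}[1m]) > 3
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: Cassandra hints count (instance {{ $labels.instance }})
          description: "Cassandra hints count has changed on {{ $labels.instance }}\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
```

Running `promtool check rules cassandra.rules.yml` should catch indentation or expression errors before the file is loaded.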
2.13.2.2. Cassandra compaction task pending
Many Cassandra compaction tasks are pending. You might need to increase I/O capacity by adding nodes to the cluster.
- alert: CassandraCompactionTaskPending
  expr: cassandra_stats{name="org:apache:cassandra:metrics:compaction:pendingtasks:value"} > 100
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra compaction task pending (instance {{ $labels.instance }})
    description: "Many Cassandra compaction tasks are pending. You might need to increase I/O capacity by adding nodes to the cluster.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
2.13.2.3. Cassandra viewwrite latency
High viewwrite latency on Cassandra node {{ $labels.instance }}
- alert: CassandraViewwriteLatency
  expr: cassandra_stats{name="org:apache:cassandra:metrics:clientrequest:viewwrite:viewwritelatency:99thpercentile"} > 100000
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra viewwrite latency (instance {{ $labels.instance }})
    description: "High viewwrite latency on Cassandra node {{ $labels.instance }}\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
2.13.2.4. Cassandra authentication failures
Increase of Cassandra authentication failures
- alert: CassandraAuthenticationFailures
  expr: delta(cassandra_stats{name="org:apache:cassandra:metrics:client:authfailure:count"}[1m]) > 5
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra authentication failures (instance {{ $labels.instance }})
    description: "Increase of Cassandra authentication failures\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
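The rule above measures the change with `delta()`, which PromQL intends for gauges and which does not compensate for counter resets. If the `authfailure:count` series only ever increases apart from process restarts (an assumption about the exporter, not something stated in the source), a variant using `increase()` might look like this sketch:

```yaml
- alert: CassandraAuthenticationFailures
  # Assumption: the series behaves like a monotonically increasing counter.
  # increase() accounts for counter resets (e.g. after a node restart); delta() does not.
  expr: increase(cassandra_stats{name="org:apache:cassandra:metrics:client:authfailure:count"}[1m]) > 5
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra authentication failures (instance {{ $labels.instance }})
    description: "Increase of Cassandra authentication failures\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
```

Either function needs at least two samples inside the `[1m]` window, so the range may need widening if the scrape interval is long.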
2.13.2.5. Cassandra node down
Cassandra node down
# 1m delay allows a restart without triggering an alert.
- alert: CassandraNodeDown
  expr: sum(cassandra_stats{name="org:apache:cassandra:net:failuredetector:downendpointcount"}) by (service,group,cluster,env) > 0
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: Cassandra node down (instance {{ $labels.instance }})
    description: "Cassandra node down\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
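Because the expression aggregates with `by (service,group,cluster,env)`, only those four labels survive, and `{{ $labels.instance }}` renders as an empty string in the summary. One possible variant, assuming the scraped series actually carry a `cluster` label, annotates the alert with the cluster instead:

```yaml
# 1m delay allows a restart without triggering an alert.
- alert: CassandraNodeDown
  expr: sum(cassandra_stats{name="org:apache:cassandra:net:failuredetector:downendpointcount"}) by (service,group,cluster,env) > 0
  for: 1m
  labels:
    severity: critical
  annotations:
    # Only service, group, cluster and env remain after the aggregation.
    summary: Cassandra node down (cluster {{ $labels.cluster }})
    description: "Cassandra node down\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
```

The `LABELS` line in the description still lists whichever of the four grouping labels are present, which helps locate the affected environment.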
2.13.2.6. Cassandra commitlog pending tasks (Criteo)
Unexpected number of Cassandra commitlog pending tasks
- alert: CassandraCommitlogPendingTasks(Criteo)
  expr: cassandra_stats{name="org:apache:cassandra:metrics:commitlog:pendingtasks:value"} > 15
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra commitlog pending tasks (Criteo) (instance {{ $labels.instance }})
    description: "Unexpected number of Cassandra commitlog pending tasks\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
2.13.2.7. Cassandra compaction executor blocked tasks (Criteo)
Some Cassandra compaction executor tasks are blocked
- alert: CassandraCompactionExecutorBlockedTasks(Criteo)
  expr: cassandra_stats{name="org:apache:cassandra:metrics:threadpools:internal:compactionexecutor:currentlyblockedtasks:count"} > 0
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra compaction executor blocked tasks (Criteo) (instance {{ $labels.instance }})
    description: "Some Cassandra compaction executor tasks are blocked\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
2.13.2.8. Cassandra flush writer blocked tasks (Criteo)
Some Cassandra flush writer tasks are blocked
- alert: CassandraFlushWriterBlockedTasks(Criteo)
  expr: cassandra_stats{name="org:apache:cassandra:metrics:threadpools:internal:memtableflushwriter:currentlyblockedtasks:count"} > 0
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra flush writer blocked tasks (Criteo) (instance {{ $labels.instance }})
    description: "Some Cassandra flush writer tasks are blocked\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
2.13.2.9. Cassandra repair pending tasks
Some Cassandra repair tasks are pending
- alert: CassandraRepairPendingTasks
  expr: cassandra_stats{name="org:apache:cassandra:metrics:threadpools:internal:antientropystage:pendingtasks:value"} > 2
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra repair pending tasks (instance {{ $labels.instance }})
    description: "Some Cassandra repair tasks are pending\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
2.13.2.10. Cassandra repair blocked tasks
Some Cassandra repair tasks are blocked
- alert: CassandraRepairBlockedTasks
  expr: cassandra_stats{name="org:apache:cassandra:metrics:threadpools:internal:antientropystage:currentlyblockedtasks:count"} > 0
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra repair blocked tasks (instance {{ $labels.instance }})
    description: "Some Cassandra repair tasks are blocked\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
2.13.2.11. Cassandra connection timeouts total (Criteo)
Some connections between nodes are timing out
- alert: CassandraConnectionTimeoutsTotal(Criteo)
  expr: delta(cassandra_stats{name="org:apache:cassandra:metrics:connection:totaltimeouts:count"}[1m]) > 5
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: Cassandra connection timeouts total (Criteo) (instance {{ $labels.instance }})
    description: "Some connections between nodes are timing out\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
2.13.2.12. Cassandra storage exceptions (Criteo)
Something is going wrong with Cassandra storage
- alert: CassandraStorageExceptions(Criteo)
  expr: changes(cassandra_stats{name="org:apache:cassandra:metrics:storage:exceptions:count"}[1m]) > 1
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Cassandra storage exceptions (Criteo) (instance {{ $labels.instance }})
    description: "Something is going wrong with Cassandra storage\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
2.13.2.13. Cassandra tombstone dump (Criteo)
Too many tombstones scanned in queries
- alert: CassandraTombstoneDump(Criteo)
  expr: cassandra_stats{name="org:apache:cassandra:metrics:table:tombstonescannedhistogram:99thpercentile"} > 1000
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Cassandra tombstone dump (Criteo) (instance {{ $labels.instance }})
    description: "Too many tombstones scanned in queries\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
2.13.2.14. Cassandra client request unavailable write (Criteo)
Write failures have occurred because too many nodes are unavailable
- alert: CassandraClientRequestUnavailableWrite(Criteo)
  expr: changes(cassandra_stats{name="org:apache:cassandra:metrics:clientrequest:write:unavailables:count"}[1m]) > 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Cassandra client request unavailable write (Criteo) (instance {{ $labels.instance }})
    description: "Write failures have occurred because too many nodes are unavailable\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
2.13.2.15. Cassandra client request unavailable read (Criteo)
Read failures have occurred because too many nodes are unavailable
- alert: CassandraClientRequestUnavailableRead(Criteo)
  expr: changes(cassandra_stats{name="org:apache:cassandra:metrics:clientrequest:read:unavailables:count"}[1m]) > 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Cassandra client request unavailable read (Criteo) (instance {{ $labels.instance }})
    description: "Read failures have occurred because too many nodes are unavailable\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
2.13.2.16. Cassandra client request write failure (Criteo)
A lot of write failures encountered. A write failure is a non-timeout exception encountered during a write request. Examine the reason map to find the root cause. The most common cause for this type of error is when batch sizes are too large.
- alert: CassandraClientRequestWriteFailure(Criteo)
  expr: cassandra_stats{name="org:apache:cassandra:metrics:clientrequest:write:failures:oneminuterate"} > 0.05
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Cassandra client request write failure (Criteo) (instance {{ $labels.instance }})
    description: "A lot of write failures encountered. A write failure is a non-timeout exception encountered during a write request. Examine the reason map to find the root cause. The most common cause for this type of error is when batch sizes are too large.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
2.13.2.17. Cassandra client request read failure (Criteo)
A lot of read failures encountered. A read failure is a non-timeout exception encountered during a read request. Examine the reason map to find the root cause. The most common cause for this type of error is when batch sizes are too large.
- alert: CassandraClientRequestReadFailure(Criteo)
  expr: cassandra_stats{name="org:apache:cassandra:metrics:clientrequest:read:failures:oneminuterate"} > 0.05
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Cassandra client request read failure (Criteo) (instance {{ $labels.instance }})
    description: "A lot of read failures encountered. A read failure is a non-timeout exception encountered during a read request. Examine the reason map to find the root cause. The most common cause for this type of error is when batch sizes are too large.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
2.13.2.18. Cassandra cache hit rate key cache
Key cache hit rate is below 85%
# A low key cache hit rate increases disk I/O. The threshold is workload-dependent; adjust it based on your data access patterns.
- alert: CassandraCacheHitRateKeyCache
  expr: cassandra_stats{name="org:apache:cassandra:metrics:cache:keycache:hitrate:value"} < 0.85
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Cassandra cache hit rate key cache (instance {{ $labels.instance }})
    description: "Key cache hit rate is below 85%\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"