
Alert thresholds depend on the nature of your applications. Some queries may have arbitrary tolerance thresholds. Building an efficient monitoring platform takes time.

Grafana Mimir Prometheus Alert Rules

49 Prometheus alerting rules for Grafana Mimir. Exported via Embedded exporter. These rules cover critical and warning conditions. Copy and paste the YAML into your Prometheus configuration.

12.6. Embedded exporter (49 rules)

Mimir uses the `cortex_` metric prefix for backward compatibility with Cortex. This is intentional and expected.
wget https://raw.githubusercontent.com/samber/awesome-prometheus-alerts/refs/heads/master/dist/rules/grafana-mimir/embedded-exporter.yml
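A minimal sketch of loading the downloaded file, assuming it is saved under `/etc/prometheus/rules/` (a hypothetical path; adjust it to your own layout). `rule_files` is the standard Prometheus configuration key for rule files:

```yaml
# prometheus.yml (fragment)
rule_files:
  # Hypothetical location of the file fetched with wget above.
  - /etc/prometheus/rules/embedded-exporter.yml
```

Running `promtool check rules` against the file before reloading Prometheus catches syntax errors early.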
critical

12.6.1. Mimir ingester unhealthy

Mimir has {{ $value }} unhealthy ingester(s) in the ring.

- alert: MimirIngesterUnhealthy
  expr: min by (job) (cortex_ring_members{state="Unhealthy", name="ingester"}) > 0
  for: 15m
  labels:
    severity: critical
  annotations:
    summary: Mimir ingester unhealthy (instance {{ $labels.instance }})
    description: "Mimir has {{ $value }} unhealthy ingester(s) in the ring.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
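When copying a single rule rather than downloading the whole file, keep in mind that a Prometheus rule file expects rules nested under a named group. A minimal sketch using the rule above, where the group name `mimir-alerts` is an arbitrary placeholder:

```yaml
groups:
  - name: mimir-alerts  # placeholder group name, pick your own
    rules:
      - alert: MimirIngesterUnhealthy
        expr: min by (job) (cortex_ring_members{state="Unhealthy", name="ingester"}) > 0
        for: 15m
        labels:
          severity: critical
        annotations:
          summary: Mimir ingester unhealthy (instance {{ $labels.instance }})
          description: "Mimir has {{ $value }} unhealthy ingester(s) in the ring.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
```

The other rules on this page can be appended to the same `rules:` list at the same indentation.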
critical

12.6.2. Mimir request errors

Mimir {{ $labels.job }} {{ $labels.route }} is experiencing {{ printf "%.2f" $value }}% errors.

- alert: MimirRequestErrors
  expr: 100 * sum by (job, route) (rate(cortex_request_duration_seconds_count{status_code=~"5..", route!~"ready|debug_pprof"}[5m])) / sum by (job, route) (rate(cortex_request_duration_seconds_count{route!~"ready|debug_pprof"}[5m])) > 1 and sum by (job, route) (rate(cortex_request_duration_seconds_count{route!~"ready|debug_pprof"}[5m])) > 0
  for: 15m
  labels:
    severity: critical
  annotations:
    summary: Mimir request errors (instance {{ $labels.instance }})
    description: "Mimir {{ $labels.job }} {{ $labels.route }} is experiencing {{ printf \"%.2f\" $value }}% errors.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

12.6.3. Mimir inconsistent runtime config

An inconsistent runtime config file is used across Mimir instances.

- alert: MimirInconsistentRuntimeConfig
  expr: count(count by (job, sha256) (cortex_runtime_config_hash)) without(sha256) > 1
  for: 1h
  labels:
    severity: critical
  annotations:
    summary: Mimir inconsistent runtime config (instance {{ $labels.instance }})
    description: "An inconsistent runtime config file is used across Mimir instances.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

12.6.4. Mimir bad runtime config

{{ $labels.job }} failed to reload runtime config.

- alert: MimirBadRuntimeConfig
  expr: count by (job) (cortex_runtime_config_last_reload_successful == 0) > 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: Mimir bad runtime config (instance {{ $labels.instance }})
    description: "{{ $labels.job }} failed to reload runtime config.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

12.6.5. Mimir scheduler queries stuck

There are {{ $value }} queued up queries in {{ $labels.job }}.

- alert: MimirSchedulerQueriesStuck
  expr: sum by (job) (min_over_time(cortex_query_scheduler_queue_length[1m])) > 0
  for: 7m
  labels:
    severity: critical
  annotations:
    summary: Mimir scheduler queries stuck (instance {{ $labels.instance }})
    description: "There are {{ $value }} queued up queries in {{ $labels.job }}.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

12.6.6. Mimir cache request errors

Mimir cache {{ $labels.name }} is experiencing {{ printf "%.2f" $value }}% errors for {{ $labels.operation }} operation.

- alert: MimirCacheRequestErrors
  expr: (sum by (name, operation, job) (rate(thanos_cache_operation_failures_total[5m])) / sum by (name, operation, job) (rate(thanos_cache_operations_total[5m]))) * 100 > 5 and sum by (name, operation, job) (rate(thanos_cache_operations_total[5m])) > 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Mimir cache request errors (instance {{ $labels.instance }})
    description: "Mimir cache {{ $labels.name }} is experiencing {{ printf \"%.2f\" $value }}% errors for {{ $labels.operation }} operation.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

12.6.7. Mimir KV store failure

Mimir {{ $labels.job }} KV store {{ $labels.kv_name }} is failing with 100% error rate.

- alert: MimirKVStoreFailure
  expr: (sum by (job, kv_name) (rate(cortex_kv_request_duration_seconds_count{status_code!~"2.."}[5m])) / sum by (job, kv_name) (rate(cortex_kv_request_duration_seconds_count[5m]))) == 1 and sum by (job, kv_name) (rate(cortex_kv_request_duration_seconds_count[5m])) > 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: Mimir KV store failure (instance {{ $labels.instance }})
    description: "Mimir {{ $labels.job }} KV store {{ $labels.kv_name }} is failing with 100% error rate.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

12.6.8. Mimir memory map areas too high

Mimir {{ $labels.job }} is using {{ printf "%.0f" $value }}% of its memory map area limit.

- alert: MimirMemoryMapAreasTooHigh
  expr: process_memory_map_areas{job=~".*(ingester|cortex|mimir|store-gateway).*"} / process_memory_map_areas_limit{job=~".*(ingester|cortex|mimir|store-gateway).*"} * 100 > 80 and process_memory_map_areas_limit{job=~".*(ingester|cortex|mimir|store-gateway).*"} > 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: Mimir memory map areas too high (instance {{ $labels.instance }})
    description: "Mimir {{ $labels.job }} is using {{ printf \"%.0f\" $value }}% of its memory map area limit.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

12.6.9. Mimir ingester instance has no tenants

Mimir ingester {{ $labels.instance }} has no tenants assigned.

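  # The offset (2h) must exceed the `for` duration (1h); with equal values the offset sample is already 0 by the time the alert could fire.
- alert: MimirIngesterInstanceHasNoTenants
  expr: (cortex_ingester_memory_users == 0) and on (instance) (cortex_ingester_memory_users offset 2h > 0)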
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: Mimir ingester instance has no tenants (instance {{ $labels.instance }})
    description: "Mimir ingester {{ $labels.instance }} has no tenants assigned.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

12.6.10. Mimir ruler instance has no rule groups

Mimir ruler {{ $labels.instance }} has no rule groups assigned.

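  # The offset (2h) must exceed the `for` duration (1h); with equal values the offset sample is already 0 by the time the alert could fire.
- alert: MimirRulerInstanceHasNoRuleGroups
  expr: (cortex_ruler_managers_total == 0) and on (instance) (cortex_ruler_managers_total offset 2h > 0)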
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: Mimir ruler instance has no rule groups (instance {{ $labels.instance }})
    description: "Mimir ruler {{ $labels.instance }} has no rule groups assigned.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

12.6.11. Mimir ingested data too far in the future

Mimir ingester {{ $labels.job }} has ingested samples with timestamps more than 1 hour in the future.

- alert: MimirIngestedDataTooFarInTheFuture
  expr: max by (job) (cortex_ingester_tsdb_head_max_timestamp_seconds - time() and cortex_ingester_tsdb_head_max_timestamp_seconds > 0) > 3600
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Mimir ingested data too far in the future (instance {{ $labels.instance }})
    description: "Mimir ingester {{ $labels.job }} has ingested samples with timestamps more than 1 hour in the future.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

12.6.12. Mimir store gateway too many failed operations

Mimir store-gateway {{ $labels.job }} bucket operations are failing ({{ $value | humanize }}/s).

  # Threshold of 0.05/s avoids firing on transient single-event spikes.
- alert: MimirStoreGatewayTooManyFailedOperations
  expr: sum by (job) (rate(thanos_objstore_bucket_operation_failures_total[5m])) > 0.05
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Mimir store gateway too many failed operations (instance {{ $labels.instance }})
    description: "Mimir store-gateway {{ $labels.job }} bucket operations are failing ({{ $value | humanize }}/s).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

12.6.13. Mimir ring members mismatch

Mimir {{ $labels.name }} ring has inconsistent member counts across instances.

- alert: MimirRingMembersMismatch
  expr: max by (name, job) (sum by (name, job, instance) (cortex_ring_members)) != min by (name, job) (sum by (name, job, instance) (cortex_ring_members))
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: Mimir ring members mismatch (instance {{ $labels.instance }})
    description: "Mimir {{ $labels.name }} ring has inconsistent member counts across instances.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

12.6.14. Mimir ingester reaching series limit warning

Mimir ingester {{ $labels.instance }} has reached {{ printf "%.0f" $value }}% of its series limit.

- alert: MimirIngesterReachingSeriesLimitWarning
  expr: (cortex_ingester_memory_series / ignoring(limit) cortex_ingester_instance_limits{limit="max_series"} * 100 > 80) and ignoring(limit) (cortex_ingester_instance_limits{limit="max_series"} > 0)
  for: 3h
  labels:
    severity: warning
  annotations:
    summary: Mimir ingester reaching series limit warning (instance {{ $labels.instance }})
    description: "Mimir ingester {{ $labels.instance }} has reached {{ printf \"%.0f\" $value }}% of its series limit.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

12.6.15. Mimir ingester reaching series limit critical

Mimir ingester {{ $labels.instance }} has reached {{ printf "%.0f" $value }}% of its series limit.

- alert: MimirIngesterReachingSeriesLimitCritical
  expr: (cortex_ingester_memory_series / ignoring(limit) cortex_ingester_instance_limits{limit="max_series"} * 100 > 90) and ignoring(limit) (cortex_ingester_instance_limits{limit="max_series"} > 0)
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: Mimir ingester reaching series limit critical (instance {{ $labels.instance }})
    description: "Mimir ingester {{ $labels.instance }} has reached {{ printf \"%.0f\" $value }}% of its series limit.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

12.6.16. Mimir ingester reaching tenants limit warning

Mimir ingester {{ $labels.instance }} has reached {{ printf "%.0f" $value }}% of its tenants limit.

- alert: MimirIngesterReachingTenantsLimitWarning
  expr: (cortex_ingester_memory_users / ignoring(limit) cortex_ingester_instance_limits{limit="max_tenants"} * 100 > 70) and ignoring(limit) (cortex_ingester_instance_limits{limit="max_tenants"} > 0)
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Mimir ingester reaching tenants limit warning (instance {{ $labels.instance }})
    description: "Mimir ingester {{ $labels.instance }} has reached {{ printf \"%.0f\" $value }}% of its tenants limit.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

12.6.17. Mimir ingester reaching tenants limit critical

Mimir ingester {{ $labels.instance }} has reached {{ printf "%.0f" $value }}% of its tenants limit.

- alert: MimirIngesterReachingTenantsLimitCritical
  expr: (cortex_ingester_memory_users / ignoring(limit) cortex_ingester_instance_limits{limit="max_tenants"} * 100 > 80) and ignoring(limit) (cortex_ingester_instance_limits{limit="max_tenants"} > 0)
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: Mimir ingester reaching tenants limit critical (instance {{ $labels.instance }})
    description: "Mimir ingester {{ $labels.instance }} has reached {{ printf \"%.0f\" $value }}% of its tenants limit.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

12.6.18. Mimir reaching TCP connections limit

Mimir instance {{ $labels.instance }} is using {{ printf "%.0f" $value }}% of its TCP connections limit.

- alert: MimirReachingTCPConnectionsLimit
  expr: cortex_tcp_connections / cortex_tcp_connections_limit * 100 > 80 and cortex_tcp_connections_limit > 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: Mimir reaching TCP connections limit (instance {{ $labels.instance }})
    description: "Mimir instance {{ $labels.instance }} is using {{ printf \"%.0f\" $value }}% of its TCP connections limit.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

12.6.19. Mimir distributor inflight requests high

Mimir distributor {{ $labels.instance }} is using {{ printf "%.0f" $value }}% of its inflight push requests limit.

- alert: MimirDistributorInflightRequestsHigh
  expr: (cortex_distributor_inflight_push_requests / ignoring(limit) cortex_distributor_instance_limits{limit="max_inflight_push_requests"} * 100 > 80) and ignoring(limit) (cortex_distributor_instance_limits{limit="max_inflight_push_requests"} > 0)
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: Mimir distributor inflight requests high (instance {{ $labels.instance }})
    description: "Mimir distributor {{ $labels.instance }} is using {{ printf \"%.0f\" $value }}% of its inflight push requests limit.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

12.6.20. Mimir ingester TSDB head compaction failed

Mimir ingester {{ $labels.instance }} is failing to compact TSDB head ({{ $value | humanize }}/s).

  # Threshold of 0.05/s avoids firing on transient single-event spikes.
- alert: MimirIngesterTSDBHeadCompactionFailed
  expr: rate(cortex_ingester_tsdb_compactions_failed_total[5m]) > 0.05
  for: 15m
  labels:
    severity: critical
  annotations:
    summary: Mimir ingester TSDB head compaction failed (instance {{ $labels.instance }})
    description: "Mimir ingester {{ $labels.instance }} is failing to compact TSDB head ({{ $value | humanize }}/s).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
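For scale, a rate of 0.05/s sustained across a 5m window corresponds to roughly 0.05 × 300 = 15 failures. If failures of this kind are rare in your environment and you would rather hear about each one, a count-based variant is one option. The sketch below is an assumption and not part of the upstream rule set: the alert name, the 2h window, and the severity are all hypothetical choices.

```yaml
# Hypothetical stricter variant, not part of the upstream rule set.
# Fires as soon as at least one head compaction failure is seen in the last 2h.
- alert: MimirIngesterTSDBHeadCompactionFailedAny
  expr: increase(cortex_ingester_tsdb_compactions_failed_total[2h]) > 0
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: Mimir ingester TSDB head compaction failed (instance {{ $labels.instance }})
    description: "Mimir ingester {{ $labels.instance }} recorded {{ printf \"%.0f\" $value }} head compaction failure(s) in the last 2 hours."
```

With `for: 0m` and a 2h window, the alert stays active for about two hours after the last failure, so it trades noise for sensitivity.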
critical

12.6.21. Mimir ingester TSDB head truncation failed

Mimir ingester {{ $labels.instance }} is failing to truncate TSDB head ({{ $value | humanize }}/s).

  # Threshold of 0.05/s avoids firing on transient single-event spikes.
- alert: MimirIngesterTSDBHeadTruncationFailed
  expr: rate(cortex_ingester_tsdb_head_truncations_failed_total[5m]) > 0.05
  for: 15m
  labels:
    severity: critical
  annotations:
    summary: Mimir ingester TSDB head truncation failed (instance {{ $labels.instance }})
    description: "Mimir ingester {{ $labels.instance }} is failing to truncate TSDB head ({{ $value | humanize }}/s).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

12.6.22. Mimir ingester TSDB checkpoint creation failed

Mimir ingester {{ $labels.instance }} is failing to create TSDB checkpoints ({{ $value | humanize }}/s).

  # Threshold of 0.05/s avoids firing on transient single-event spikes.
- alert: MimirIngesterTSDBCheckpointCreationFailed
  expr: rate(cortex_ingester_tsdb_checkpoint_creations_failed_total[5m]) > 0.05
  for: 15m
  labels:
    severity: critical
  annotations:
    summary: Mimir ingester TSDB checkpoint creation failed (instance {{ $labels.instance }})
    description: "Mimir ingester {{ $labels.instance }} is failing to create TSDB checkpoints ({{ $value | humanize }}/s).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

12.6.23. Mimir ingester TSDB checkpoint deletion failed

Mimir ingester {{ $labels.instance }} is failing to delete TSDB checkpoints ({{ $value | humanize }}/s).

  # Threshold of 0.05/s avoids firing on transient single-event spikes.
- alert: MimirIngesterTSDBCheckpointDeletionFailed
  expr: rate(cortex_ingester_tsdb_checkpoint_deletions_failed_total[5m]) > 0.05
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Mimir ingester TSDB checkpoint deletion failed (instance {{ $labels.instance }})
    description: "Mimir ingester {{ $labels.instance }} is failing to delete TSDB checkpoints ({{ $value | humanize }}/s).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

12.6.24. Mimir ingester TSDB WAL truncation failed

Mimir ingester {{ $labels.instance }} is failing to truncate TSDB WAL ({{ $value | humanize }}/s).

  # Threshold of 0.05/s avoids firing on transient single-event spikes.
- alert: MimirIngesterTSDBWALTruncationFailed
  expr: rate(cortex_ingester_tsdb_wal_truncations_failed_total[5m]) > 0.05
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: Mimir ingester TSDB WAL truncation failed (instance {{ $labels.instance }})
    description: "Mimir ingester {{ $labels.instance }} is failing to truncate TSDB WAL ({{ $value | humanize }}/s).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

12.6.25. Mimir ingester TSDB WAL writes failed

Mimir ingester {{ $labels.instance }} is failing to write to TSDB WAL ({{ $value | humanize }}/s).

  # Threshold of 0.05/s avoids firing on transient single-event spikes.
- alert: MimirIngesterTSDBWALWritesFailed
  expr: rate(cortex_ingester_tsdb_wal_writes_failed_total[1m]) > 0.05
  for: 3m
  labels:
    severity: critical
  annotations:
    summary: Mimir ingester TSDB WAL writes failed (instance {{ $labels.instance }})
    description: "Mimir ingester {{ $labels.instance }} is failing to write to TSDB WAL ({{ $value | humanize }}/s).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

12.6.26. Mimir store gateway has not synced bucket

Mimir store-gateway {{ $labels.instance }} has not synced the bucket for more than 30 minutes.

  # Threshold of 30 minutes. Adjust based on your sync interval.
- alert: MimirStoreGatewayHasNotSyncedBucket
  expr: (time() - cortex_bucket_stores_blocks_last_successful_sync_timestamp_seconds{component="store-gateway"} > 1800) and cortex_bucket_stores_blocks_last_successful_sync_timestamp_seconds{component="store-gateway"} > 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: Mimir store gateway has not synced bucket (instance {{ $labels.instance }})
    description: "Mimir store-gateway {{ $labels.instance }} has not synced the bucket for more than 30 minutes.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

12.6.27. Mimir store gateway no synced tenants

Mimir store-gateway {{ $labels.instance }} has no synced tenants.

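  # The offset (2h) must exceed the `for` duration (1h); with equal values the offset sample is already 0 by the time the alert could fire.
- alert: MimirStoreGatewayNoSyncedTenants
  expr: (min by (instance, job) (cortex_bucket_stores_tenants_synced{component="store-gateway"}) == 0) and on (instance) (cortex_bucket_stores_tenants_synced{component="store-gateway"} offset 2h > 0)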
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: Mimir store gateway no synced tenants (instance {{ $labels.instance }})
    description: "Mimir store-gateway {{ $labels.instance }} has no synced tenants.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

12.6.28. Mimir bucket index not updated

Mimir bucket index for tenant {{ $labels.user }} has not been updated for more than 35 minutes.

- alert: MimirBucketIndexNotUpdated
  expr: min by (user, job) (time() - cortex_bucket_index_last_successful_update_timestamp_seconds) > 2100
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Mimir bucket index not updated (instance {{ $labels.instance }})
    description: "Mimir bucket index for tenant {{ $labels.user }} has not been updated for more than 35 minutes.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

12.6.29. Mimir compactor not cleaning up blocks

Mimir compactor {{ $labels.instance }} has not cleaned up blocks in the last 6 hours.

- alert: MimirCompactorNotCleaningUpBlocks
  expr: (time() - cortex_compactor_block_cleanup_last_successful_run_timestamp_seconds > 21600) and cortex_compactor_block_cleanup_last_successful_run_timestamp_seconds > 0
  for: 1h
  labels:
    severity: critical
  annotations:
    summary: Mimir compactor not cleaning up blocks (instance {{ $labels.instance }})
    description: "Mimir compactor {{ $labels.instance }} has not cleaned up blocks in the last 6 hours.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

12.6.30. Mimir compactor not running compaction

Mimir compactor {{ $labels.instance }} has not run compaction in the last 24 hours.

- alert: MimirCompactorNotRunningCompaction
  expr: (time() - cortex_compactor_last_successful_run_timestamp_seconds > 86400) and cortex_compactor_last_successful_run_timestamp_seconds > 0
  for: 15m
  labels:
    severity: critical
  annotations:
    summary: Mimir compactor not running compaction (instance {{ $labels.instance }})
    description: "Mimir compactor {{ $labels.instance }} has not run compaction in the last 24 hours.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

12.6.31. Mimir compactor has consecutive failures

Mimir compactor {{ $labels.instance }} has had {{ $value }} compaction failures in the last 2 hours.

- alert: MimirCompactorHasConsecutiveFailures
  expr: increase(cortex_compactor_runs_failed_total{reason!="shutdown"}[2h]) > 1
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Mimir compactor has consecutive failures (instance {{ $labels.instance }})
    description: "Mimir compactor {{ $labels.instance }} has had {{ $value }} compaction failures in the last 2 hours.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

12.6.32. Mimir compactor has run out of disk space

Mimir compactor {{ $labels.instance }} has run out of disk space.

  # cortex_compactor_disk_out_of_space_errors_total is declared as gauge by Mimir despite the _total suffix, so delta() is used instead of increase().
- alert: MimirCompactorHasRunOutOfDiskSpace
  expr: delta(cortex_compactor_disk_out_of_space_errors_total[24h]) >= 1
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Mimir compactor has run out of disk space (instance {{ $labels.instance }})
    description: "Mimir compactor {{ $labels.instance }} has run out of disk space.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

12.6.33. Mimir compactor has not uploaded blocks

Mimir compactor {{ $labels.instance }} has not uploaded any block in the last 24 hours.

- alert: MimirCompactorHasNotUploadedBlocks
  expr: (time() - thanos_objstore_bucket_last_successful_upload_time{component="compactor"} > 86400) and thanos_objstore_bucket_last_successful_upload_time{component="compactor"} > 0
  for: 15m
  labels:
    severity: critical
  annotations:
    summary: Mimir compactor has not uploaded blocks (instance {{ $labels.instance }})
    description: "Mimir compactor {{ $labels.instance }} has not uploaded any block in the last 24 hours.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

12.6.34. Mimir compactor skipped blocks

Mimir compactor has found {{ $value }} blocks that cannot be compacted (reason {{ $labels.reason }}).

  # Using a 24h window as compaction skips are rare events.
- alert: MimirCompactorSkippedBlocks
  expr: increase(cortex_compactor_blocks_marked_for_no_compaction_total[24h]) > 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Mimir compactor skipped blocks (instance {{ $labels.instance }})
    description: "Mimir compactor has found {{ $value }} blocks that cannot be compacted (reason {{ $labels.reason }}).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

12.6.35. Mimir ruler too many failed pushes

Mimir ruler {{ $labels.instance }} is failing to push {{ printf "%.2f" $value }}% of write requests.

- alert: MimirRulerTooManyFailedPushes
  expr: 100 * sum by (instance, job) (rate(cortex_ruler_write_requests_failed_total[5m])) / sum by (instance, job) (rate(cortex_ruler_write_requests_total[5m])) > 1 and sum by (instance, job) (rate(cortex_ruler_write_requests_total[5m])) > 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: Mimir ruler too many failed pushes (instance {{ $labels.instance }})
    description: "Mimir ruler {{ $labels.instance }} is failing to push {{ printf \"%.2f\" $value }}% of write requests.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

12.6.36. Mimir ruler too many failed queries

Mimir ruler {{ $labels.instance }} is failing {{ printf "%.2f" $value }}% of query evaluations.

- alert: MimirRulerTooManyFailedQueries
  expr: 100 * sum by (instance, job) (rate(cortex_ruler_queries_failed_total[5m])) / sum by (instance, job) (rate(cortex_ruler_queries_total[5m])) > 1 and sum by (instance, job) (rate(cortex_ruler_queries_total[5m])) > 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: Mimir ruler too many failed queries (instance {{ $labels.instance }})
    description: "Mimir ruler {{ $labels.instance }} is failing {{ printf \"%.2f\" $value }}% of query evaluations.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

12.6.37. Mimir ruler missed evaluations

Mimir ruler {{ $labels.instance }} is missing {{ printf "%.2f" $value }}% of rule group evaluations.

- alert: MimirRulerMissedEvaluations
  expr: 100 * sum by (instance, job) (rate(cortex_prometheus_rule_group_iterations_missed_total[5m])) / sum by (instance, job) (rate(cortex_prometheus_rule_group_iterations_total[5m])) > 1 and sum by (instance, job) (rate(cortex_prometheus_rule_group_iterations_total[5m])) > 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Mimir ruler missed evaluations (instance {{ $labels.instance }})
    description: "Mimir ruler {{ $labels.instance }} is missing {{ printf \"%.2f\" $value }}% of rule group evaluations.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

12.6.38. Mimir ruler failed ring check

Mimir ruler {{ $labels.job }} is failing ring checks ({{ $value | humanize }}/s).

  # Threshold of 0.05/s avoids firing on transient single-event spikes.
- alert: MimirRulerFailedRingCheck
  expr: sum by (job) (rate(cortex_ruler_ring_check_errors_total[5m])) > 0.05
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: Mimir ruler failed ring check (instance {{ $labels.instance }})
    description: "Mimir ruler {{ $labels.job }} is failing ring checks ({{ $value | humanize }}/s).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

12.6.39. Mimir alertmanager sync configs failing

Mimir alertmanager {{ $labels.job }} is failing to sync configs ({{ $value | humanize }}/s).

  # Threshold of 0.05/s avoids firing on transient single-event spikes.
- alert: MimirAlertmanagerSyncConfigsFailing
  expr: rate(cortex_alertmanager_sync_configs_failed_total[5m]) > 0.05
  for: 30m
  labels:
    severity: critical
  annotations:
    summary: Mimir alertmanager sync configs failing (instance {{ $labels.instance }})
    description: "Mimir alertmanager {{ $labels.job }} is failing to sync configs ({{ $value | humanize }}/s).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

12.6.40. Mimir alertmanager ring check failing

Mimir alertmanager {{ $labels.job }} is failing ring checks ({{ $value | humanize }}/s).

  # Threshold of 0.05/s avoids firing on transient single-event spikes.
- alert: MimirAlertmanagerRingCheckFailing
  expr: rate(cortex_alertmanager_ring_check_errors_total[5m]) > 0.05
  for: 10m
  labels:
    severity: critical
  annotations:
    summary: Mimir alertmanager ring check failing (instance {{ $labels.instance }})
    description: "Mimir alertmanager {{ $labels.job }} is failing ring checks ({{ $value | humanize }}/s).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

12.6.41. Mimir alertmanager state merge failing

Mimir alertmanager {{ $labels.job }} is failing to merge state updates ({{ $value | humanize }}/s).

  # Threshold of 0.05/s avoids firing on transient single-event spikes.
- alert: MimirAlertmanagerStateMergeFailing
  expr: rate(cortex_alertmanager_partial_state_merges_failed_total[5m]) > 0.05
  for: 10m
  labels:
    severity: critical
  annotations:
    summary: Mimir alertmanager state merge failing (instance {{ $labels.instance }})
    description: "Mimir alertmanager {{ $labels.job }} is failing to merge state updates ({{ $value | humanize }}/s).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

12.6.42. Mimir alertmanager replication failing

Mimir alertmanager {{ $labels.job }} is failing to replicate state ({{ $value | humanize }}/s).

  # Threshold of 0.05/s avoids firing on transient single-event spikes.
- alert: MimirAlertmanagerReplicationFailing
  expr: rate(cortex_alertmanager_state_replication_failed_total[5m]) > 0.05
  for: 10m
  labels:
    severity: critical
  annotations:
    summary: Mimir alertmanager replication failing (instance {{ $labels.instance }})
    description: "Mimir alertmanager {{ $labels.job }} is failing to replicate state ({{ $value | humanize }}/s).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

12.6.43. Mimir alertmanager persist state failing

Mimir alertmanager {{ $labels.job }} is failing to persist state ({{ $value | humanize }}/s).

  # Threshold of 0.05/s avoids firing on transient single-event spikes.
- alert: MimirAlertmanagerPersistStateFailing
  expr: rate(cortex_alertmanager_state_persist_failed_total[15m]) > 0.05
  for: 1h
  labels:
    severity: critical
  annotations:
    summary: Mimir alertmanager persist state failing (instance {{ $labels.instance }})
    description: "Mimir alertmanager {{ $labels.job }} is failing to persist state ({{ $value | humanize }}/s).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

12.6.44. Mimir alertmanager initial sync failed

Mimir alertmanager {{ $labels.job }} failed initial state sync.

- alert: MimirAlertmanagerInitialSyncFailed
  expr: increase(cortex_alertmanager_state_initial_sync_completed_total{outcome="failed"}[1m]) > 0
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: Mimir alertmanager initial sync failed (instance {{ $labels.instance }})
    description: "Mimir alertmanager {{ $labels.job }} failed initial state sync.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

12.6.45. Mimir alertmanager instance has no tenants

Mimir alertmanager {{ $labels.instance }} has no tenants assigned.

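  # The offset (2h) must exceed the `for` duration (1h); with equal values the offset sample is already 0 by the time the alert could fire.
- alert: MimirAlertmanagerInstanceHasNoTenants
  expr: (cortex_alertmanager_tenants_owned == 0) and on (instance) (cortex_alertmanager_tenants_owned offset 2h > 0)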
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: Mimir alertmanager instance has no tenants (instance {{ $labels.instance }})
    description: "Mimir alertmanager {{ $labels.instance }} has no tenants assigned.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

12.6.46. Mimir gossip members count too high

Mimir gossip cluster has more members than expected.

- alert: MimirGossipMembersCountTooHigh
  expr: avg(memberlist_client_cluster_members_count{job=~".*(mimir|cortex).*"}) by (job) * 1.15 + 10 < max(memberlist_client_cluster_members_count{job=~".*(mimir|cortex).*"}) by (job)
  for: 20m
  labels:
    severity: warning
  annotations:
    summary: Mimir gossip members count too high (instance {{ $labels.instance }})
    description: "Mimir gossip cluster has more members than expected.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

12.6.47. Mimir gossip members count too low

Mimir gossip cluster has fewer members than expected.

- alert: MimirGossipMembersCountTooLow
  expr: avg(memberlist_client_cluster_members_count{job=~".*(mimir|cortex).*"}) by (job) * 0.5 > min(memberlist_client_cluster_members_count{job=~".*(mimir|cortex).*"}) by (job)
  for: 20m
  labels:
    severity: warning
  annotations:
    summary: Mimir gossip members count too low (instance {{ $labels.instance }})
    description: "Mimir gossip cluster has fewer members than expected.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

12.6.48. Mimir go threads too high warning

Mimir {{ $labels.instance }} has {{ $value }} Go threads.

  # go_threads counts OS threads created by the Go runtime, not goroutines. A high value usually points to many goroutines blocked in system calls or cgo calls.
- alert: MimirGoThreadsTooHighWarning
  expr: go_threads{job=~".*(mimir|cortex).*"} > 5000
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: Mimir go threads too high warning (instance {{ $labels.instance }})
    description: "Mimir {{ $labels.instance }} has {{ $value }} Go threads.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

12.6.49. Mimir go threads too high critical

Mimir {{ $labels.instance }} has {{ $value }} Go threads.

- alert: MimirGoThreadsTooHighCritical
  expr: go_threads{job=~".*(mimir|cortex).*"} > 8000
  for: 15m
  labels:
    severity: critical
  annotations:
    summary: Mimir go threads too high critical (instance {{ $labels.instance }})
    description: "Mimir {{ $labels.instance }} has {{ $value }} Go threads.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"