What is the Prometheus alert rule for "Cortex ruler configuration reload failure"?

Cortex ruler configuration reload failure (instance {{ $labels.instance }}). PromQL expression: cortex_ruler_config_last_reload_successful != 1. Severity: warning.

What is the Prometheus alert rule for "Cortex not connected to Alertmanager"?

Cortex not connected to Alertmanager (instance {{ $labels.instance }}). PromQL expression: cortex_prometheus_notifications_alertmanagers_discovered < 1. Severity: critical.

What is the Prometheus alert rule for "Cortex notifications are being dropped"?

Cortex notifications are being dropped due to errors (instance {{ $labels.instance }}, {{ $value | humanize }}/s). PromQL expression: rate(cortex_prometheus_notifications_dropped_total[5m]) > 0.05. Severity: critical.

What is the Prometheus alert rule for "Cortex notification errors"?

Cortex is failing when sending alert notifications (instance {{ $labels.instance }}, {{ $value | humanize }}/s). PromQL expression: rate(cortex_prometheus_notifications_errors_total[5m]) > 0.05. Severity: critical.

What is the Prometheus alert rule for "Cortex ingester unhealthy"?

Cortex has an unhealthy ingester. PromQL expression: cortex_ring_members{state="Unhealthy", name="ingester"} > 0. Severity: critical.

What is the Prometheus alert rule for "Cortex frontend queries stuck"?

There are queued up queries in query-frontend. PromQL expression: sum by (job) (cortex_query_frontend_queue_length) > 0. Severity: critical. Duration: 5m.

Cortex Prometheus Alert Rules

6 Prometheus alerting rules for Cortex. Exported via Embedded exporter. These rules cover critical and warning conditions: copy and paste the YAML into a rule file loaded by your Prometheus configuration.

⚠️ Alert thresholds depend on the nature of your applications. Some queries may have arbitrary tolerance thresholds. Building an efficient monitoring platform takes time. 😉

groups:
- name: EmbeddedExporter
  rules:
    - alert: CortexRulerConfigurationReloadFailure
      expr: cortex_ruler_config_last_reload_successful != 1
      for: 0m
      labels:
        severity: warning
      annotations:
        summary: Cortex ruler configuration reload failure (instance {{ $labels.instance }})
        description: "Cortex ruler configuration reload failure (instance {{ $labels.instance }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CortexNotConnectedToAlertmanager
      expr: cortex_prometheus_notifications_alertmanagers_discovered < 1
      for: 0m
      labels:
        severity: critical
      annotations:
        summary: Cortex not connected to Alertmanager (instance {{ $labels.instance }})
        description: "Cortex not connected to Alertmanager (instance {{ $labels.instance }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    # A threshold of 0.05/s (roughly 15 events over the 5m window) avoids firing on isolated events.
    - alert: CortexNotificationsAreBeingDropped
      expr: rate(cortex_prometheus_notifications_dropped_total[5m]) > 0.05
      for: 0m
      labels:
        severity: critical
      annotations:
        summary: Cortex notifications are being dropped (instance {{ $labels.instance }})
        description: "Cortex notifications are being dropped due to errors (instance {{ $labels.instance }}, {{ $value | humanize }}/s).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    # A threshold of 0.05/s (roughly 15 events over the 5m window) avoids firing on isolated events.
    - alert: CortexNotificationErrors
      expr: rate(cortex_prometheus_notifications_errors_total[5m]) > 0.05
      for: 0m
      labels:
        severity: critical
      annotations:
        summary: Cortex notification errors (instance {{ $labels.instance }})
        description: "Cortex is failing when sending alert notifications (instance {{ $labels.instance }}, {{ $value | humanize }}/s).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CortexIngesterUnhealthy
      expr: cortex_ring_members{state="Unhealthy", name="ingester"} > 0
      for: 0m
      labels:
        severity: critical
      annotations:
        summary: Cortex ingester unhealthy (instance {{ $labels.instance }})
        description: "Cortex has an unhealthy ingester\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CortexFrontendQueriesStuck
      expr: sum by (job) (cortex_query_frontend_queue_length) > 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: Cortex frontend queries stuck (job {{ $labels.job }})
        description: "There are queued up queries in query-frontend.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
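The thresholds and `for: 0m` durations above are starting points. As a sketch of the kind of tuning the note on thresholds has in mind, the variant below holds the dropped-notifications alert for 5 minutes and downgrades it to warning. The group name, the 5m hold, and the severity change are illustrative assumptions, not recommendations from the rule set.

```yaml
groups:
- name: CortexTunedExample   # hypothetical group name
  rules:
    - alert: CortexNotificationsAreBeingDropped
      # Same expression as the original rule; only `for` and `severity` differ.
      expr: rate(cortex_prometheus_notifications_dropped_total[5m]) > 0.05
      for: 5m                # assumption: require the condition to hold for 5 minutes
      labels:
        severity: warning    # assumption: downgraded from critical
      annotations:
        summary: Cortex notifications are being dropped (instance {{ $labels.instance }})
        description: "Cortex notifications are being dropped due to errors (instance {{ $labels.instance }}, {{ $value | humanize }}/s).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
```

A longer `for` trades detection latency for fewer short-lived alerts, so the right value depends on how quickly a dropped notification needs attention in your environment.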

12.4. Embedded exporter (6 rules)

wget https://raw.githubusercontent.com/samber/awesome-prometheus-alerts/refs/heads/master/dist/rules/cortex/embedded-exporter.yml
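A minimal sketch of loading the downloaded file, assuming it has been saved under a `rules/` directory next to `prometheus.yml`. The path is an assumption; `rule_files` is the standard Prometheus setting for pointing at rule files.

```yaml
# prometheus.yml (fragment)
rule_files:
  # Hypothetical location of the file fetched by the wget command above.
  - "rules/embedded-exporter.yml"
```

Running `promtool check rules rules/embedded-exporter.yml` before reloading Prometheus should catch syntax errors in the file.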

12.4.1. Cortex ruler configuration reload failure

Severity: warning

Cortex ruler configuration reload failure (instance {{ $labels.instance }})

- alert: CortexRulerConfigurationReloadFailure
  expr: cortex_ruler_config_last_reload_successful != 1
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: Cortex ruler configuration reload failure (instance {{ $labels.instance }})
    description: "Cortex ruler configuration reload failure (instance {{ $labels.instance }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

12.4.2. Cortex not connected to Alertmanager

Severity: critical

Cortex not connected to Alertmanager (instance {{ $labels.instance }})

- alert: CortexNotConnectedToAlertmanager
  expr: cortex_prometheus_notifications_alertmanagers_discovered < 1
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Cortex not connected to Alertmanager (instance {{ $labels.instance }})
    description: "Cortex not connected to Alertmanager (instance {{ $labels.instance }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
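Note that `< 1` only matches series that exist: if the metric is missing entirely (for example because the ruler is not being scraped), this rule stays silent. A possible companion rule, sketched here with a hypothetical alert name, hold duration, and severity, uses `absent()` to cover that case.

```yaml
- alert: CortexAlertmanagerDiscoveryMetricAbsent   # hypothetical name, not part of the rule set
  # absent() returns a single series with value 1 when no matching series exist.
  expr: absent(cortex_prometheus_notifications_alertmanagers_discovered)
  for: 5m                                          # assumption: tolerate short scrape gaps
  labels:
    severity: warning                              # assumption
  annotations:
    # No instance label is available here, since absent() has no input series to copy it from.
    summary: Cortex Alertmanager discovery metric is absent
    description: "No cortex_prometheus_notifications_alertmanagers_discovered series found.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
```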

12.4.3. Cortex notifications are being dropped

Severity: critical

Cortex notifications are being dropped due to errors (instance {{ $labels.instance }}, {{ $value | humanize }}/s).

# A threshold of 0.05/s (roughly 15 events over the 5m window) avoids firing on isolated events.
- alert: CortexNotificationsAreBeingDropped
  expr: rate(cortex_prometheus_notifications_dropped_total[5m]) > 0.05
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Cortex notifications are being dropped (instance {{ $labels.instance }})
    description: "Cortex notifications are being dropped due to errors (instance {{ $labels.instance }}, {{ $value | humanize }}/s).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

12.4.4. Cortex notification errors

Severity: critical

Cortex is failing when sending alert notifications (instance {{ $labels.instance }}, {{ $value | humanize }}/s).

# A threshold of 0.05/s (roughly 15 events over the 5m window) avoids firing on isolated events.
- alert: CortexNotificationErrors
  expr: rate(cortex_prometheus_notifications_errors_total[5m]) > 0.05
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Cortex notification errors (instance {{ $labels.instance }})
    description: "Cortex is failing when sending alert notifications (instance {{ $labels.instance }}, {{ $value | humanize }}/s).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

12.4.5. Cortex ingester unhealthy

Severity: critical

Cortex has an unhealthy ingester

- alert: CortexIngesterUnhealthy
  expr: cortex_ring_members{state="Unhealthy", name="ingester"} > 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Cortex ingester unhealthy (instance {{ $labels.instance }})
    description: "Cortex has an unhealthy ingester\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

12.4.6. Cortex frontend queries stuck

Severity: critical

There are queued up queries in query-frontend.

- alert: CortexFrontendQueriesStuck
  expr: sum by (job) (cortex_query_frontend_queue_length) > 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: Cortex frontend queries stuck (job {{ $labels.job }})
    description: "There are queued up queries in query-frontend.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
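All six rules carry a `severity` label of either `warning` or `critical`. The fragment below is a sketch of how an Alertmanager routing tree might use that label. The receiver names are hypothetical, and their notification settings (email, chat, paging) are left out.

```yaml
# alertmanager.yml (fragment)
route:
  receiver: default            # fallback for alerts that match no child route
  routes:
    - matchers:
        - severity="critical"
      receiver: cortex-oncall  # hypothetical receiver name
    - matchers:
        - severity="warning"
      receiver: cortex-chat    # hypothetical receiver name

receivers:
  # Each receiver would need its own notification configuration added.
  - name: default
  - name: cortex-oncall
  - name: cortex-chat
```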
