Skip to main content
APA
Sponsored by CAST AI — Kubernetes cost optimization Better Stack — Uptime monitoring and log management
⚠️

Alert thresholds depend on the nature of your applications. Some queries may have arbitrary tolerance thresholds. Building an efficient monitoring platform takes time. 😉

Systemd Prometheus Alert Rules

7 Prometheus alerting rules for Systemd. Exported via prometheus-community/systemd_exporter. These rules cover critical and warning conditions — copy and paste the YAML into your Prometheus configuration.

wget https://raw.githubusercontent.com/samber/awesome-prometheus-alerts/refs/heads/master/dist/rules/systemd/systemd-exporter.yml
warning

1.13.1. Systemd unit failed

Systemd unit {{ $labels.name }} has entered failed state. (instance {{ $labels.instance }})

- alert: SystemdUnitFailed
  expr: systemd_unit_state{state="failed"} == 1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Systemd unit failed (instance {{ $labels.instance }})
    description: "Systemd unit {{ $labels.name }} has entered failed state. (instance {{ $labels.instance }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

1.13.2. Systemd unit inactive

Systemd unit {{ $labels.name }} is inactive. (instance {{ $labels.instance }})

  # Many units are legitimately inactive. You must adjust the name=~ filter to match your critical services.
- alert: SystemdUnitInactive
  expr: systemd_unit_state{state="inactive", type="service", name=~"your-critical-service.+"} == 1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Systemd unit inactive (instance {{ $labels.instance }})
    description: "Systemd unit {{ $labels.name }} is inactive. (instance {{ $labels.instance }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

1.13.3. Systemd service crash looping

Systemd service {{ $labels.name }} has restarted {{ $value }} times in the last hour. (instance {{ $labels.instance }})

- alert: SystemdServiceCrashLooping
  expr: increase(systemd_service_restart_total[1h]) > 5
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: Systemd service crash looping (instance {{ $labels.instance }})
    description: "Systemd service {{ $labels.name }} has restarted {{ $value }} times in the last hour. (instance {{ $labels.instance }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

1.13.4. Systemd unit tasks near limit

Systemd unit {{ $labels.name }} is using {{ $value | humanizePercentage }} of its task limit. (instance {{ $labels.instance }})

- alert: SystemdUnitTasksNearLimit
  expr: systemd_unit_tasks_current / ignoring(type) systemd_unit_tasks_max > 0.9 and ignoring(type) systemd_unit_tasks_max > 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Systemd unit tasks near limit (instance {{ $labels.instance }})
    description: "Systemd unit {{ $labels.name }} is using {{ $value | humanizePercentage }} of its task limit. (instance {{ $labels.instance }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

1.13.5. Systemd socket refused connections

Systemd socket {{ $labels.name }} is refusing connections. ({{ $value }} refused in last 5m, instance {{ $labels.instance }})

  # systemd_socket_refused_connections_total is declared as Gauge by the exporter despite the _total suffix, so delta() is used instead of increase().
- alert: SystemdSocketRefusedConnections
  expr: delta(systemd_socket_refused_connections_total[5m]) > 3
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Systemd socket refused connections (instance {{ $labels.instance }})
    description: "Systemd socket {{ $labels.name }} is refusing connections. ({{ $value }} refused in last 5m, instance {{ $labels.instance }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

1.13.6. Systemd socket high connections

Systemd socket {{ $labels.name }} has {{ $value }} active connections. (instance {{ $labels.instance }})

  # Threshold of 100 connections is arbitrary. Adjust to your workload.
- alert: SystemdSocketHighConnections
  expr: systemd_socket_current_connections > 100
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Systemd socket high connections (instance {{ $labels.instance }})
    description: "Systemd socket {{ $labels.name }} has {{ $value }} active connections. (instance {{ $labels.instance }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

1.13.7. Systemd timer missed trigger

Systemd timer {{ $labels.name }} has not triggered for over 24 hours. (instance {{ $labels.instance }})

  # Triggers if timer hasn't fired in 24 hours. Adjust threshold per timer schedule.
- alert: SystemdTimerMissedTrigger
  expr: (time() - systemd_timer_last_trigger_seconds) / 3600 > 24 and systemd_timer_last_trigger_seconds > 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Systemd timer missed trigger (instance {{ $labels.instance }})
    description: "Systemd timer {{ $labels.name }} has not triggered for over 24 hours. (instance {{ $labels.instance }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"