⚠️ Alert thresholds depend on the nature of your applications. Some queries may have arbitrary tolerance thresholds. Building an efficient monitoring platform takes time. 😉

Proxmox VE Prometheus Alert Rules

9 Prometheus alerting rules for Proxmox VE, based on metrics exported by prometheus-pve/prometheus-pve-exporter. The rules cover critical and warning conditions. Copy the YAML into a Prometheus rule file, or download the complete rule file:

wget https://raw.githubusercontent.com/samber/awesome-prometheus-alerts/refs/heads/master/dist/rules/proxmox-ve/prometheus-pve-exporter.yml
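
Prometheus loads alerting rules from separate rule files listed under `rule_files`, not from the main configuration body. A minimal sketch of that wiring follows; the directory `/etc/prometheus/rules/` is an assumption, so adjust it to your layout.

```yaml
# prometheus.yml (excerpt)
# Assumption: the downloaded file was saved under /etc/prometheus/rules/.
rule_files:
  - /etc/prometheus/rules/prometheus-pve-exporter.yml
```

If you copy individual rules from this page instead, they need to sit inside a rule group. The group name `proxmox-ve` below is a hypothetical placeholder.

```yaml
# Hand-assembled rule file containing a single rule from this page.
groups:
  - name: proxmox-ve   # hypothetical group name
    rules:
      - alert: PVENodeDown
        expr: pve_up{id=~"node/.*"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: PVE node down (instance {{ $labels.instance }})
```

Running `promtool check rules` on the file before reloading Prometheus catches indentation and expression errors early.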
critical

1.9.1. PVE node down

Proxmox VE node {{ $labels.id }} is down.

- alert: PVENodeDown
  expr: pve_up{id=~"node/.*"} == 0
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: PVE node down (instance {{ $labels.instance }})
    description: "Proxmox VE node {{ $labels.id }} is down.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

1.9.2. PVE VM/CT down

Proxmox VE guest {{ $labels.id }} is not running.

  # This alert triggers for all VMs and containers that are not running.
  # You may want to filter by specific guests using the `id` label, or exclude
  # intentionally stopped guests with additional label matchers.
- alert: PVEVM/CTDown
  expr: pve_up{id=~"(qemu|lxc)/.*"} == 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: PVE VM/CT down (instance {{ $labels.instance }})
    description: "Proxmox VE guest {{ $labels.id }} is not running.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
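
The filtering suggested in the comments above could look like the following two variants. This is a sketch, not part of the upstream rule set: the alert names and the guest IDs 9000 and 9001 are hypothetical, and variant B assumes your exporter version exposes `pve_onboot_status` with the same `id` label as `pve_up`.

```yaml
# Variant A: skip guests that are stopped on purpose (templates, lab VMs).
# 9000 and 9001 are placeholder guest IDs.
- alert: PVEGuestDownExcludingKnownStopped
  expr: pve_up{id=~"(qemu|lxc)/.*", id!~"(qemu|lxc)/(9000|9001)"} == 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: PVE VM/CT down (instance {{ $labels.instance }})

# Variant B: only alert for guests configured to start at boot.
# Assumption: pve_onboot_status is 1 for such guests.
- alert: PVEOnbootGuestDown
  expr: pve_up{id=~"(qemu|lxc)/.*"} == 0 and on (instance, id) pve_onboot_status == 1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: PVE onboot guest down (instance {{ $labels.instance }})
```

Variant A needs manual upkeep whenever guest IDs change. Variant B follows the Proxmox configuration, but it misses guests that should be running without the onboot flag set.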
warning

1.9.3. PVE high CPU usage

Proxmox VE CPU usage is above 90% on {{ $labels.id }}. Current value: {{ $value | printf "%.2f" }}%

- alert: PVEHighCPUUsage
  expr: pve_cpu_usage_ratio * 100 > 90
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: PVE high CPU usage (instance {{ $labels.instance }})
    description: "Proxmox VE CPU usage is above 90% on {{ $labels.id }}. Current value: {{ $value | printf \"%.2f\" }}%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
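
The expression above has no `id` matcher, so it applies to every series the exporter reports for `pve_cpu_usage_ratio`, nodes and guests alike. If nodes and guests need different tolerances, one option is to split the rule by `id` prefix. The alert names, thresholds, and durations below are illustrative assumptions, not recommendations.

```yaml
# Nodes: same threshold as the original rule.
- alert: PVENodeHighCPUUsage      # hypothetical name
  expr: pve_cpu_usage_ratio{id=~"node/.*"} * 100 > 90
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: PVE node high CPU usage (instance {{ $labels.instance }})

# Guests: higher threshold and longer duration to tolerate bursty workloads.
- alert: PVEGuestHighCPUUsage     # hypothetical name
  expr: pve_cpu_usage_ratio{id=~"(qemu|lxc)/.*"} * 100 > 95
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: PVE guest high CPU usage (instance {{ $labels.instance }})
```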
warning

1.9.4. PVE high memory usage

Proxmox VE memory usage is above 90% on {{ $labels.id }}. Current value: {{ $value | printf "%.2f" }}%

- alert: PVEHighMemoryUsage
  expr: pve_memory_usage_bytes / pve_memory_size_bytes * 100 > 90 and pve_memory_size_bytes > 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: PVE high memory usage (instance {{ $labels.instance }})
    description: "Proxmox VE memory usage is above 90% on {{ $labels.id }}. Current value: {{ $value | printf \"%.2f\" }}%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

1.9.5. PVE storage filling up

Proxmox VE storage {{ $labels.id }} is above 80% used. Current value: {{ $value | printf "%.2f" }}%

- alert: PVEStorageFillingUp
  expr: pve_disk_usage_bytes{id=~"storage/.*"} / pve_disk_size_bytes{id=~"storage/.*"} * 100 > 80 and pve_disk_size_bytes{id=~"storage/.*"} > 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: PVE storage filling up (instance {{ $labels.instance }})
    description: "Proxmox VE storage {{ $labels.id }} is above 80% used. Current value: {{ $value | printf \"%.2f\" }}%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

1.9.6. PVE storage almost full

Proxmox VE storage {{ $labels.id }} is above 95% used. Current value: {{ $value | printf "%.2f" }}%

- alert: PVEStorageAlmostFull
  expr: pve_disk_usage_bytes{id=~"storage/.*"} / pve_disk_size_bytes{id=~"storage/.*"} * 100 > 95 and pve_disk_size_bytes{id=~"storage/.*"} > 0
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: PVE storage almost full (instance {{ $labels.instance }})
    description: "Proxmox VE storage {{ $labels.id }} is above 95% used. Current value: {{ $value | printf \"%.2f\" }}%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
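
Once a storage passes 95%, both PVEStorageFillingUp and PVEStorageAlmostFull fire for it. If the duplicate notification is unwanted, an Alertmanager inhibit rule can mute the warning while the critical alert is active. The snippet is a sketch for `alertmanager.yml`, assuming an Alertmanager version that supports `source_matchers` and `target_matchers`.

```yaml
# alertmanager.yml (excerpt)
inhibit_rules:
  # While the critical alert fires, mute the warning for the same storage.
  - source_matchers:
      - alertname = PVEStorageAlmostFull
    target_matchers:
      - alertname = PVEStorageFillingUp
    # Inhibit only when both alerts come from the same exporter instance
    # and refer to the same storage id.
    equal: ['instance', 'id']
```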
warning

1.9.7. PVE guest not backed up

{{ $value }} Proxmox VE guest(s) are not covered by any backup job.

- alert: PVEGuestNotBackedUp
  expr: pve_not_backed_up_total > 0
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: PVE guest not backed up (instance {{ $labels.instance }})
    description: "{{ $value }} Proxmox VE guest(s) are not covered by any backup job.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

1.9.8. PVE replication failed

Proxmox VE replication for {{ $labels.id }} has {{ $value }} failed sync(s).

- alert: PVEReplicationFailed
  expr: pve_replication_failed_syncs > 0
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: PVE replication failed (instance {{ $labels.instance }})
    description: "Proxmox VE replication for {{ $labels.id }} has {{ $value }} failed sync(s).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

1.9.9. PVE cluster not quorate

Proxmox VE cluster has lost quorum.

  # Loss of quorum means the cluster cannot make decisions about VM placement
  # and fencing. This requires immediate attention.
- alert: PVEClusterNotQuorate
  expr: pve_cluster_info{quorate="0"} == 1
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: PVE cluster not quorate (instance {{ $labels.instance }})
    description: "Proxmox VE cluster has lost quorum.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
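
The expression relies on the quorum state being carried in the `quorate` label of `pve_cluster_info`. The selector `{quorate="0"}` does the actual filtering, and `== 1` only confirms the info series is present. A small `promtool` unit test can check this behaviour without a live cluster. The file name `quorum_test.yml` and the cluster id `cluster/lab` are hypothetical.

```yaml
# quorum_test.yml
# Run with: promtool test rules quorum_test.yml
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      # Simulated cluster that reports loss of quorum via the quorate label.
      - series: 'pve_cluster_info{id="cluster/lab", quorate="0"}'
        values: '1 1 1'
    promql_expr_test:
      - expr: pve_cluster_info{quorate="0"} == 1
        eval_time: 1m
        exp_samples:
          - labels: 'pve_cluster_info{id="cluster/lab", quorate="0"}'
            value: 1
```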