1.4.1. IPMI collector down
IPMI collector {{ $labels.collector }} on {{ $labels.instance }} failed to scrape sensor data. Check FreeIPMI tools and BMC connectivity.
# The ipmi_up metric is per-collector. A value of 0 means the collector could not retrieve data from the BMC.
- alert: IPMICollectorDown
  expr: ipmi_up == 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: IPMI collector down (instance {{ $labels.instance }})
    description: "IPMI collector {{ $labels.collector }} on {{ $labels.instance }} failed to scrape sensor data. Check FreeIPMI tools and BMC connectivity.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
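`ipmi_up` carries one series per collector, so the `collector` label identifies which part of the scrape failed. The following minimal Python sketch runs that check against the exporter's text output. It assumes simplified exposition lines that carry only a `collector` label. Real output may include more labels, comments, and escaping, which this parser does not handle. The sample collector names are illustrative only.

```python
import re
from typing import List

# Matches simplified lines such as: ipmi_up{collector="ipmi"} 0
# Assumption: `collector` is the only label and the value is a plain number.
IPMI_UP_LINE = re.compile(r'^ipmi_up\{collector="([^"]+)"\}\s+(\S+)$')


def failed_collectors(metrics_text: str) -> List[str]:
    """Return collectors whose ipmi_up value is 0, i.e. the IPMICollectorDown condition."""
    failed = []
    for line in metrics_text.splitlines():
        match = IPMI_UP_LINE.match(line.strip())
        if match and float(match.group(2)) == 0:
            failed.append(match.group(1))
    return failed


# Illustrative scrape output; collector names are examples only.
SAMPLE = """
ipmi_up{collector="bmc"} 1
ipmi_up{collector="ipmi"} 0
ipmi_up{collector="sel"} 1
"""

print(failed_collectors(SAMPLE))  # ['ipmi']
```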
1.4.2. IPMI temperature sensor warning
IPMI temperature sensor {{ $labels.name }} on {{ $labels.instance }} is in warning state.
# State values: 0=nominal, 1=warning, 2=critical. Thresholds are defined in the BMC firmware.
- alert: IPMITemperatureSensorWarning
  expr: ipmi_temperature_state == 1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: IPMI temperature sensor warning (instance {{ $labels.instance }})
    description: "IPMI temperature sensor {{ $labels.name }} on {{ $labels.instance }} is in warning state.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
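Because of this state encoding, the warning and critical rules are mutually exclusive. `== 1` stops matching once a sensor escalates to 2, so the warning alert resolves when the critical rule that follows starts to match. The small Python sketch below models that mapping. The function name is hypothetical and only mirrors the two PromQL comparisons.

```python
from enum import IntEnum
from typing import List


class SensorState(IntEnum):
    """Encoding of the ipmi_*_state metrics: 0=nominal, 1=warning, 2=critical."""
    NOMINAL = 0
    WARNING = 1
    CRITICAL = 2


def matching_temperature_alerts(state: float) -> List[str]:
    """Mirror `ipmi_temperature_state == 1` and `ipmi_temperature_state == 2`."""
    alerts = []
    if state == SensorState.WARNING:
        alerts.append("IPMITemperatureSensorWarning")
    if state == SensorState.CRITICAL:
        alerts.append("IPMITemperatureSensorCritical")
    return alerts


for value in SensorState:
    print(value.name, matching_temperature_alerts(value))
# NOMINAL []
# WARNING ['IPMITemperatureSensorWarning']
# CRITICAL ['IPMITemperatureSensorCritical']
```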
1.4.3. IPMI temperature sensor critical
IPMI temperature sensor {{ $labels.name }} on {{ $labels.instance }} is in critical state. Immediate attention required to prevent hardware damage.
- alert: IPMITemperatureSensorCritical
  expr: ipmi_temperature_state == 2
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: IPMI temperature sensor critical (instance {{ $labels.instance }})
    description: "IPMI temperature sensor {{ $labels.name }} on {{ $labels.instance }} is in critical state. Immediate attention required to prevent hardware damage.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
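The critical rules use `for: 0m`, so they fire on the first evaluation where the condition holds. The warning rules must instead hold for five consecutive minutes. The Python sketch below models that pending/firing behaviour. It assumes, hypothetically, one rule evaluation per minute, and it is a simplification of Prometheus rule evaluation, not its actual code.

```python
from typing import List, Optional


def first_firing_minute(matches: List[bool], for_minutes: int) -> Optional[int]:
    """Return the minute at which an alert first fires, or None if it never does.

    Assumes one evaluation per minute. The alert goes pending on the first
    evaluation where the expression matches and fires once it has matched
    continuously for at least `for_minutes`; any non-matching evaluation resets it.
    """
    active_since = None
    for minute, matched in enumerate(matches):
        if not matched:
            active_since = None  # condition cleared: drop the pending alert
            continue
        if active_since is None:
            active_since = minute  # alert becomes pending
        if minute - active_since >= for_minutes:
            return minute
    return None


# A sensor that reports critical briefly, recovers, then stays critical.
readings = [False, True, True, False, True, True, True, True, True, True]
print(first_firing_minute(readings, 0))  # 1
print(first_firing_minute(readings, 5))  # 9
```

With `for: 5m`, the short excursion at minutes 1 and 2 never fires. With `for: 0m`, it fires immediately.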
1.4.4. IPMI fan speed sensor warning
IPMI fan sensor {{ $labels.name }} on {{ $labels.instance }} is in warning state.
- alert: IPMIFanSpeedSensorWarning
  expr: ipmi_fan_speed_state == 1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: IPMI fan speed sensor warning (instance {{ $labels.instance }})
    description: "IPMI fan sensor {{ $labels.name }} on {{ $labels.instance }} is in warning state.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
1.4.5. IPMI fan speed sensor critical
IPMI fan sensor {{ $labels.name }} on {{ $labels.instance }} is in critical state. A fan may have failed.
- alert: IPMIFanSpeedSensorCritical
  expr: ipmi_fan_speed_state == 2
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: IPMI fan speed sensor critical (instance {{ $labels.instance }})
    description: "IPMI fan sensor {{ $labels.name }} on {{ $labels.instance }} is in critical state. A fan may have failed.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
1.4.6. IPMI fan speed zero
IPMI fan {{ $labels.name }} on {{ $labels.instance }} reports 0 RPM. The fan may have failed.
- alert: IPMIFanSpeedZero
  expr: ipmi_fan_speed_rpm == 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: IPMI fan speed zero (instance {{ $labels.instance }})
    description: "IPMI fan {{ $labels.name }} on {{ $labels.instance }} reports 0 RPM. The fan may have failed.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
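`IPMIFanSpeedZero` checks the raw reading, so it fires whether or not the BMC firmware defines thresholds for that fan. The state-based fan rules above only react to thresholds the BMC applies. The Python sketch below illustrates that difference under a simplified model: the BMC reports a fan as critical when its speed falls below a single lower-critical threshold. `lower_critical_rpm` is a hypothetical parameter, and `None` models a fan with no threshold configured.

```python
from typing import List, Optional


def matching_fan_alerts(rpm: float, lower_critical_rpm: Optional[float]) -> List[str]:
    """Return which fan rules would match a single reading.

    Simplified model: the BMC sets the fan state to critical (2) when the
    reading is below `lower_critical_rpm`; with no threshold it stays nominal.
    """
    alerts = []
    if lower_critical_rpm is not None and rpm < lower_critical_rpm:
        alerts.append("IPMIFanSpeedSensorCritical")  # ipmi_fan_speed_state == 2
    if rpm == 0:
        alerts.append("IPMIFanSpeedZero")  # ipmi_fan_speed_rpm == 0
    return alerts


print(matching_fan_alerts(0, 600))     # ['IPMIFanSpeedSensorCritical', 'IPMIFanSpeedZero']
print(matching_fan_alerts(0, None))    # ['IPMIFanSpeedZero']
print(matching_fan_alerts(4200, 600))  # []
```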
1.4.7. IPMI voltage sensor warning
IPMI voltage sensor {{ $labels.name }} on {{ $labels.instance }} is in warning state.
- alert: IPMIVoltageSensorWarning
  expr: ipmi_voltage_state == 1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: IPMI voltage sensor warning (instance {{ $labels.instance }})
    description: "IPMI voltage sensor {{ $labels.name }} on {{ $labels.instance }} is in warning state.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
1.4.8. IPMI voltage sensor critical
IPMI voltage sensor {{ $labels.name }} on {{ $labels.instance }} is in critical state. Power supply or motherboard issue possible.
- alert: IPMIVoltageSensorCritical
  expr: ipmi_voltage_state == 2
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: IPMI voltage sensor critical (instance {{ $labels.instance }})
    description: "IPMI voltage sensor {{ $labels.name }} on {{ $labels.instance }} is in critical state. Power supply or motherboard issue possible.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
1.4.9. IPMI current sensor warning
IPMI current sensor {{ $labels.name }} on {{ $labels.instance }} is in warning state.
- alert: IPMICurrentSensorWarning
  expr: ipmi_current_state == 1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: IPMI current sensor warning (instance {{ $labels.instance }})
    description: "IPMI current sensor {{ $labels.name }} on {{ $labels.instance }} is in warning state.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
1.4.10. IPMI current sensor critical
IPMI current sensor {{ $labels.name }} on {{ $labels.instance }} is in critical state.
- alert: IPMICurrentSensorCritical
  expr: ipmi_current_state == 2
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: IPMI current sensor critical (instance {{ $labels.instance }})
    description: "IPMI current sensor {{ $labels.name }} on {{ $labels.instance }} is in critical state.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
1.4.11. IPMI power sensor warning
IPMI power sensor {{ $labels.name }} on {{ $labels.instance }} is in warning state.
- alert: IPMIPowerSensorWarning
  expr: ipmi_power_state == 1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: IPMI power sensor warning (instance {{ $labels.instance }})
    description: "IPMI power sensor {{ $labels.name }} on {{ $labels.instance }} is in warning state.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
1.4.12. IPMI power sensor critical
IPMI power sensor {{ $labels.name }} on {{ $labels.instance }} is in critical state.
- alert: IPMIPowerSensorCritical
  expr: ipmi_power_state == 2
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: IPMI power sensor critical (instance {{ $labels.instance }})
    description: "IPMI power sensor {{ $labels.name }} on {{ $labels.instance }} is in critical state.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
1.4.13. IPMI generic sensor critical
IPMI sensor {{ $labels.name }} (type={{ $labels.type }}) on {{ $labels.instance }} is in critical state.
# Catches any sensor type not covered by the specific temperature/fan/voltage/current/power alerts.
- alert: IPMIGenericSensorCritical
  expr: ipmi_sensor_state == 2
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: IPMI generic sensor critical (instance {{ $labels.instance }})
    description: "IPMI sensor {{ $labels.name }} (type={{ $labels.type }}) on {{ $labels.instance }} is in critical state.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
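The split between the typed rules and this catch-all works like a lookup. A sensor whose type has a dedicated state metric is covered by the specific rules above, and anything else is reported through `ipmi_sensor_state`. The Python sketch below models that routing. The sensor-type strings are assumptions for illustration; the exact values reported by the IPMI tooling may differ on your hardware.

```python
# Hypothetical sensor-type strings mapped to the dedicated state metric used by
# the typed alerts; any other type is assumed to land in ipmi_sensor_state.
TYPED_STATE_METRICS = {
    "Temperature": "ipmi_temperature_state",
    "Fan": "ipmi_fan_speed_state",
    "Voltage": "ipmi_voltage_state",
    "Current": "ipmi_current_state",
    "Power": "ipmi_power_state",
}

# Critical alert defined in this section for each state metric.
CRITICAL_ALERTS = {
    "ipmi_temperature_state": "IPMITemperatureSensorCritical",
    "ipmi_fan_speed_state": "IPMIFanSpeedSensorCritical",
    "ipmi_voltage_state": "IPMIVoltageSensorCritical",
    "ipmi_current_state": "IPMICurrentSensorCritical",
    "ipmi_power_state": "IPMIPowerSensorCritical",
    "ipmi_sensor_state": "IPMIGenericSensorCritical",
}


def critical_alert_for(sensor_type: str) -> str:
    """Name the alert that fires when a sensor of this type reaches state 2."""
    metric = TYPED_STATE_METRICS.get(sensor_type, "ipmi_sensor_state")
    return CRITICAL_ALERTS[metric]


print(critical_alert_for("Fan"))                # IPMIFanSpeedSensorCritical
print(critical_alert_for("Physical Security"))  # IPMIGenericSensorCritical
```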
1.4.14. IPMI chassis power off
IPMI reports chassis power is off on {{ $labels.instance }}. The server may have shut down unexpectedly.
- alert: IPMIChassisPowerOff
  expr: ipmi_chassis_power_state == 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: IPMI chassis power off (instance {{ $labels.instance }})
    description: "IPMI reports chassis power is off on {{ $labels.instance }}. The server may have shut down unexpectedly.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
1.4.15. IPMI chassis drive fault
IPMI reports a drive fault on {{ $labels.instance }}. Check disk health.
# The metric uses inverted logic: 1=no fault, 0=fault detected.
- alert: IPMIChassisDriveFault
  expr: ipmi_chassis_drive_fault_state == 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: IPMI chassis drive fault (instance {{ $labels.instance }})
    description: "IPMI reports a drive fault on {{ $labels.instance }}. Check disk health.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
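The chassis fault metrics are inverted, so a check written as `> 0` (like the state metrics above) would fire on every healthy host. The minimal Python sketch below decodes both chassis fault metrics used in this section. The sample values are illustrative.

```python
from typing import Dict, List


def chassis_fault_detected(value: float) -> bool:
    """Decode ipmi_chassis_*_fault_state: 1 means no fault, 0 means a fault was detected."""
    return value == 0


def faulty_subsystems(fault_states: Dict[str, float]) -> List[str]:
    """Return the metric names whose value signals a fault."""
    return [name for name, value in fault_states.items() if chassis_fault_detected(value)]


# Illustrative readings: drives healthy, cooling fault reported.
readings = {
    "ipmi_chassis_drive_fault_state": 1,
    "ipmi_chassis_cooling_fault_state": 0,
}
print(faulty_subsystems(readings))  # ['ipmi_chassis_cooling_fault_state']
```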
1.4.16. IPMI chassis cooling fault
IPMI reports a cooling/fan fault on {{ $labels.instance }}. Check fans and airflow.
# The metric uses inverted logic: 1=no fault, 0=fault detected.
- alert: IPMIChassisCoolingFault
  expr: ipmi_chassis_cooling_fault_state == 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: IPMI chassis cooling fault (instance {{ $labels.instance }})
    description: "IPMI reports a cooling/fan fault on {{ $labels.instance }}. Check fans and airflow.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
1.4.17. IPMI SEL almost full
IPMI System Event Log on {{ $labels.instance }} has only {{ printf "%.0f" $value }} bytes free. Clear the SEL to prevent loss of new events.
# SEL storage is typically very limited (e.g., 16KB). When full, new events may be dropped.
- alert: IPMISELAlmostFull
  expr: ipmi_sel_free_space_bytes < 512
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: IPMI SEL almost full (instance {{ $labels.instance }})
    description: "IPMI System Event Log on {{ $labels.instance }} has only {{ printf \"%.0f\" $value }} bytes free. Clear the SEL to prevent loss of new events.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
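IPMI SEL entries are fixed-size 16-byte records, so free space converts directly into the number of events the BMC can still store. The 512-byte threshold leaves room for about 32 more entries, and a 16 KB SEL holds 1024 in total. The short Python sketch below shows that arithmetic. The function names are hypothetical.

```python
SEL_RECORD_BYTES = 16  # every IPMI SEL entry is a fixed 16-byte record


def remaining_sel_records(free_bytes: float) -> int:
    """How many more events fit in the SEL, given ipmi_sel_free_space_bytes."""
    return int(free_bytes // SEL_RECORD_BYTES)


def sel_almost_full(free_bytes: float, threshold_bytes: float = 512) -> bool:
    """Mirror the alert expression ipmi_sel_free_space_bytes < 512."""
    return free_bytes < threshold_bytes


print(remaining_sel_records(512))     # 32
print(remaining_sel_records(16384))   # 1024
print(sel_almost_full(480), remaining_sel_records(480))  # True 30
```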