⚠️ Alert thresholds depend on the nature of your applications. Some queries on this page use arbitrary tolerance thresholds. Building an efficient monitoring platform takes time. 😉

SNMP Prometheus Alert Rules

7 Prometheus alerting rules for SNMP devices, with metrics exported via prometheus/snmp_exporter. These rules cover critical, warning, and info conditions. Copy and paste the YAML into a rules file that your Prometheus configuration loads through rule_files.

These rules use standard IF-MIB and SNMPv2-MIB metrics. Metric names depend on your snmp.yml module configuration.
Thresholds for bandwidth and error rates are rough defaults; adjust them to your environment.
wget https://raw.githubusercontent.com/samber/awesome-prometheus-alerts/refs/heads/master/dist/rules/snmp/snmp-exporter.yml
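As a rough sketch of how the downloaded file might be wired in, the fragment below loads it through rule_files and defines a scrape job whose name matches the job=~"snmp.*" selector used by the rules. The file path, the device address 192.0.2.1, the exporter address 127.0.0.1:9116, and the if_mib module name are all assumptions; replace them with values from your own setup and snmp.yml.

```yaml
# prometheus.yml (fragment). Paths, addresses, and module name are assumptions.
rule_files:
  - /etc/prometheus/rules/snmp-exporter.yml

scrape_configs:
  - job_name: snmp                     # matches the job=~"snmp.*" selector in the rules
    metrics_path: /snmp
    params:
      module: [if_mib]                 # must provide the metrics the rules query
    static_configs:
      - targets:
          - 192.0.2.1                  # SNMP device to poll (documentation address)
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target   # pass the device to the exporter as ?target=
      - source_labels: [__param_target]
        target_label: instance         # keep the device address as the instance label
      - target_label: __address__
        replacement: 127.0.0.1:9116    # scrape the snmp_exporter itself
```

Running promtool check rules snmp-exporter.yml before reloading Prometheus is a cheap way to catch syntax errors in the rules file.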
critical

9.10.1. SNMP target down

SNMP device {{ $labels.instance }} is unreachable.

  # Change the job=~"snmp.*" matcher to match the actual job name in your Prometheus scrape config.
- alert: SNMPTargetDown
  expr: up{job=~"snmp.*"} == 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: SNMP target down (instance {{ $labels.instance }})
    description: "SNMP device {{ $labels.instance }} is unreachable.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
critical

9.10.2. SNMP interface down

Interface {{ $labels.ifDescr }} on {{ $labels.instance }} is operationally down while administratively up.

- alert: SNMPInterfaceDown
  expr: (ifOperStatus{job=~"snmp.*"} == 2) and on(instance, job, ifIndex) (ifAdminStatus{job=~"snmp.*"} == 1)
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: SNMP interface down (instance {{ $labels.instance }})
    description: "Interface {{ $labels.ifDescr }} on {{ $labels.instance }} is operationally down while administratively up.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

9.10.3. SNMP interface high inbound error rate

Interface {{ $labels.ifDescr }} on {{ $labels.instance }} has an inbound error rate above 5%.

  # Threshold is a rough default. Adjust based on your network environment.
- alert: SNMPInterfaceHighInboundErrorRate
  expr: rate(ifInErrors{job=~"snmp.*"}[5m]) / (rate(ifHCInUcastPkts{job=~"snmp.*"}[5m]) + rate(ifHCInBroadcastPkts{job=~"snmp.*"}[5m]) + rate(ifHCInMulticastPkts{job=~"snmp.*"}[5m])) > 0.05 and (rate(ifHCInUcastPkts{job=~"snmp.*"}[5m]) + rate(ifHCInBroadcastPkts{job=~"snmp.*"}[5m]) + rate(ifHCInMulticastPkts{job=~"snmp.*"}[5m])) > 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: SNMP interface high inbound error rate (instance {{ $labels.instance }})
    description: "Interface {{ $labels.ifDescr }} on {{ $labels.instance }} has an inbound error rate above 5%.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
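The expression above computes the total inbound packet rate twice, once as the denominator and once for the greater-than-zero guard. One possible way to make it easier to read, sketched here under assumptions, is to precompute that sum with a recording rule and reference it from the alert. The group name and the recording rule name snmp:ifHCInPkts:rate5m are hypothetical; the alert is a variant of the rule above, not an additional one, and its annotations are omitted for brevity.

```yaml
groups:
  - name: snmp-inbound-errors            # hypothetical group name
    rules:
      # Total inbound packet rate per interface, computed once per evaluation.
      - record: snmp:ifHCInPkts:rate5m   # hypothetical recording rule name
        expr: >-
          rate(ifHCInUcastPkts{job=~"snmp.*"}[5m])
          + rate(ifHCInBroadcastPkts{job=~"snmp.*"}[5m])
          + rate(ifHCInMulticastPkts{job=~"snmp.*"}[5m])
      # Same condition as the original rule, using the recorded series.
      - alert: SNMPInterfaceHighInboundErrorRate
        expr: >-
          rate(ifInErrors{job=~"snmp.*"}[5m]) / snmp:ifHCInPkts:rate5m > 0.05
          and snmp:ifHCInPkts:rate5m > 0
        for: 5m
        labels:
          severity: warning
```

Rules in a group are evaluated in order, so placing the recording rule first lets the alert read the value recorded in the same evaluation cycle.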
warning

9.10.4. SNMP interface high outbound error rate

Interface {{ $labels.ifDescr }} on {{ $labels.instance }} has an outbound error rate above 5%.

  # Threshold is a rough default. Adjust based on your network environment.
- alert: SNMPInterfaceHighOutboundErrorRate
  expr: rate(ifOutErrors{job=~"snmp.*"}[5m]) / (rate(ifHCOutUcastPkts{job=~"snmp.*"}[5m]) + rate(ifHCOutBroadcastPkts{job=~"snmp.*"}[5m]) + rate(ifHCOutMulticastPkts{job=~"snmp.*"}[5m])) > 0.05 and (rate(ifHCOutUcastPkts{job=~"snmp.*"}[5m]) + rate(ifHCOutBroadcastPkts{job=~"snmp.*"}[5m]) + rate(ifHCOutMulticastPkts{job=~"snmp.*"}[5m])) > 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: SNMP interface high outbound error rate (instance {{ $labels.instance }})
    description: "Interface {{ $labels.ifDescr }} on {{ $labels.instance }} has an outbound error rate above 5%.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

9.10.5. SNMP interface high bandwidth usage inbound

Interface {{ $labels.ifDescr }} on {{ $labels.instance }} inbound utilization is above 80%.

  # Threshold is a rough default. ifSpeed is a Gauge32 that maxes out at ~4.29 Gbps, so faster interfaces report that maximum and this ratio overstates their utilization. For 10G+ interfaces, use ifHighSpeed (in Mbps) instead.
- alert: SNMPInterfaceHighBandwidthUsageInbound
  expr: rate(ifHCInOctets{job=~"snmp.*"}[5m]) * 8 / ifSpeed > 0.80 and ifSpeed > 0
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: SNMP interface high bandwidth usage inbound (instance {{ $labels.instance }})
    description: "Interface {{ $labels.ifDescr }} on {{ $labels.instance }} inbound utilization is above 80%.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

9.10.6. SNMP interface high bandwidth usage outbound

Interface {{ $labels.ifDescr }} on {{ $labels.instance }} outbound utilization is above 80%.

  # Threshold is a rough default. ifSpeed is a Gauge32 that maxes out at ~4.29 Gbps, so faster interfaces report that maximum and this ratio overstates their utilization. For 10G+ interfaces, use ifHighSpeed (in Mbps) instead.
- alert: SNMPInterfaceHighBandwidthUsageOutbound
  expr: rate(ifHCOutOctets{job=~"snmp.*"}[5m]) * 8 / ifSpeed > 0.80 and ifSpeed > 0
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: SNMP interface high bandwidth usage outbound (instance {{ $labels.instance }})
    description: "Interface {{ $labels.ifDescr }} on {{ $labels.instance }} outbound utilization is above 80%.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
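For interfaces faster than the ifSpeed limit, the ifHighSpeed variant mentioned in the comments might look like the sketch below. It assumes your snmp.yml module exports ifHighSpeed with the same labels as the octet counters. Because ifHighSpeed is reported in Mbps, it is multiplied by 1000000 to get bits per second. The alert names are hypothetical, and annotations are omitted for brevity.

```yaml
# Hypothetical ifHighSpeed-based variants of the two bandwidth rules.
- alert: SNMPInterfaceHighBandwidthUsageInboundHighSpeed
  expr: >-
    rate(ifHCInOctets{job=~"snmp.*"}[5m]) * 8
    / (ifHighSpeed{job=~"snmp.*"} * 1000000) > 0.80
    and ifHighSpeed{job=~"snmp.*"} > 0
  for: 15m
  labels:
    severity: warning
- alert: SNMPInterfaceHighBandwidthUsageOutboundHighSpeed
  expr: >-
    rate(ifHCOutOctets{job=~"snmp.*"}[5m]) * 8
    / (ifHighSpeed{job=~"snmp.*"} * 1000000) > 0.80
    and ifHighSpeed{job=~"snmp.*"} > 0
  for: 15m
  labels:
    severity: warning
```

If these run alongside the ifSpeed-based rules, fast interfaces may alert twice, so it is probably simpler to keep only one pair per device class.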
info

9.10.7. SNMP device restarted

SNMP device {{ $labels.instance }} has restarted (uptime < 5 minutes).

  # sysUpTime is in centiseconds (hundredths of a second), so dividing by 100 gives seconds.
- alert: SNMPDeviceRestarted
  expr: sysUpTime / 100 < 300
  for: 0m
  labels:
    severity: info
  annotations:
    summary: SNMP device restarted (instance {{ $labels.instance }})
    description: "SNMP device {{ $labels.instance }} has restarted (uptime < 5 minutes).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"