⚠️ Alert thresholds depend on the nature of your applications. Some queries use arbitrary tolerance thresholds, so tune them to your workloads. Building an efficient monitoring platform takes time. 😉

Docker containers Prometheus Alert Rules

Nine Prometheus alerting rules for Docker containers, built on metrics exported by google/cAdvisor. The rules cover warning and info severities; copy the YAML into a rule file loaded by your Prometheus configuration.

1.5. google/cAdvisor (9 rules)

wget https://raw.githubusercontent.com/samber/awesome-prometheus-alerts/refs/heads/master/dist/rules/docker-containers/google-cadvisor.yml
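
Loading the downloaded file could look like the fragment below. This is a minimal sketch, not a definitive setup: the rule file path, the job name, and the `cadvisor:8080` target are assumptions to replace with your own values, and it assumes the downloaded file already carries its own top-level `groups:` key.

```yaml
# prometheus.yml (fragment)
rule_files:
  - /etc/prometheus/rules/google-cadvisor.yml   # assumed location of the downloaded file

scrape_configs:
  - job_name: cadvisor                  # hypothetical job name
    static_configs:
      - targets: ["cadvisor:8080"]      # assumed host; 8080 is cAdvisor's default port
```

Running `promtool check rules` on the rule file before reloading Prometheus catches syntax errors early.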

1.5.1. Container killed

A container has disappeared

  # This rule can be very noisy in dynamic infra with legitimate container start/stop/deployment.
- alert: ContainerKilled
  expr: time() - container_last_seen > 60
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: Container killed (instance {{ $labels.instance }})
    description: "A container has disappeared\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
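
One way to cut the noise mentioned in the comment above is to scope the rule to the containers you care about. The sketch below assumes those containers share a `myapp-` name prefix; the group name, alert name, and pattern are hypothetical placeholders.

```yaml
groups:
  - name: cadvisor-scoped-examples        # hypothetical group name
    rules:
      - alert: ContainerKilledScoped      # hypothetical alert name
        # Only watch containers whose name starts with "myapp-" (placeholder pattern).
        expr: 'time() - container_last_seen{name=~"myapp-.*"} > 60'
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "Container killed (instance {{ $labels.instance }}, container {{ $labels.name }})"
          description: "A container has disappeared\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
```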

1.5.2. Container absent

No container_last_seen series has been reported for 5 min, meaning cAdvisor is exporting no container metrics at all

  # This rule can be very noisy in dynamic infra with legitimate container start/stop/deployment.
- alert: ContainerAbsent
  expr: absent(container_last_seen)
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Container absent (instance {{ $labels.instance }})
    description: "A container is absent for 5 min\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
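
Without a label matcher, `absent()` only returns a result when no `container_last_seen` series exists at all. To watch one specific container, add an equality matcher, as in this sketch. The container name `my-service` and the group and alert names are hypothetical. Because `absent()` copies labels from equality matchers into its result, `{{ $labels.name }}` is available in the annotations.

```yaml
groups:
  - name: cadvisor-absent-examples        # hypothetical group name
    rules:
      - alert: MyServiceContainerAbsent   # hypothetical alert name
        # Fires when no series exists for the container named "my-service" (placeholder).
        expr: 'absent(container_last_seen{name="my-service"})'
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container absent (container {{ $labels.name }})"
          description: "Container {{ $labels.name }} has been absent for 5 min\n  VALUE = {{ $value }}"
```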

1.5.3. Container High CPU utilization

Container CPU utilization is above 80% (current: {{ $value | printf "%.2f" }}%)

  # Only fires for containers with explicit CPU limits. Containers without limits have cpu_quota=0, which is filtered out by the guard.
- alert: ContainerHighCPUUtilization
  expr: (sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod, container) / sum(container_spec_cpu_quota{container!=""}/container_spec_cpu_period{container!=""}) by (pod, container) * 100) > 80 and sum(container_spec_cpu_quota{container!=""}/container_spec_cpu_period{container!=""}) by (pod, container) > 0
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Container High CPU utilization (instance {{ $labels.instance }})
    description: "Container CPU utilization is above 80% (current: {{ $value | printf \"%.2f\" }}%)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
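
The rule above groups by `pod` and `container`, labels that are typically attached when cAdvisor runs inside the Kubernetes kubelet. On a plain Docker host, a standalone cAdvisor usually identifies containers by the `name` label instead, so a variant keyed on `name` may be needed. The following is a sketch under that assumption; the group and alert names are hypothetical.

```yaml
groups:
  - name: cadvisor-docker-examples                 # hypothetical group name
    rules:
      - alert: ContainerHighCPUUtilizationDocker   # hypothetical alert name
        # Same ratio as above, keyed on the Docker `name` label instead of pod/container.
        expr: >-
          (sum(rate(container_cpu_usage_seconds_total{name!=""}[5m])) by (instance, name)
          / sum(container_spec_cpu_quota{name!=""} / container_spec_cpu_period{name!=""}) by (instance, name)
          * 100) > 80
          and sum(container_spec_cpu_quota{name!=""} / container_spec_cpu_period{name!=""}) by (instance, name) > 0
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Container High CPU utilization (instance {{ $labels.instance }}, container {{ $labels.name }})"
          description: "Container CPU utilization is above 80%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
```

Grouping by `instance` and `name` also keeps `instance` available to the summary template, which aggregation by `pod` and `container` drops.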

1.5.4. Container High Memory usage

Container Memory usage is above 80%

  # See https://medium.com/faun/how-much-is-too-much-the-linux-oomkiller-and-used-memory-d32186f29c9d
- alert: ContainerHighMemoryUsage
  expr: (sum(container_memory_working_set_bytes{name!=""}) BY (instance, name) / sum(container_spec_memory_limit_bytes > 0) BY (instance, name) * 100) > 80
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Container High Memory usage (instance {{ $labels.instance }})
    description: "Container Memory usage is above 80%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

1.5.5. Container Volume usage

Container volume inode usage is above 80% (the expression compares free and total inodes, not bytes)

- alert: ContainerVolumeUsage
  expr: (1 - (sum(container_fs_inodes_free{name!=""}) BY (instance) / sum(container_fs_inodes_total{name!=""}) BY (instance))) * 100 > 80 and sum(container_fs_inodes_total{name!=""}) BY (instance) > 0
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Container Volume usage (instance {{ $labels.instance }})
    description: "Container Volume usage is above 80%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
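
The rule above measures inode consumption. If byte-level usage matters more, a companion rule along these lines might help. It is a sketch that assumes your cAdvisor exports `container_fs_usage_bytes` and `container_fs_limit_bytes` for named containers; the group and alert names are hypothetical, and the 80% threshold is as arbitrary as the one above.

```yaml
groups:
  - name: cadvisor-filesystem-examples         # hypothetical group name
    rules:
      - alert: ContainerFilesystemBytesUsage   # hypothetical alert name
        # Used bytes over the filesystem limit, per container and device.
        # The "> 0" filter drops series without a limit to avoid dividing by zero.
        expr: >-
          (container_fs_usage_bytes{name!=""}
          / (container_fs_limit_bytes{name!=""} > 0) * 100) > 80
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Container filesystem usage (instance {{ $labels.instance }}, container {{ $labels.name }})"
          description: "Container filesystem usage is above 80%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
```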

1.5.6. Container high throttle rate

Container is being throttled ({{ $value | humanizePercentage }})

- alert: ContainerHighThrottleRate
  expr: sum(rate(container_cpu_cfs_throttled_periods_total{container!=""}[5m])) by (container, pod, namespace) / sum(rate(container_cpu_cfs_periods_total[5m])) by (container, pod, namespace) > ( 25 / 100 ) and sum(rate(container_cpu_cfs_periods_total[5m])) by (container, pod, namespace) > 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Container high throttle rate (instance {{ $labels.instance }})
    description: "Container is being throttled ({{ $value | humanizePercentage }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

1.5.7. Container high low change CPU usage

This rule tracks the absolute change in a container's CPU usage (as a percentage of one core) between the last minute and the minute before it, and fires when the change exceeds 25 percentage points.

- alert: ContainerHighLowChangeCPUUsage
  expr: (abs((sum by (instance, name) (rate(container_cpu_usage_seconds_total{name!=""}[1m])) * 100) - (sum by (instance, name) (rate(container_cpu_usage_seconds_total{name!=""}[1m] offset 1m)) * 100)) or abs((sum by (instance, name) (rate(container_cpu_usage_seconds_total{name!=""}[1m])) * 100) - (sum by (instance, name) (rate(container_cpu_usage_seconds_total{name!=""}[5m] offset 1m)) * 100))) > 25
  for: 0m
  labels:
    severity: info
  annotations:
    summary: Container high low change CPU usage (instance {{ $labels.instance }})
    description: "This alert rule monitors the absolute change in CPU usage within a time window and triggers an alert when the change exceeds 25%.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

1.5.8. Container Low CPU utilization

Container CPU utilization is under 20% for 1 week. Consider reducing the allocated CPU. (current: {{ $value | printf "%.2f" }}%)

- alert: ContainerLowCPUUtilization
  expr: (sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod, container) / sum(container_spec_cpu_quota{container!=""}/container_spec_cpu_period{container!=""}) by (pod, container) * 100) < 20 and sum(container_spec_cpu_quota{container!=""}/container_spec_cpu_period{container!=""}) by (pod, container) > 0
  for: 7d
  labels:
    severity: info
  annotations:
    summary: Container Low CPU utilization (instance {{ $labels.instance }})
    description: "Container CPU utilization is under 20% for 1 week. Consider reducing the allocated CPU. (current: {{ $value | printf \"%.2f\" }}%)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

1.5.9. Container Low Memory usage

Container Memory usage is under 20% for 1 week. Consider reducing the allocated memory.

- alert: ContainerLowMemoryUsage
  expr: (sum(container_memory_working_set_bytes{name!=""}) BY (instance, name) / sum(container_spec_memory_limit_bytes > 0) BY (instance, name) * 100) < 20
  for: 7d
  labels:
    severity: info
  annotations:
    summary: Container Low Memory usage (instance {{ $labels.instance }})
    description: "Container Memory usage is under 20% for 1 week. Consider reducing the allocated memory.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"