What is the Prometheus alert rule for "Istio Kubernetes gateway availability drop"?

Istio ingress gateway has only {{ $value }} available pod(s). Inbound traffic will likely be affected. PromQL expression: min(kube_deployment_status_replicas_available{deployment="istio-ingressgateway", namespace="istio-system"}) without (instance, pod) < 2. Severity: warning. Duration: 1m.

What is the Prometheus alert rule for "Istio Pilot high push error rate"?

Number of Istio Pilot push errors is too high (> 5%). Envoy sidecars might have outdated configuration. PromQL expression: sum(rate(pilot_xds_push_errors[1m])) / sum(rate(pilot_xds_pushes[1m])) * 100 > 5 and sum(rate(pilot_xds_pushes[1m])) > 0. Severity: warning. Duration: 1m.

What is the Prometheus alert rule for "Istio Mixer Prometheus dispatches low"?

Number of Mixer dispatches to Prometheus is too low. Istio metrics might not be exported properly. Mixer was deprecated in Istio 1.5 and removed in Istio 1.8, so this alert only applies to Istio < 1.8. PromQL expression: sum(rate(mixer_runtime_dispatches_total{adapter=~"prometheus"}[1m])) < 180. Severity: warning. Duration: 1m.

What is the Prometheus alert rule for "Istio high total request rate"?

Global request rate in the service mesh is unusually high ({{ $value | printf "%.2f" }} req/s). PromQL expression: sum(rate(istio_requests_total{reporter="destination"}[5m])) > 1000. Severity: warning. Duration: 2m.

What is the Prometheus alert rule for "Istio low total request rate"?

Global request rate in the service mesh is unusually low ({{ $value | printf "%.2f" }} req/s). PromQL expression: sum(rate(istio_requests_total{reporter="destination"}[5m])) < 100. Severity: warning. Duration: 2m.

Istio Prometheus Alert Rules

Q: What is the Prometheus alert rule for "Istio high request latency"?

Istio average request duration is {{ $value }}ms (> 100ms). PromQL expression: rate(istio_request_duration_milliseconds_sum{reporter="destination"}[1m]) / rate(istio_request_duration_milliseconds_count{reporter="destination"}[1m]) > 100 and rate(istio_request_duration_milliseconds_count{reporter="destination"}[1m]) > 0. Severity: warning. Duration: 1m.

10 Prometheus alerting rules for Istio, exported via the embedded exporter. These rules cover critical and warning conditions. Copy the YAML into a rule file referenced by `rule_files` in your Prometheus configuration.

⚠️ Alert thresholds depend on the nature of your applications. Some queries use arbitrary tolerance thresholds, so tune them to your own workloads. Building an efficient monitoring platform takes time. 😉

groups:
- name: EmbeddedExporter
  rules:
    - alert: IstioKubernetesGatewayAvailabilityDrop
      expr: min(kube_deployment_status_replicas_available{deployment="istio-ingressgateway", namespace="istio-system"}) without (instance, pod) < 2
      for: 1m
      labels:
        severity: warning
      annotations:
        summary: Istio Kubernetes gateway availability drop (instance {{ $labels.instance }})
        description: "Istio ingress gateway has only {{ $value }} available pod(s). Inbound traffic will likely be affected.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: IstioPilotHighPushErrorRate
      expr: sum(rate(pilot_xds_push_errors[1m])) / sum(rate(pilot_xds_pushes[1m])) * 100 > 5 and sum(rate(pilot_xds_pushes[1m])) > 0
      for: 1m
      labels:
        severity: warning
      annotations:
        summary: Istio Pilot high push error rate (instance {{ $labels.instance }})
        description: "Number of Istio Pilot push errors is too high (> 5%). Envoy sidecars might have outdated configuration.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    # Mixer was deprecated in Istio 1.5 and removed in Istio 1.8. This alert only applies to Istio < 1.8.
    - alert: IstioMixerPrometheusDispatchesLow
      expr: sum(rate(mixer_runtime_dispatches_total{adapter=~"prometheus"}[1m])) < 180
      for: 1m
      labels:
        severity: warning
      annotations:
        summary: Istio Mixer Prometheus dispatches low (instance {{ $labels.instance }})
        description: "Number of Mixer dispatches to Prometheus is too low. Istio metrics might not be exported properly.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    # Threshold of 1000 req/s is a rough default. Adjust to your expected peak traffic.
    - alert: IstioHighTotalRequestRate
      expr: sum(rate(istio_requests_total{reporter="destination"}[5m])) > 1000
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: Istio high total request rate (instance {{ $labels.instance }})
        description: "Global request rate in the service mesh is unusually high ({{ $value | printf \"%.2f\" }} req/s).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    # Threshold of 100 req/s is a rough default. Adjust to your expected baseline traffic. This alert may fire on startup or in low-traffic environments.
    - alert: IstioLowTotalRequestRate
      expr: sum(rate(istio_requests_total{reporter="destination"}[5m])) < 100
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: Istio low total request rate (instance {{ $labels.instance }})
        description: "Global request rate in the service mesh is unusually low ({{ $value | printf \"%.2f\" }} req/s).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: IstioHigh4xxErrorRate
      expr: sum(rate(istio_requests_total{reporter="destination", response_code=~"4.*"}[5m])) / sum(rate(istio_requests_total{reporter="destination"}[5m])) * 100 > 5 and sum(rate(istio_requests_total{reporter="destination"}[5m])) > 0
      for: 1m
      labels:
        severity: warning
      annotations:
        summary: Istio high 4xx error rate (instance {{ $labels.instance }})
        description: "High percentage of HTTP 4xx responses in Istio ({{ $value | printf \"%.1f\" }}% > 5%).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: IstioHigh5xxErrorRate
      expr: sum(rate(istio_requests_total{reporter="destination", response_code=~"5.*"}[5m])) / sum(rate(istio_requests_total{reporter="destination"}[5m])) * 100 > 5 and sum(rate(istio_requests_total{reporter="destination"}[5m])) > 0
      for: 1m
      labels:
        severity: warning
      annotations:
        summary: Istio high 5xx error rate (instance {{ $labels.instance }})
        description: "High percentage of HTTP 5xx responses in Istio ({{ $value | printf \"%.1f\" }}% > 5%).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: IstioHighRequestLatency
      expr: rate(istio_request_duration_milliseconds_sum{reporter="destination"}[1m]) / rate(istio_request_duration_milliseconds_count{reporter="destination"}[1m]) > 100 and rate(istio_request_duration_milliseconds_count{reporter="destination"}[1m]) > 0
      for: 1m
      labels:
        severity: warning
      annotations:
        summary: Istio high request latency (instance {{ $labels.instance }})
        description: "Istio average request duration is {{ $value }}ms (> 100ms).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: IstioLatency99Percentile
      expr: histogram_quantile(0.99, sum(rate(istio_request_duration_milliseconds_bucket[1m])) by (destination_canonical_service, destination_workload_namespace, le)) > 1000
      for: 1m
      labels:
        severity: warning
      annotations:
        summary: Istio latency 99 percentile (instance {{ $labels.instance }})
        description: "Istio p99 request latency is {{ $value }}ms (threshold: 1000ms).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: IstioPilotDuplicateEntry
      expr: sum(pilot_duplicate_envoy_clusters{}) > 0
      for: 0m
      labels:
        severity: critical
      annotations:
        summary: Istio Pilot Duplicate Entry (instance {{ $labels.instance }})
        description: "Istio Pilot has detected {{ $value }} duplicate Envoy cluster(s), indicating misconfigured DestinationRules or ServiceEntries.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

4.8. Embedded exporter (10 rules)

wget https://raw.githubusercontent.com/samber/awesome-prometheus-alerts/refs/heads/master/dist/rules/istio/embedded-exporter.yml
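
Once downloaded, the file has to be referenced from the Prometheus configuration before its rules are evaluated. A minimal sketch follows; the directory `/etc/prometheus/rules/` and the 1m evaluation interval are assumptions for illustration, not part of the original page:

```yaml
# prometheus.yml (excerpt)
global:
  evaluation_interval: 1m   # how often rules are evaluated (assumed value)

rule_files:
  # Assumed location of the downloaded rule file
  - /etc/prometheus/rules/embedded-exporter.yml
```

Running `promtool check rules embedded-exporter.yml` before reloading Prometheus should catch syntax errors in the rule file.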

warning

4.8.1. Istio Kubernetes gateway availability drop

Istio ingress gateway has only {{ $value }} available pod(s). Inbound traffic will likely be affected.

- alert: IstioKubernetesGatewayAvailabilityDrop
  expr: min(kube_deployment_status_replicas_available{deployment="istio-ingressgateway", namespace="istio-system"}) without (instance, pod) < 2
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: Istio Kubernetes gateway availability drop (instance {{ $labels.instance }})
    description: "Istio ingress gateway has only {{ $value }} available pod(s). Inbound traffic will likely be affected.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
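
The fixed threshold of 2 pods may not suit every gateway size. As a possible alternative, the sketch below compares available replicas with the replica count requested in the Deployment spec. The alert name, the 5m duration, and the annotation text are hypothetical; it assumes kube-state-metrics exposes `kube_deployment_spec_replicas` with the same labels as the availability metric:

```yaml
- alert: IstioGatewayBelowDesiredReplicas   # hypothetical name
  expr: |
    kube_deployment_status_replicas_available{deployment="istio-ingressgateway", namespace="istio-system"}
      <
    kube_deployment_spec_replicas{deployment="istio-ingressgateway", namespace="istio-system"}
  for: 5m   # assumed; long enough to ride out a normal rolling update
  labels:
    severity: warning
  annotations:
    summary: Istio ingress gateway below desired replica count
    description: "Only {{ $value }} gateway pod(s) are available, fewer than the Deployment requests."
```

Because the comparison returns the left-hand series, `{{ $value }}` holds the number of available pods.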

warning

4.8.2. Istio Pilot high push error rate

Number of Istio Pilot push errors is too high (> 5%). Envoy sidecars might have outdated configuration.

- alert: IstioPilotHighPushErrorRate
  expr: sum(rate(pilot_xds_push_errors[1m])) / sum(rate(pilot_xds_pushes[1m])) * 100 > 5 and sum(rate(pilot_xds_pushes[1m])) > 0
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: Istio Pilot high push error rate (instance {{ $labels.instance }})
    description: "Number of Istio Pilot push errors is too high (> 5%). Envoy sidecars might have outdated configuration.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

warning

4.8.3. Istio Mixer Prometheus dispatches low

Number of Mixer dispatches to Prometheus is too low. Istio metrics might not be exported properly.

# Mixer was deprecated in Istio 1.5 and removed in Istio 1.8. This alert only applies to Istio < 1.8.
- alert: IstioMixerPrometheusDispatchesLow
  expr: sum(rate(mixer_runtime_dispatches_total{adapter=~"prometheus"}[1m])) < 180
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: Istio Mixer Prometheus dispatches low (instance {{ $labels.instance }})
    description: "Number of Mixer dispatches to Prometheus is too low. Istio metrics might not be exported properly.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

warning

4.8.4. Istio high total request rate

Global request rate in the service mesh is unusually high ({{ $value | printf "%.2f" }} req/s).

# Threshold of 1000 req/s is a rough default. Adjust to your expected peak traffic.
- alert: IstioHighTotalRequestRate
  expr: sum(rate(istio_requests_total{reporter="destination"}[5m])) > 1000
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Istio high total request rate (instance {{ $labels.instance }})
    description: "Global request rate in the service mesh is unusually high ({{ $value | printf \"%.2f\" }} req/s).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

warning

4.8.5. Istio low total request rate

Global request rate in the service mesh is unusually low ({{ $value | printf "%.2f" }} req/s).

# Threshold of 100 req/s is a rough default. Adjust to your expected baseline traffic. This alert may fire on startup or in low-traffic environments.
- alert: IstioLowTotalRequestRate
  expr: sum(rate(istio_requests_total{reporter="destination"}[5m])) < 100
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Istio low total request rate (instance {{ $labels.instance }})
    description: "Global request rate in the service mesh is unusually low ({{ $value | printf \"%.2f\" }} req/s).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
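
Where an absolute floor of 100 req/s does not fit, a relative threshold is one option. The sketch below fires when traffic falls under half of what it was at the same time one day earlier. The alert name, the 0.5 factor, the 1d offset, and the 10m duration are all assumptions to tune, and the rule only works once Prometheus holds at least a day of history:

```yaml
- alert: IstioRequestRateDropVsYesterday   # hypothetical name
  expr: |
    sum(rate(istio_requests_total{reporter="destination"}[5m]))
      <
    0.5 * sum(rate(istio_requests_total{reporter="destination"}[5m] offset 1d))
  for: 10m   # assumed
  labels:
    severity: warning
  annotations:
    summary: Istio request rate dropped compared with yesterday
    description: "Global request rate is {{ $value | printf \"%.2f\" }} req/s, less than half of the rate 24 hours ago."
```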

warning

4.8.6. Istio high 4xx error rate

High percentage of HTTP 4xx responses in Istio ({{ $value | printf "%.1f" }}% > 5%).

- alert: IstioHigh4xxErrorRate
  expr: sum(rate(istio_requests_total{reporter="destination", response_code=~"4.*"}[5m])) / sum(rate(istio_requests_total{reporter="destination"}[5m])) * 100 > 5 and sum(rate(istio_requests_total{reporter="destination"}[5m])) > 0
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: Istio high 4xx error rate (instance {{ $labels.instance }})
    description: "High percentage of HTTP 4xx responses in Istio ({{ $value | printf \"%.1f\" }}% > 5%).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

warning

4.8.7. Istio high 5xx error rate

High percentage of HTTP 5xx responses in Istio ({{ $value | printf "%.1f" }}% > 5%).

- alert: IstioHigh5xxErrorRate
  expr: sum(rate(istio_requests_total{reporter="destination", response_code=~"5.*"}[5m])) / sum(rate(istio_requests_total{reporter="destination"}[5m])) * 100 > 5 and sum(rate(istio_requests_total{reporter="destination"}[5m])) > 0
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: Istio high 5xx error rate (instance {{ $labels.instance }})
    description: "High percentage of HTTP 5xx responses in Istio ({{ $value | printf \"%.1f\" }}% > 5%).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
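
The rule above computes one mesh-wide ratio, so a single failing service can be hidden by healthy traffic elsewhere. A possible per-service variant is sketched below, grouping by the same labels the p99 latency rule uses. The alert name and annotation text are hypothetical:

```yaml
- alert: IstioHigh5xxErrorRatePerService   # hypothetical name
  expr: |
    sum by (destination_canonical_service, destination_workload_namespace) (
      rate(istio_requests_total{reporter="destination", response_code=~"5.*"}[5m])
    )
    /
    sum by (destination_canonical_service, destination_workload_namespace) (
      rate(istio_requests_total{reporter="destination"}[5m])
    )
    * 100 > 5
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: Istio high 5xx error rate ({{ $labels.destination_canonical_service }})
    description: "{{ $value | printf \"%.1f\" }}% of requests to {{ $labels.destination_canonical_service }} in {{ $labels.destination_workload_namespace }} returned HTTP 5xx."
```

No extra `> 0` guard is needed: a service with no traffic yields NaN, which never satisfies `> 5`.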

warning

4.8.8. Istio high request latency

Istio average request duration is {{ $value }}ms (> 100ms).

- alert: IstioHighRequestLatency
  expr: rate(istio_request_duration_milliseconds_sum{reporter="destination"}[1m]) / rate(istio_request_duration_milliseconds_count{reporter="destination"}[1m]) > 100 and rate(istio_request_duration_milliseconds_count{reporter="destination"}[1m]) > 0
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: Istio high request latency (instance {{ $labels.instance }})
    description: "Istio average request duration is {{ $value }}ms (> 100ms).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

warning

4.8.9. Istio latency 99 percentile

Istio p99 request latency is {{ $value }}ms (threshold: 1000ms).

- alert: IstioLatency99Percentile
  expr: histogram_quantile(0.99, sum(rate(istio_request_duration_milliseconds_bucket[1m])) by (destination_canonical_service, destination_workload_namespace, le)) > 1000
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: Istio latency 99 percentile (instance {{ $labels.instance }})
    description: "Istio p99 request latency is {{ $value }}ms (threshold: 1000ms).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

critical

4.8.10. Istio Pilot Duplicate Entry

Istio Pilot has detected {{ $value }} duplicate Envoy cluster(s), indicating misconfigured DestinationRules or ServiceEntries.

- alert: IstioPilotDuplicateEntry
  expr: sum(pilot_duplicate_envoy_clusters{}) > 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Istio Pilot Duplicate Entry (instance {{ $labels.instance }})
    description: "Istio Pilot has detected {{ $value }} duplicate Envoy cluster(s), indicating misconfigured DestinationRules or ServiceEntries.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
