⚠️ Alert thresholds depend on the nature of your applications. Some queries may have arbitrary tolerance thresholds. Building an efficient monitoring platform takes time. 😉

Istio Prometheus Alert Rules

Ten Prometheus alerting rules for Istio, written for its Embedded exporter. The rules cover critical and warning conditions. Copy the YAML into a rule file loaded by your Prometheus configuration.

4.8. Embedded exporter (10 rules)

wget https://raw.githubusercontent.com/samber/awesome-prometheus-alerts/refs/heads/master/dist/rules/istio/embedded-exporter.yml
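Once downloaded, the file still has to be referenced from the Prometheus configuration. A minimal sketch, assuming the file was saved under `/etc/prometheus/rules/istio/` (a hypothetical path; adjust it to your own layout):

```yaml
# prometheus.yml (fragment)
# Assumption: the downloaded rule file lives at the path below.
rule_files:
  - /etc/prometheus/rules/istio/embedded-exporter.yml
```

Running `promtool check rules` against the file before reloading Prometheus should catch syntax errors early.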

4.8.1. Istio Kubernetes gateway availability drop

Istio ingress gateway has only {{ $value }} available pod(s). Inbound traffic will likely be affected.

- alert: IstioKubernetesGatewayAvailabilityDrop
  expr: min(kube_deployment_status_replicas_available{deployment="istio-ingressgateway", namespace="istio-system"}) without (instance, pod) < 2
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: Istio Kubernetes gateway availability drop (deployment {{ $labels.deployment }})
    description: "Istio ingress gateway has only {{ $value }} available pod(s). Inbound traffic will likely be affected.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
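Each snippet on this page is a single list item, so pasting one by hand means placing it under a rule group. The sketch below shows that wrapper around the gateway rule. The group name `istio-embedded-exporter` is an arbitrary placeholder. Note also that `kube_deployment_status_replicas_available` is normally exposed by kube-state-metrics rather than by Istio itself, so this rule most likely needs that exporter to be scraped too.

```yaml
groups:
  - name: istio-embedded-exporter   # placeholder group name
    rules:
      - alert: IstioKubernetesGatewayAvailabilityDrop
        # Adjust the deployment and namespace selectors if your gateway is named differently.
        expr: min(kube_deployment_status_replicas_available{deployment="istio-ingressgateway", namespace="istio-system"}) without (instance, pod) < 2
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: Istio Kubernetes gateway availability drop
          description: "Istio ingress gateway has only {{ $value }} available pod(s). Inbound traffic will likely be affected."
```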

4.8.2. Istio Pilot high push error rate

Number of Istio Pilot push errors is too high (> 5%). Envoy sidecars might have outdated configuration.

- alert: IstioPilotHighPushErrorRate
  expr: sum(rate(pilot_xds_push_errors[1m])) / sum(rate(pilot_xds_pushes[1m])) * 100 > 5 and sum(rate(pilot_xds_pushes[1m])) > 0
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: Istio Pilot high push error rate (instance {{ $labels.instance }})
    description: "Number of Istio Pilot push errors is too high (> 5%). Envoy sidecars might have outdated configuration.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

4.8.3. Istio Mixer Prometheus dispatches low

Number of Mixer dispatches to Prometheus is too low. Istio metrics might not be exported properly.

  # Mixer was deprecated in Istio 1.5 and removed in Istio 1.8. This alert only applies to Istio versions before 1.8.
- alert: IstioMixerPrometheusDispatchesLow
  expr: sum(rate(mixer_runtime_dispatches_total{adapter=~"prometheus"}[1m])) < 180
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: Istio Mixer Prometheus dispatches low (instance {{ $labels.instance }})
    description: "Number of Mixer dispatches to Prometheus is too low. Istio metrics might not be exported properly.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

4.8.4. Istio high total request rate

Global request rate in the service mesh is unusually high ({{ $value | printf "%.2f" }} req/s).

  # Threshold of 1000 req/s is a rough default. Adjust to your expected peak traffic.
- alert: IstioHighTotalRequestRate
  expr: sum(rate(istio_requests_total{reporter="destination"}[5m])) > 1000
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Istio high total request rate (instance {{ $labels.instance }})
    description: "Global request rate in the service mesh is unusually high ({{ $value | printf \"%.2f\" }} req/s).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

4.8.5. Istio low total request rate

Global request rate in the service mesh is unusually low ({{ $value | printf "%.2f" }} req/s).

  # Threshold of 100 req/s is a rough default. Adjust to your expected baseline traffic. This alert may fire on startup or low-traffic environments.
- alert: IstioLowTotalRequestRate
  expr: sum(rate(istio_requests_total{reporter="destination"}[5m])) < 100
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Istio low total request rate (instance {{ $labels.instance }})
    description: "Global request rate in the service mesh is unusually low ({{ $value | printf \"%.2f\" }} req/s).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
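Both request-rate rules sum over the whole mesh, so a single idle or overloaded namespace can be masked by the rest. One possible variant, offered as a sketch and not part of the upstream rule set, evaluates the rate per destination namespace. The alert name `IstioLowNamespaceRequestRate` and the 10 req/s threshold are hypothetical and should be replaced with values that match your baseline.

```yaml
- alert: IstioLowNamespaceRequestRate   # hypothetical alert name
  # Same query as the mesh-wide rule, but grouped by destination namespace.
  expr: sum by (destination_workload_namespace) (rate(istio_requests_total{reporter="destination"}[5m])) < 10
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Istio low request rate (namespace {{ $labels.destination_workload_namespace }})
    description: "Request rate in namespace {{ $labels.destination_workload_namespace }} is {{ $value | printf \"%.2f\" }} req/s."
```

A namespace whose series have disappeared entirely yields no result, so this variant does not detect a complete absence of metrics.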

4.8.6. Istio high 4xx error rate

High percentage of HTTP 4xx responses in Istio ({{ $value | printf "%.1f" }}% > 5%).

- alert: IstioHigh4xxErrorRate
  expr: sum(rate(istio_requests_total{reporter="destination", response_code=~"4.*"}[5m])) / sum(rate(istio_requests_total{reporter="destination"}[5m])) * 100 > 5 and sum(rate(istio_requests_total{reporter="destination"}[5m])) > 0
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: Istio high 4xx error rate (instance {{ $labels.instance }})
    description: "High percentage of HTTP 4xx responses in Istio ({{ $value | printf \"%.1f\" }}% > 5%).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

4.8.7. Istio high 5xx error rate

High percentage of HTTP 5xx responses in Istio ({{ $value | printf "%.1f" }}% > 5%).

- alert: IstioHigh5xxErrorRate
  expr: sum(rate(istio_requests_total{reporter="destination", response_code=~"5.*"}[5m])) / sum(rate(istio_requests_total{reporter="destination"}[5m])) * 100 > 5 and sum(rate(istio_requests_total{reporter="destination"}[5m])) > 0
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: Istio high 5xx error rate (instance {{ $labels.instance }})
    description: "High percentage of HTTP 5xx responses in Istio ({{ $value | printf \"%.1f\" }}% > 5%).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
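The 4xx and 5xx rules each evaluate the same total-rate expression twice. If that turns out to be expensive, a recording rule can precompute it once per evaluation. The sketch below assumes a recorded series named `istio:requests:rate5m`, a name made up for illustration, and a placeholder group name. Rules within a group are evaluated in order, so the alert reads the freshly recorded value.

```yaml
groups:
  - name: istio-error-rates   # placeholder group name
    rules:
      # Precompute the mesh-wide request rate (hypothetical series name).
      - record: istio:requests:rate5m
        expr: sum(rate(istio_requests_total{reporter="destination"}[5m]))
      # Same condition as the 5xx rule above, reusing the recorded series.
      - alert: IstioHigh5xxErrorRate
        expr: sum(rate(istio_requests_total{reporter="destination", response_code=~"5.*"}[5m])) / istio:requests:rate5m * 100 > 5 and istio:requests:rate5m > 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: Istio high 5xx error rate
          description: "High percentage of HTTP 5xx responses in Istio ({{ $value | printf \"%.1f\" }}% > 5%)."
```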

4.8.8. Istio high request latency

Istio average request duration is {{ $value }}ms (> 100ms).

- alert: IstioHighRequestLatency
  expr: rate(istio_request_duration_milliseconds_sum{reporter="destination"}[1m]) / rate(istio_request_duration_milliseconds_count{reporter="destination"}[1m]) > 100 and rate(istio_request_duration_milliseconds_count{reporter="destination"}[1m]) > 0
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: Istio high request latency (instance {{ $labels.instance }})
    description: "Istio average request duration is {{ $value }}ms (> 100ms).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

4.8.9. Istio latency 99th percentile

Istio p99 request latency is {{ $value }}ms (threshold: 1000ms).

- alert: IstioLatency99Percentile
  expr: histogram_quantile(0.99, sum(rate(istio_request_duration_milliseconds_bucket[1m])) by (destination_canonical_service, destination_workload_namespace, le)) > 1000
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: Istio latency 99 percentile (service {{ $labels.destination_canonical_service }})
    description: "Istio p99 request latency is {{ $value }}ms (threshold: 1000ms).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
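Unlike the other request rules on this page, the p99 expression has no `reporter` filter, so it may combine client-side and server-side histograms for the same traffic. A hedged variant that restricts the query to server-side measurements, assuming that is how you want latency reported:

```yaml
- alert: IstioLatency99Percentile
  # Assumption: only destination-reported histograms should count toward p99.
  expr: histogram_quantile(0.99, sum(rate(istio_request_duration_milliseconds_bucket{reporter="destination"}[1m])) by (destination_canonical_service, destination_workload_namespace, le)) > 1000
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: Istio latency 99 percentile (service {{ $labels.destination_canonical_service }})
    description: "Istio p99 request latency is {{ $value }}ms (threshold: 1000ms)."
```

The summary here uses `destination_canonical_service` because the aggregation keeps only the labels named in the `by` clause.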

4.8.10. Istio Pilot Duplicate Entry

Istio Pilot has detected {{ $value }} duplicate Envoy cluster(s), indicating misconfigured DestinationRules or ServiceEntries.

- alert: IstioPilotDuplicateEntry
  expr: sum(pilot_duplicate_envoy_clusters{}) > 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Istio Pilot Duplicate Entry (instance {{ $labels.instance }})
    description: "Istio Pilot has detected {{ $value }} duplicate Envoy cluster(s), indicating misconfigured DestinationRules or ServiceEntries.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"