
LiteLLM Prometheus Alert Rules

Three Prometheus alerting rules for LiteLLM, covering spend, error-rate, and latency conditions. Copy and paste the YAML into your Prometheus rules configuration.

⚠️

Alert thresholds depend on the nature of your applications; some queries use arbitrary tolerance thresholds, so tune them before deploying. Building an efficient monitoring platform takes time. 😉

13.3. LiteLLM (3 rules)

wget https://raw.githubusercontent.com/samber/awesome-prometheus-alerts/refs/heads/master/dist/rules/litellm/embedded-exporter.yml

13.3.1. LiteLLM provider spend over budget

Cumulative spend for an LLM provider has exceeded the daily budget threshold. Replace the regex `(claude-|anthropic/).*` with your provider's model-name pattern. Useful as a soft warning when the `provider_budget_config` hard cap is unavailable or disabled.

  # The threshold (1) is in USD. The `model` label carries the resolved model name (post-routing).
  # PromQL `increase()` needs at least two datapoints with growth between them to extrapolate a
  # positive value, so a brand-new counter series only fires after at least two distinct request
  # bursts a scrape cycle or more apart.
- alert: LiteLLMProviderSpendOverBudget
  expr: sum(increase(litellm_spend_metric_total{model=~"(claude-|anthropic/).*"}[24h])) > 1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: LiteLLM provider spend over budget (instance {{ $labels.instance }})
    description: "Cumulative spend for an LLM provider has exceeded the daily budget threshold. Replace the regex `(claude-|anthropic/).*` with your provider's model-name pattern. Useful as a soft warning when the `provider_budget_config` hard cap is unavailable or disabled.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
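The extrapolation caveat in the comments above can be sketched in a few lines. This is a simplified model of how PromQL `increase()` behaves over a window — real Prometheus also clamps extrapolation at series boundaries — using hypothetical spend-counter samples scraped every 60 seconds:

```python
def window_increase(samples, window_seconds):
    """Approximate PromQL increase(): sum counter deltas across the window,
    adjusting for counter resets, then extrapolate to the full window span
    (simplified; real Prometheus clamps the extrapolation)."""
    if len(samples) < 2:
        return 0.0  # fewer than two datapoints: increase() yields nothing
    raw = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        raw += cur - prev if cur >= prev else cur  # reset: counter restarted near 0
    covered = samples[-1][0] - samples[0][0]
    return raw * window_seconds / covered  # extrapolate to the window edges

# Hypothetical (timestamp, USD) samples: two request bursts one scrape apart.
samples = [(0, 0.00), (60, 0.40), (120, 0.40), (180, 0.90)]
print(window_increase(samples, 240))  # raw 0.9 over 180s scaled to 240s -> 1.2
```

A single burst on a brand-new series produces one datapoint and no growth difference, which is why the alert stays silent until the second scrape with spend recorded.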

13.3.2. LiteLLM proxy failed requests rate high

LiteLLM proxy is returning failed responses to clients (>5% error rate over 5min). Investigate downstream LLM provider availability or auth issues.

- alert: LiteLLMProxyFailedRequestsRateHigh
  expr: sum(rate(litellm_proxy_failed_requests_metric_total[5m])) / sum(rate(litellm_proxy_total_requests_metric_total[5m])) > 0.05
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: LiteLLM proxy failed requests rate high (instance {{ $labels.instance }})
    description: "LiteLLM proxy is returning failed responses to clients (>5% error rate over 5min). Investigate downstream LLM provider availability or auth issues.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

13.3.3. LiteLLM request latency p95 high

LiteLLM request total latency p95 exceeds 10 seconds over the 5-minute window. Check downstream LLM provider response times and proxy queue depth.

- alert: LiteLLMRequestLatencyP95High
  expr: histogram_quantile(0.95, sum(rate(litellm_request_total_latency_metric_bucket[5m])) by (le)) > 10
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: LiteLLM request latency p95 high (instance {{ $labels.instance }})
    description: "LiteLLM request total latency p95 exceeds 10 seconds over the 5-minute window. Check downstream LLM provider response times and proxy queue depth.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
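`histogram_quantile()` estimates the p95 by linear interpolation inside the cumulative `le` buckets of the latency histogram. A simplified sketch of that estimation, with hypothetical bucket rates:

```python
def histogram_quantile(q, buckets):
    """Simplified histogram_quantile(): buckets is a sorted list of
    (upper_bound, cumulative_rate) pairs, mirroring Prometheus `le`
    buckets. Linearly interpolates inside the bucket containing q."""
    total = buckets[-1][1]        # the +Inf bucket carries the total rate
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                return prev_bound  # quantile falls in the +Inf bucket
            frac = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * frac
        prev_bound, prev_count = bound, count
    return prev_bound

# Hypothetical latency buckets: (seconds, cumulative req/s over 5m).
buckets = [(1.0, 6.0), (5.0, 8.0), (10.0, 9.0), (30.0, 9.8), (float("inf"), 10.0)]
p95 = histogram_quantile(0.95, buckets)
print(p95)  # rank 9.5 lands in the (10, 30] bucket: 10 + 20 * (0.5/0.8) = 22.5
```

Here the estimated p95 of 22.5 s exceeds the 10 s threshold, so the alert would fire; the estimate's precision depends entirely on how finely the histogram buckets are spaced around the threshold.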