⚠️ Alert thresholds depend on the nature of your applications. Some queries may have arbitrary tolerance thresholds. Building an efficient monitoring platform takes time. 😉

OpenTelemetry Collector Prometheus Alert Rules

12 Prometheus alerting rules for the OpenTelemetry Collector, based on the collector's embedded self-monitoring exporter. The rules cover critical and warning conditions; copy and paste the YAML into your Prometheus rules configuration.

12.8. Embedded exporter (12 rules)

OpenTelemetry Collector self-monitoring metrics are exposed on port 8888 by default at the /metrics endpoint.
These alerts monitor the collector's health when metrics are ingested via the Prometheus OTLP endpoint or scraped directly.
All collector internal metrics are prefixed with 'otelcol_'.
wget https://raw.githubusercontent.com/samber/awesome-prometheus-alerts/refs/heads/master/dist/rules/opentelemetry-collector/embedded-exporter.yml
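
For reference, a minimal Prometheus configuration that loads the downloaded rule file and scrapes the collector's self-monitoring endpoint could look like the sketch below. The file path, job name, and target address are placeholders to adapt to your setup; the job name is chosen so it also satisfies the job=~".*otel.*collector.*" selector used in the first rule.

# prometheus.yml sketch (file path, job name, and target are placeholders)
rule_files:
  - /etc/prometheus/rules/embedded-exporter.yml   # the rule file downloaded above

scrape_configs:
  - job_name: otel-collector               # matches job=~".*otel.*collector.*"
    static_configs:
      - targets: ["otel-collector:8888"]   # collector self-monitoring /metrics endpoint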

12.8.1. OpenTelemetry Collector down

OpenTelemetry Collector instance has disappeared or is not being scraped

  # Adjust the job label regex to match the actual job name in your Prometheus scrape config.
- alert: OpenTelemetryCollectorDown
  expr: up{job=~".*otel.*collector.*"} == 0
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: OpenTelemetry Collector down (instance {{ $labels.instance }})
    description: "OpenTelemetry Collector instance has disappeared or is not being scraped\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

12.8.2. OpenTelemetry Collector receiver refused spans

OpenTelemetry Collector is refusing {{ $value | humanize }}/s spans on {{ $labels.receiver }}.

  # Threshold of 0.05/s avoids firing on transient single-event spikes.
- alert: OpenTelemetryCollectorReceiverRefusedSpans
  expr: rate(otelcol_receiver_refused_spans[5m]) > 0.05
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: OpenTelemetry Collector receiver refused spans (instance {{ $labels.instance }})
    description: "OpenTelemetry Collector is refusing {{ $value | humanize }}/s spans on {{ $labels.receiver }}.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

12.8.3. OpenTelemetry Collector receiver refused metric points

OpenTelemetry Collector is refusing {{ $value | humanize }}/s metric points on {{ $labels.receiver }}.

  # Threshold of 0.05/s avoids firing on transient single-event spikes.
- alert: OpenTelemetryCollectorReceiverRefusedMetricPoints
  expr: rate(otelcol_receiver_refused_metric_points[5m]) > 0.05
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: OpenTelemetry Collector receiver refused metric points (instance {{ $labels.instance }})
    description: "OpenTelemetry Collector is refusing {{ $value | humanize }}/s metric points on {{ $labels.receiver }}.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

12.8.4. OpenTelemetry Collector receiver refused log records

OpenTelemetry Collector is refusing {{ $value | humanize }}/s log records on {{ $labels.receiver }}.

  # Threshold of 0.05/s avoids firing on transient single-event spikes.
- alert: OpenTelemetryCollectorReceiverRefusedLogRecords
  expr: rate(otelcol_receiver_refused_log_records[5m]) > 0.05
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: OpenTelemetry Collector receiver refused log records (instance {{ $labels.instance }})
    description: "OpenTelemetry Collector is refusing {{ $value | humanize }}/s log records on {{ $labels.receiver }}.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

12.8.5. OpenTelemetry Collector exporter failed spans

OpenTelemetry Collector is failing to send {{ $value | humanize }}/s spans via {{ $labels.exporter }}.

  # Threshold of 0.05/s avoids firing on transient single-event spikes.
- alert: OpenTelemetryCollectorExporterFailedSpans
  expr: rate(otelcol_exporter_send_failed_spans[5m]) > 0.05
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: OpenTelemetry Collector exporter failed spans (instance {{ $labels.instance }})
    description: "OpenTelemetry Collector failing to send {{ $value | humanize }}/s spans via {{ $labels.exporter }}.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

12.8.6. OpenTelemetry Collector exporter failed metric points

OpenTelemetry Collector is failing to send {{ $value | humanize }}/s metric points via {{ $labels.exporter }}.

  # Threshold of 0.05/s avoids firing on transient single-event spikes.
- alert: OpenTelemetryCollectorExporterFailedMetricPoints
  expr: rate(otelcol_exporter_send_failed_metric_points[5m]) > 0.05
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: OpenTelemetry Collector exporter failed metric points (instance {{ $labels.instance }})
    description: "OpenTelemetry Collector failing to send {{ $value | humanize }}/s metric points via {{ $labels.exporter }}.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

12.8.7. OpenTelemetry Collector exporter failed log records

OpenTelemetry Collector is failing to send {{ $value | humanize }}/s log records via {{ $labels.exporter }}.

  # Threshold of 0.05/s avoids firing on transient single-event spikes.
- alert: OpenTelemetryCollectorExporterFailedLogRecords
  expr: rate(otelcol_exporter_send_failed_log_records[5m]) > 0.05
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: OpenTelemetry Collector exporter failed log records (instance {{ $labels.instance }})
    description: "OpenTelemetry Collector failing to send {{ $value | humanize }}/s log records via {{ $labels.exporter }}.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

12.8.8. OpenTelemetry Collector exporter queue nearly full

OpenTelemetry Collector exporter {{ $labels.exporter }} queue is over 80% full

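  # The on(...) clause matches the queue size to the capacity of the same exporter;
  # the capacity > 0 condition avoids a divide-by-zero false positive when the reported capacity is 0.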
- alert: OpenTelemetryCollectorExporterQueueNearlyFull
  expr: (otelcol_exporter_queue_size / on(instance, job, exporter) otelcol_exporter_queue_capacity) > 0.8 and on(instance, job, exporter) otelcol_exporter_queue_capacity > 0
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: OpenTelemetry Collector exporter queue nearly full (instance {{ $labels.instance }})
    description: "OpenTelemetry Collector exporter {{ $labels.exporter }} queue is over 80% full\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

12.8.9. OpenTelemetry Collector processor refused spans

OpenTelemetry Collector processor {{ $labels.processor }} is refusing spans ({{ $value | humanize }}/s), likely due to backpressure.

  # Threshold of 0.05/s avoids firing on transient single-event spikes.
  # These processor metrics are deprecated since collector v0.110.0.
- alert: OpenTelemetryCollectorProcessorRefusedSpans
  expr: rate(otelcol_processor_refused_spans[5m]) > 0.05
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: OpenTelemetry Collector processor refused spans (instance {{ $labels.instance }})
    description: "OpenTelemetry Collector processor {{ $labels.processor }} is refusing spans ({{ $value | humanize }}/s), likely due to backpressure.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

12.8.10. OpenTelemetry Collector processor refused metric points

OpenTelemetry Collector processor {{ $labels.processor }} is refusing metric points ({{ $value | humanize }}/s), likely due to backpressure.

  # Threshold of 0.05/s avoids firing on transient single-event spikes.
  # These processor metrics are deprecated since collector v0.110.0.
- alert: OpenTelemetryCollectorProcessorRefusedMetricPoints
  expr: rate(otelcol_processor_refused_metric_points[5m]) > 0.05
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: OpenTelemetry Collector processor refused metric points (instance {{ $labels.instance }})
    description: "OpenTelemetry Collector processor {{ $labels.processor }} is refusing metric points ({{ $value | humanize }}/s), likely due to backpressure.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

12.8.11. OpenTelemetry Collector high memory usage

OpenTelemetry Collector memory usage is above 90%

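  # Compares the Go runtime heap allocation to the total memory the collector process has obtained from the OS.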
- alert: OpenTelemetryCollectorHighMemoryUsage
  expr: (otelcol_process_runtime_heap_alloc_bytes / on(instance, job) otelcol_process_runtime_total_sys_memory_bytes) > 0.9
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: OpenTelemetry Collector high memory usage (instance {{ $labels.instance }})
    description: "OpenTelemetry Collector memory usage is above 90%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

12.8.12. OpenTelemetry Collector OTLP receiver errors

OpenTelemetry Collector OTLP receiver is completely failing - all spans are being refused

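  # Fires only when the OTLP receiver refuses spans while accepting none at all, i.e. span ingestion has stopped entirely.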
- alert: OpenTelemetryCollectorOTLPReceiverErrors
  expr: rate(otelcol_receiver_accepted_spans{receiver=~"otlp"}[5m]) == 0 and rate(otelcol_receiver_refused_spans{receiver=~"otlp"}[5m]) > 0
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: OpenTelemetry Collector OTLP receiver errors (instance {{ $labels.instance }})
    description: "OpenTelemetry Collector OTLP receiver is completely failing - all spans are being refused\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"