⚠️ Alert thresholds depend on the nature of your applications. Some queries may have arbitrary tolerance thresholds. Building an efficient monitoring platform takes time. 😉

Nats Prometheus Alert Rules

13 Prometheus alerting rules for Nats, based on metrics exported by nats-io/prometheus-nats-exporter. These rules cover critical and warning conditions. Copy and paste the YAML into a rule file loaded by your Prometheus configuration, or download the complete file:

wget https://raw.githubusercontent.com/samber/awesome-prometheus-alerts/refs/heads/master/dist/rules/nats/nats-exporter.yml
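
Once downloaded, the file still has to be referenced from the Prometheus configuration. A minimal sketch of the relevant `prometheus.yml` fragment, assuming the file was saved under `/etc/prometheus/rules/` (a hypothetical path, adjust it to your layout):

```yaml
# prometheus.yml (fragment)
rule_files:
  # Hypothetical location of the downloaded rule file.
  - /etc/prometheus/rules/nats-exporter.yml
```

Running `promtool check rules /etc/prometheus/rules/nats-exporter.yml` before reloading Prometheus is a cheap way to catch syntax errors.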
warning

3.5.1. Nats high routes count

High number of NATS routes ({{ $value }}) for {{ $labels.instance }}

- alert: NatsHighRoutesCount
  expr: gnatsd_varz_routes > 10
  for: 3m
  labels:
    severity: warning
  annotations:
    summary: Nats high routes count (instance {{ $labels.instance }})
    description: "High number of NATS routes ({{ $value }}) for {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

3.5.2. Nats high memory usage

NATS server memory usage is above 200MB for {{ $labels.instance }}

- alert: NatsHighMemoryUsage
  expr: gnatsd_varz_mem > 200 * 1024 * 1024
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Nats high memory usage (instance {{ $labels.instance }})
    description: "NATS server memory usage is above 200MB for {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
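
Threshold arithmetic such as `200 * 1024 * 1024` can be checked offline with a promtool unit test. The sketch below is an illustration, not part of the published rules: the test file name, instance names, and sample values are assumptions, and it expects `nats-exporter.yml` to sit in the same directory.

```yaml
# nats-exporter.test.yml (hypothetical file name)
rule_files:
  - nats-exporter.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # 250 MiB, held constant: above the 200 * 1024 * 1024 threshold.
      - series: 'gnatsd_varz_mem{instance="nats-0:7777", job="nats"}'
        values: '262144000+0x10'
      # 100 MiB, held constant: below the threshold.
      - series: 'gnatsd_varz_mem{instance="nats-1:7777", job="nats"}'
        values: '104857600+0x10'
    promql_expr_test:
      # Only the series above the threshold should survive the comparison.
      - expr: gnatsd_varz_mem > 200 * 1024 * 1024
        eval_time: 5m
        exp_samples:
          - labels: 'gnatsd_varz_mem{instance="nats-0:7777", job="nats"}'
            value: 262144000
```

It can be run with `promtool test rules nats-exporter.test.yml`.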
critical

3.5.3. Nats slow consumers

There are slow consumers in NATS for {{ $labels.instance }}

- alert: NatsSlowConsumers
  expr: gnatsd_varz_slow_consumers > 0
  for: 3m
  labels:
    severity: critical
  annotations:
    summary: Nats slow consumers (instance {{ $labels.instance }})
    description: "There are slow consumers in NATS for {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
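
If `gnatsd_varz_slow_consumers` behaves as a cumulative count since server start in your NATS version (an assumption worth verifying against your own metrics), the `> 0` comparison keeps firing until the server restarts. A possible variant, under that assumption, alerts only on newly detected slow consumers. The alert name and the 5m window are illustrative choices:

```yaml
- alert: NatsNewSlowConsumers  # hypothetical name, not part of the published rules
  # Fires when the count grew during the last 5 minutes.
  expr: increase(gnatsd_varz_slow_consumers[5m]) > 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Nats new slow consumers (instance {{ $labels.instance }})
    description: "New slow consumers were detected in NATS for {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
```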
critical

3.5.4. Nats server down

NATS server has been down for more than 5 minutes

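  # Replace job="nats" with the actual job name in your Prometheus configuration.
  # absent() returns a result only when no `up` series exists for the job; a target
  # that is still scraped but failing reports up == 0 and does not trigger this alert.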
- alert: NatsServerDown
  expr: absent(up{job="nats"})
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: Nats server down (job {{ $labels.job }})
    description: "NATS server has been down for more than 5 minutes\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
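
`absent(up{job="nats"})` covers the case where the whole job disappears from Prometheus. To also catch a single target that is still configured but failing its scrapes, a complementary rule on `up == 0` may help. This is a sketch with a hypothetical alert name. Note that `up` reflects the scrape of the exporter, so it signals an unreachable exporter rather than proving that the NATS server itself is down:

```yaml
- alert: NatsExporterTargetDown  # hypothetical name, not part of the published rules
  # One result per failing target, so the instance label is available.
  expr: up{job="nats"} == 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: Nats exporter target down (instance {{ $labels.instance }})
    description: "Prometheus has failed to scrape {{ $labels.instance }} for more than 5 minutes\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
```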
warning

3.5.5. Nats high CPU usage

NATS server has been using more than 80% CPU for the last 5 minutes

  # gnatsd_varz_cpu is a gauge reporting CPU percentage (0-100 scale).
- alert: NatsHighCPUUsage
  expr: gnatsd_varz_cpu > 80
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Nats high CPU usage (instance {{ $labels.instance }})
    description: "NATS server has been using more than 80% CPU for the last 5 minutes\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

3.5.6. Nats high number of connections

NATS server has more than 1000 active connections

- alert: NatsHighNumberOfConnections
  expr: gnatsd_connz_num_connections > 1000
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Nats high number of connections (instance {{ $labels.instance }})
    description: "NATS server has more than 1000 active connections\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

3.5.7. Nats high JetStream store usage

JetStream store usage is over 80%

- alert: NatsHighJetStreamStoreUsage
  expr: gnatsd_varz_jetstream_stats_storage / gnatsd_varz_jetstream_config_max_storage > 0.8 and gnatsd_varz_jetstream_config_max_storage > 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Nats high JetStream store usage (instance {{ $labels.instance }})
    description: "JetStream store usage is over 80%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

3.5.8. Nats high JetStream memory usage

JetStream memory usage is over 80%

- alert: NatsHighJetStreamMemoryUsage
  expr: gnatsd_varz_jetstream_stats_memory / gnatsd_varz_jetstream_config_max_memory > 0.8 and gnatsd_varz_jetstream_config_max_memory > 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Nats high JetStream memory usage (instance {{ $labels.instance }})
    description: "JetStream memory usage is over 80%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

3.5.9. Nats high number of subscriptions

NATS server has more than 1000 active subscriptions

- alert: NatsHighNumberOfSubscriptions
  expr: gnatsd_varz_subscriptions > 1000
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Nats high number of subscriptions (instance {{ $labels.instance }})
    description: "NATS server has more than 1000 active subscriptions\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

3.5.10. Nats high pending bytes

NATS server has more than 100,000 pending bytes

- alert: NatsHighPendingBytes
  expr: gnatsd_connz_pending_bytes > 100000
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Nats high pending bytes (instance {{ $labels.instance }})
    description: "NATS server has more than 100,000 pending bytes\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

3.5.11. Nats too many errors

NATS server has encountered {{ $value }} JetStream API errors in the last 5 minutes

- alert: NatsTooManyErrors
  expr: increase(gnatsd_varz_jetstream_stats_api_errors[5m]) > 5
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Nats too many errors (instance {{ $labels.instance }})
    description: "NATS server has encountered {{ $value }} JetStream API errors in the last 5 minutes\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

3.5.12. Nats JetStream accounts exceeded

JetStream has more than 100 active accounts

- alert: NatsJetStreamAccountsExceeded
  expr: sum(gnatsd_varz_jetstream_stats_accounts) > 100
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Nats JetStream accounts exceeded
    description: "JetStream has more than 100 active accounts\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
warning

3.5.13. Nats leaf node connection issue

No leaf node connections on {{ $labels.instance }}

  # Only enable this alert if your deployment requires leaf node connections.
  # This will fire spuriously if leaf nodes are not configured.
- alert: NatsLeafNodeConnectionIssue
  expr: gnatsd_varz_leafnodes == 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Nats leaf node connection issue (instance {{ $labels.instance }})
    description: "No leaf node connections on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
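
One way to reduce spurious firing on servers that never had leaf nodes is to require that at least one leaf node connection was seen recently. The variant below is a sketch. The alert name and the 1d lookback window are assumptions to tune for your deployment:

```yaml
- alert: NatsLeafNodeConnectionLost  # hypothetical name, not part of the published rules
  # Fires only if leaf nodes dropped to zero on an instance that had
  # at least one leaf node connection during the last day.
  expr: gnatsd_varz_leafnodes == 0 and max_over_time(gnatsd_varz_leafnodes[1d]) > 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Nats leaf node connection lost (instance {{ $labels.instance }})
    description: "Leaf node connections dropped to zero on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
```

The trade-off is that the alert clears on its own once the lookback window no longer contains a non-zero sample.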