What is the Prometheus alert rule for "CloudWatch exporter scrape error"?

CloudWatch exporter on {{ $labels.instance }} failed to scrape metrics from AWS CloudWatch API. PromQL expression: cloudwatch_exporter_scrape_error > 0. Severity: warning. Duration: 5m.

What is the Prometheus alert rule for "CloudWatch API high request rate"?

CloudWatch exporter on {{ $labels.instance }} is making {{ $value }} API calls per minute to namespace {{ $labels.namespace }}. This can lead to high AWS costs. PromQL expression: sum by (instance, namespace) (rate(cloudwatch_requests_total[5m])) * 60 > 100. Severity: warning. Duration: 0m.

What is the Prometheus alert rule for "AWS EC2 high CPU utilization"?

EC2 instance {{ $labels.instance_id }} CPU utilization is above 90% ({{ $value }}%). PromQL expression: aws_ec2_cpuutilization_average > 90. Severity: warning. Duration: 15m.

What is the Prometheus alert rule for "AWS RDS low free storage space"?

RDS instance {{ $labels.dbinstance_identifier }} has less than 2GB free storage ({{ $value }} bytes remaining). PromQL expression: aws_rds_free_storage_space_average < 2000000000. Severity: warning. Duration: 5m.

What is the Prometheus alert rule for "AWS RDS high CPU utilization"?

RDS instance {{ $labels.dbinstance_identifier }} CPU utilization is above 90% ({{ $value }}%). PromQL expression: aws_rds_cpuutilization_average > 90. Severity: warning. Duration: 15m.

What is the Prometheus alert rule for "AWS RDS high database connections"?

RDS instance {{ $labels.dbinstance_identifier }} has {{ $value }} active connections. PromQL expression: aws_rds_database_connections_average > 100. Severity: warning. Duration: 5m.

What is the Prometheus alert rule for "AWS SQS queue messages visible"?

SQS queue {{ $labels.queue_name }} has {{ $value }} messages waiting to be processed. PromQL expression: aws_sqs_approximate_number_of_messages_visible_average > 1000. Severity: warning. Duration: 10m.

What is the Prometheus alert rule for "AWS SQS message age too old"?

SQS queue {{ $labels.queue_name }} has messages older than 1 hour ({{ $value }}s). PromQL expression: aws_sqs_approximate_age_of_oldest_message_maximum > 3600. Severity: warning. Duration: 0m.

What is the Prometheus alert rule for "AWS ALB unhealthy targets"?

ALB {{ $labels.load_balancer }} has {{ $value }} unhealthy target(s) in target group {{ $labels.target_group }}. PromQL expression: aws_applicationelb_un_healthy_host_count_average > 0. Severity: critical. Duration: 5m.

What is the Prometheus alert rule for "AWS ALB high target response time"?

ALB {{ $labels.load_balancer }} average target response time is above 2 seconds ({{ $value }}s). PromQL expression: aws_applicationelb_target_response_time_average > 2. Severity: warning. Duration: 5m.

What is the Prometheus alert rule for "AWS Lambda high error rate"?

Lambda function {{ $labels.function_name }} error rate is above 5% ({{ $value }}%). PromQL expression: (aws_lambda_errors_sum / aws_lambda_invocations_sum) * 100 > 5 and aws_lambda_invocations_sum > 0. Severity: warning. Duration: 5m.

AWS CloudWatch Prometheus Alert Rules

13 Prometheus alerting rules for AWS CloudWatch, based on metrics exported via prometheus/cloudwatch_exporter. The rules cover critical and warning conditions. Copy and paste the YAML into a rule file loaded by your Prometheus configuration.

⚠️ Alert thresholds depend on the nature of your applications. Some queries may have arbitrary tolerance thresholds. Building an efficient monitoring platform takes time. 😉

groups:
- name: PrometheusCloudwatchExporter
  rules:
    - alert: CloudWatchExporterScrapeError
      expr: cloudwatch_exporter_scrape_error > 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: CloudWatch exporter scrape error (instance {{ $labels.instance }})
        description: "CloudWatch exporter on {{ $labels.instance }} failed to scrape metrics from AWS CloudWatch API.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
    - alert: CloudWatchExporterSlowScrape
      expr: cloudwatch_exporter_scrape_duration_seconds > 300
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: CloudWatch exporter slow scrape (instance {{ $labels.instance }})
        description: "CloudWatch exporter on {{ $labels.instance }} scrape is taking more than 5 minutes ({{ $value }}s). Consider reducing the number of metrics or splitting across multiple exporters.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
      # CloudWatch API calls cost money (~$0.01 per 1000 GetMetricData requests).
      # 100 requests/minute ≈ 4.32M requests per 30-day month ≈ $43/month. Adjust the threshold based on your budget.
    - alert: CloudWatchAPIHighRequestRate
      expr: sum by (instance, namespace) (rate(cloudwatch_requests_total[5m])) * 60 > 100
      for: 0m
      labels:
        severity: warning
      annotations:
        summary: CloudWatch API high request rate (instance {{ $labels.instance }})
        description: "CloudWatch exporter on {{ $labels.instance }} is making {{ $value }} API calls per minute to namespace {{ $labels.namespace }}. This can lead to high AWS costs.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
      # Requires EC2 CPUUtilization metric configured in the CloudWatch exporter.
    - alert: AWSEC2HighCPUUtilization
      expr: aws_ec2_cpuutilization_average > 90
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: AWS EC2 high CPU utilization (instance {{ $labels.instance }})
        description: "EC2 instance {{ $labels.instance_id }} CPU utilization is above 90% ({{ $value }}%).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
      # Requires RDS FreeStorageSpace metric. The threshold of 2GB is a rough default.
      # Adjust based on your database size.
    - alert: AWSRDSLowFreeStorageSpace
      expr: aws_rds_free_storage_space_average < 2000000000
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: AWS RDS low free storage space (instance {{ $labels.instance }})
        description: "RDS instance {{ $labels.dbinstance_identifier }} has less than 2GB free storage ({{ $value }} bytes remaining).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
      # Requires RDS CPUUtilization metric configured in the CloudWatch exporter.
    - alert: AWSRDSHighCPUUtilization
      expr: aws_rds_cpuutilization_average > 90
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: AWS RDS high CPU utilization (instance {{ $labels.instance }})
        description: "RDS instance {{ $labels.dbinstance_identifier }} CPU utilization is above 90% ({{ $value }}%).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
      # The threshold depends on the RDS instance class. Adjust based on your
      # instance type's max_connections parameter.
    - alert: AWSRDSHighDatabaseConnections
      expr: aws_rds_database_connections_average > 100
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: AWS RDS high database connections (instance {{ $labels.instance }})
        description: "RDS instance {{ $labels.dbinstance_identifier }} has {{ $value }} active connections.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
      # Requires SQS ApproximateNumberOfMessagesVisible metric. The threshold of 1000
      # is a rough default. Adjust based on your expected queue depth.
    - alert: AWSSQSQueueMessagesVisible
      expr: aws_sqs_approximate_number_of_messages_visible_average > 1000
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: AWS SQS queue messages visible (instance {{ $labels.instance }})
        description: "SQS queue {{ $labels.queue_name }} has {{ $value }} messages waiting to be processed.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
      # Requires SQS ApproximateAgeOfOldestMessage metric.
    - alert: AWSSQSMessageAgeTooOld
      expr: aws_sqs_approximate_age_of_oldest_message_maximum > 3600
      for: 0m
      labels:
        severity: warning
      annotations:
        summary: AWS SQS message age too old (instance {{ $labels.instance }})
        description: "SQS queue {{ $labels.queue_name }} has messages older than 1 hour ({{ $value }}s).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
      # Requires ApplicationELB UnHealthyHostCount metric.
    - alert: AWSALBUnhealthyTargets
      expr: aws_applicationelb_un_healthy_host_count_average > 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: AWS ALB unhealthy targets (instance {{ $labels.instance }})
        description: "ALB {{ $labels.load_balancer }} has {{ $value }} unhealthy target(s) in target group {{ $labels.target_group }}.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
      # Requires ApplicationELB HTTPCode_ELB_5XX_Count and RequestCount metrics.
    - alert: AWSALBHigh5xxErrorRate
      expr: (aws_applicationelb_httpcode_elb_5_xx_count_sum / aws_applicationelb_request_count_sum) * 100 > 5 and aws_applicationelb_request_count_sum > 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: AWS ALB high 5xx error rate (instance {{ $labels.instance }})
        description: "ALB {{ $labels.load_balancer }} 5xx error rate is above 5% ({{ $value }}%).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
      # Requires ApplicationELB TargetResponseTime metric.
    - alert: AWSALBHighTargetResponseTime
      expr: aws_applicationelb_target_response_time_average > 2
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: AWS ALB high target response time (instance {{ $labels.instance }})
        description: "ALB {{ $labels.load_balancer }} average target response time is above 2 seconds ({{ $value }}s).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
    
      # Requires Lambda Errors and Invocations metrics.
    - alert: AWSLambdaHighErrorRate
      expr: (aws_lambda_errors_sum / aws_lambda_invocations_sum) * 100 > 5 and aws_lambda_invocations_sum > 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: AWS Lambda high error rate (instance {{ $labels.instance }})
        description: "Lambda function {{ $labels.function_name }} error rate is above 5% ({{ $value }}%).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

11.1. prometheus/cloudwatch_exporter (13 rules)

CloudWatch metrics are exported as aws_{namespace}_{metric_name}_{statistic} gauges.
The rules below cover both exporter health and common AWS service alerts.
Adjust thresholds and label filters to match your CloudWatch exporter configuration.
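
To make the naming pattern concrete, here is a minimal sketch of an exporter configuration that would produce two of the series used by the rules in this section. The region and the choice of dimensions are assumptions for illustration only; adapt them to your account.

```yaml
# cloudwatch_exporter configuration (sketch; region and dimensions are assumptions)
region: eu-west-1
metrics:
  # Exposed as aws_ec2_cpuutilization_average, with an instance_id label
  - aws_namespace: AWS/EC2
    aws_metric_name: CPUUtilization
    aws_dimensions: [InstanceId]
    aws_statistics: [Average]

  # Exposed as aws_applicationelb_httpcode_elb_5_xx_count_sum, with a load_balancer label
  - aws_namespace: AWS/ApplicationELB
    aws_metric_name: HTTPCode_ELB_5XX_Count
    aws_dimensions: [LoadBalancer]
    aws_statistics: [Sum]
```

Judging by the names used in this section, CamelCase metric and dimension names are lowercased and converted to snake case, so InstanceId becomes the instance_id label. A rule only returns data if its metric and statistic are listed in the exporter configuration.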

wget https://raw.githubusercontent.com/samber/awesome-prometheus-alerts/refs/heads/master/dist/rules/aws-cloudwatch/prometheus-cloudwatch-exporter.yml
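
Once downloaded, the file still has to be referenced from the Prometheus configuration. A minimal sketch, assuming the file was saved under /etc/prometheus/rules/ (a hypothetical path):

```yaml
# prometheus.yml (fragment; the directory is a hypothetical location)
rule_files:
  - /etc/prometheus/rules/prometheus-cloudwatch-exporter.yml
```

Running `promtool check rules` on the file before reloading Prometheus is a cheap way to catch syntax errors.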

warning

11.1.1. CloudWatch exporter scrape error

CloudWatch exporter on {{ $labels.instance }} failed to scrape metrics from AWS CloudWatch API.

- alert: CloudWatchExporterScrapeError
  expr: cloudwatch_exporter_scrape_error > 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: CloudWatch exporter scrape error (instance {{ $labels.instance }})
    description: "CloudWatch exporter on {{ $labels.instance }} failed to scrape metrics from AWS CloudWatch API.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

warning

11.1.2. CloudWatch exporter slow scrape

CloudWatch exporter on {{ $labels.instance }} scrape is taking more than 5 minutes ({{ $value }}s). Consider reducing the number of metrics or splitting across multiple exporters.

- alert: CloudWatchExporterSlowScrape
  expr: cloudwatch_exporter_scrape_duration_seconds > 300
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: CloudWatch exporter slow scrape (instance {{ $labels.instance }})
    description: "CloudWatch exporter on {{ $labels.instance }} scrape is taking more than 5 minutes ({{ $value }}s). Consider reducing the number of metrics or splitting across multiple exporters.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

warning

11.1.3. CloudWatch API high request rate

CloudWatch exporter on {{ $labels.instance }} is making {{ $value }} API calls per minute to namespace {{ $labels.namespace }}. This can lead to high AWS costs.

  # CloudWatch API calls cost money (~$0.01 per 1000 GetMetricData requests).
  # 100 requests/minute ≈ 4.32M requests per 30-day month ≈ $43/month. Adjust the threshold based on your budget.
- alert: CloudWatchAPIHighRequestRate
  expr: sum by (instance, namespace) (rate(cloudwatch_requests_total[5m])) * 60 > 100
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: CloudWatch API high request rate (instance {{ $labels.instance }})
    description: "CloudWatch exporter on {{ $labels.instance }} is making {{ $value }} API calls per minute to namespace {{ $labels.namespace }}. This can lead to high AWS costs.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
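
As a rough check on the monthly cost quoted in the rule comment, the arithmetic can be worked out as follows. It assumes a flat $0.01 per 1,000 requests and a 30-day month; both are simplifying assumptions, and actual AWS pricing varies by API and region.

```latex
\[
100\ \tfrac{\text{requests}}{\text{min}} \times 60 \times 24 \times 30
  = 4{,}320{,}000\ \text{requests per month}
\]
\[
4{,}320{,}000 \times \frac{\$0.01}{1000} = \$43.20\ \text{per month}
\]
```

The same calculation, run backwards from a monthly budget, gives a per-minute threshold to use in the expression.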

warning

11.1.4. AWS EC2 high CPU utilization

EC2 instance {{ $labels.instance_id }} CPU utilization is above 90% ({{ $value }}%).

  # Requires EC2 CPUUtilization metric configured in the CloudWatch exporter.
- alert: AWSEC2HighCPUUtilization
  expr: aws_ec2_cpuutilization_average > 90
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: AWS EC2 high CPU utilization (instance {{ $labels.instance }})
    description: "EC2 instance {{ $labels.instance_id }} CPU utilization is above 90% ({{ $value }}%).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

warning

11.1.5. AWS RDS low free storage space

RDS instance {{ $labels.dbinstance_identifier }} has less than 2GB free storage ({{ $value }} bytes remaining).

  # Requires RDS FreeStorageSpace metric. The threshold of 2GB is a rough default.
  # Adjust based on your database size.
- alert: AWSRDSLowFreeStorageSpace
  expr: aws_rds_free_storage_space_average < 2000000000
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: AWS RDS low free storage space (instance {{ $labels.instance }})
    description: "RDS instance {{ $labels.dbinstance_identifier }} has less than 2GB free storage ({{ $value }} bytes remaining).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

warning

11.1.6. AWS RDS high CPU utilization

RDS instance {{ $labels.dbinstance_identifier }} CPU utilization is above 90% ({{ $value }}%).

  # Requires RDS CPUUtilization metric configured in the CloudWatch exporter.
- alert: AWSRDSHighCPUUtilization
  expr: aws_rds_cpuutilization_average > 90
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: AWS RDS high CPU utilization (instance {{ $labels.instance }})
    description: "RDS instance {{ $labels.dbinstance_identifier }} CPU utilization is above 90% ({{ $value }}%).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

warning

11.1.7. AWS RDS high database connections

RDS instance {{ $labels.dbinstance_identifier }} has {{ $value }} active connections.

  # The threshold depends on the RDS instance class. Adjust based on your
  # instance type's max_connections parameter.
- alert: AWSRDSHighDatabaseConnections
  expr: aws_rds_database_connections_average > 100
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: AWS RDS high database connections (instance {{ $labels.instance }})
    description: "RDS instance {{ $labels.dbinstance_identifier }} has {{ $value }} active connections.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
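
Because the right threshold varies by instance class, one option is to split the condition with label matchers. The sketch below assumes a hypothetical instance named orders-db that tolerates more connections than the rest; the identifier and both thresholds are placeholders, not recommendations.

```yaml
# Sketch: the identifier "orders-db" and both thresholds are hypothetical.
- alert: AWSRDSHighDatabaseConnections
  expr: >
    aws_rds_database_connections_average{dbinstance_identifier="orders-db"} > 400
    or
    aws_rds_database_connections_average{dbinstance_identifier!="orders-db"} > 100
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: AWS RDS high database connections (instance {{ $labels.instance }})
    description: "RDS instance {{ $labels.dbinstance_identifier }} has {{ $value }} active connections."
```

The two sides of `or` select disjoint sets of series, so each instance is compared against exactly one threshold.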

warning

11.1.8. AWS SQS queue messages visible

SQS queue {{ $labels.queue_name }} has {{ $value }} messages waiting to be processed.

  # Requires SQS ApproximateNumberOfMessagesVisible metric. The threshold of 1000
  # is a rough default. Adjust based on your expected queue depth.
- alert: AWSSQSQueueMessagesVisible
  expr: aws_sqs_approximate_number_of_messages_visible_average > 1000
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: AWS SQS queue messages visible (instance {{ $labels.instance }})
    description: "SQS queue {{ $labels.queue_name }} has {{ $value }} messages waiting to be processed.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

warning

11.1.9. AWS SQS message age too old

SQS queue {{ $labels.queue_name }} has messages older than 1 hour ({{ $value }}s).

  # Requires SQS ApproximateAgeOfOldestMessage metric.
- alert: AWSSQSMessageAgeTooOld
  expr: aws_sqs_approximate_age_of_oldest_message_maximum > 3600
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: AWS SQS message age too old (instance {{ $labels.instance }})
    description: "SQS queue {{ $labels.queue_name }} has messages older than 1 hour ({{ $value }}s).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

critical

11.1.10. AWS ALB unhealthy targets

ALB {{ $labels.load_balancer }} has {{ $value }} unhealthy target(s) in target group {{ $labels.target_group }}.

  # Requires ApplicationELB UnHealthyHostCount metric.
- alert: AWSALBUnhealthyTargets
  expr: aws_applicationelb_un_healthy_host_count_average > 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: AWS ALB unhealthy targets (instance {{ $labels.instance }})
    description: "ALB {{ $labels.load_balancer }} has {{ $value }} unhealthy target(s) in target group {{ $labels.target_group }}.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

critical

11.1.11. AWS ALB high 5xx error rate

ALB {{ $labels.load_balancer }} 5xx error rate is above 5% ({{ $value }}%).

  # Requires ApplicationELB HTTPCode_ELB_5XX_Count and RequestCount metrics.
- alert: AWSALBHigh5xxErrorRate
  expr: (aws_applicationelb_httpcode_elb_5_xx_count_sum / aws_applicationelb_request_count_sum) * 100 > 5 and aws_applicationelb_request_count_sum > 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: AWS ALB high 5xx error rate (instance {{ $labels.instance }})
    description: "ALB {{ $labels.load_balancer }} 5xx error rate is above 5% ({{ $value }}%).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

warning

11.1.12. AWS ALB high target response time

ALB {{ $labels.load_balancer }} average target response time is above 2 seconds ({{ $value }}s).

  # Requires ApplicationELB TargetResponseTime metric.
- alert: AWSALBHighTargetResponseTime
  expr: aws_applicationelb_target_response_time_average > 2
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: AWS ALB high target response time (instance {{ $labels.instance }})
    description: "ALB {{ $labels.load_balancer }} average target response time is above 2 seconds ({{ $value }}s).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

warning

11.1.13. AWS Lambda high error rate

Lambda function {{ $labels.function_name }} error rate is above 5% ({{ $value }}%).

  # Requires Lambda Errors and Invocations metrics.
- alert: AWSLambdaHighErrorRate
  expr: (aws_lambda_errors_sum / aws_lambda_invocations_sum) * 100 > 5 and aws_lambda_invocations_sum > 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: AWS Lambda high error rate (instance {{ $labels.instance }})
    description: "Lambda function {{ $labels.function_name }} error rate is above 5% ({{ $value }}%).\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
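
Ratio expressions like the one above are easy to get subtly wrong, so it can help to check them against synthetic data. The following is a sketch of a promtool unit test file; the file name, the function name checkout, and the sample values are made up for illustration.

```yaml
# lambda-error-rate.test.yml (sketch; function name and sample values are made up)
evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # 25 errors out of 100 invocations at every sample, i.e. a 25% error rate
      - series: 'aws_lambda_errors_sum{function_name="checkout"}'
        values: '25x10'
      - series: 'aws_lambda_invocations_sum{function_name="checkout"}'
        values: '100x10'
    promql_expr_test:
      - expr: (aws_lambda_errors_sum / aws_lambda_invocations_sum) * 100 > 5 and aws_lambda_invocations_sum > 0
        eval_time: 5m
        exp_samples:
          # Arithmetic drops the metric name, so only the function_name label remains
          - labels: '{function_name="checkout"}'
            value: 25
```

The file would be run with `promtool test rules lambda-error-rate.test.yml`. Lowering the error series below 5% of invocations should make the expression return no samples.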
