OpenSearch Alert Monitoring: High CPU Usage Example

November 29, 2024 · 3 min read

The OpenSearch Alerting feature sends notifications when data from one or more indices meets customizable conditions. Use cases include monitoring for HTTP 503 status codes, detecting CPU load averages above a specific threshold, and counting occurrences of specific keywords in logs over defined intervals. Notifications can be sent via email, Slack, custom webhooks, and other destinations. This example shows a monitor that alerts users to high CPU usage, with Slack as the notification destination.

Example Monitor JSON
Below is a JSON configuration for a monitor named “High CPU Monitor”. It checks cluster statistics for high CPU usage and sends a Slack notification when its trigger condition is met.

Key Configuration Elements:

  1. Name and Type:
    • name: Identifies the monitor as “High CPU Monitor”.
    • monitor_type: Set to cluster_metrics_monitor, a monitor type that runs cluster API requests rather than index queries.
  2. Schedule:
    • The monitor runs every 10 minutes ("interval": 10) to assess cluster performance.
  3. Inputs:
    • Uses the _cluster/stats API to fetch cluster metrics.
  4. Trigger:
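    • Named “High CPU Usage”, with severity 1 (the highest level).
    • Evaluates the Painless condition ctx.results[0].nodes.process.cpu.percent >= 80, where ctx.results[0] is the _cluster/stats response and nodes.process.cpu.percent is the CPU usage percentage it reports across the cluster's nodes.
    • If the condition evaluates to true, the trigger fires and runs its actions.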
  5. Actions:
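    • Sends a notification to the Slack channel referenced by destination_id. The value is empty in this example; set it to the ID of your own Slack notification channel before creating the monitor.
    • Uses Mustache templates for the message and subject; the message inserts context variables such as {{ctx.monitor.name}}, {{ctx.trigger.severity}}, and the alert period.
    • Throttling is set to 15 minutes, so at most one notification is sent per 15-minute window, even though the monitor runs every 10 minutes.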
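To make the interaction between the 10-minute schedule, the 80% condition, and the 15-minute throttle concrete, here is a minimal Python sketch that mimics them locally. It is not how the Alerting plugin is implemented: the function names (trigger_fires, should_notify, simulate) and the CPU readings are hypothetical, and the model ignores alert states such as acknowledged or completed alerts.

```python
from datetime import datetime, timedelta
from typing import List, Optional

# Values copied from the monitor JSON; the logic below is a simplified local model.
CPU_THRESHOLD = 80                # "ctx.results[0].nodes.process.cpu.percent >= 80"
THROTTLE = timedelta(minutes=15)  # "throttle": {"value": 15, "unit": "MINUTES"}
INTERVAL = timedelta(minutes=10)  # "schedule": {"period": {"unit": "MINUTES", "interval": 10}}


def trigger_fires(results: List[dict]) -> bool:
    """Python version of the Painless condition, applied to the first input's response."""
    return results[0]["nodes"]["process"]["cpu"]["percent"] >= CPU_THRESHOLD


def should_notify(now: datetime, last_sent: Optional[datetime]) -> bool:
    """Throttled action: skip if a notification went out less than 15 minutes ago."""
    return last_sent is None or now - last_sent >= THROTTLE


def simulate(cpu_readings: List[int], start: datetime) -> List[datetime]:
    """Run one monitor execution per reading and return the times a notification is sent."""
    sent: List[datetime] = []
    last_sent: Optional[datetime] = None
    for i, percent in enumerate(cpu_readings):
        now = start + i * INTERVAL
        # Hypothetical, heavily trimmed _cluster/stats response, wrapped like ctx.results.
        results = [{"nodes": {"process": {"cpu": {"percent": percent}}}}]
        if trigger_fires(results) and should_notify(now, last_sent):
            sent.append(now)
            last_sent = now
    return sent


if __name__ == "__main__":
    times = simulate([85, 90, 92, 40], datetime(2024, 11, 29, 12, 0))
    print([t.strftime("%H:%M") for t in times])  # ['12:00', '12:20']
```

In this model the breach at 12:10 is suppressed because it falls inside the 15-minute throttle window, so sustained high CPU yields a notification about every 20 minutes with a 10-minute schedule.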
Full Example:
{
   "name": "High CPU Monitor",
   "type": "monitor",
   "monitor_type": "cluster_metrics_monitor",
   "enabled": true,
   "schedule": {
      "period": {
         "unit": "MINUTES",
         "interval": 10
      }
   },
   "inputs": [
      {
         "uri": {
            "api_type": "CLUSTER_STATS",
            "path": "_cluster/stats",
            "path_params": "",
            "url": "http://localhost:9200/_cluster/stats"
         }
      }
   ],
   "triggers": [
      {
         "query_level_trigger": {
            "id": "Haj6uIsB7n52Hz6Dk-tm",
            "name": "High CPU Usage",
            "severity": "1",
            "condition": {
               "script": {
                  "source": "ctx.results[0].nodes.process.cpu.percent >= 80",
                  "lang": "painless"
               }
            },
            "actions": [
               {
                  "id": "notification827565",
                  "name": "High CPU Usage",
                  "destination_id": "",
                  "message_template": {
                     "source": "Monitor {{ctx.monitor.name}} just entered alert status. Please investigate the issue.\n  - Trigger: {{ctx.trigger.name}}\n  - Severity: {{ctx.trigger.severity}}\n  - Period start: {{ctx.periodStart}}\n  - Period end: {{ctx.periodEnd}}",
                     "lang": "mustache"
                  },
                  "throttle_enabled": true,
                  "subject_template": {
                     "source": "CPU usage breached 80%!",
                     "lang": "mustache"
                  },
                  "throttle": {
                     "value": 15,
                     "unit": "MINUTES"
                  }
               }
            ]
         }
      }
   ],
   "ui_metadata": {
      "schedule": {
         "timezone": null,
         "frequency": "interval",
         "period": {
            "unit": "MINUTES",
            "interval": 10
         },
         "daily": 0,
         "weekly": {
            "tue": false,
            "wed": false,
            "thur": false,
            "sat": false,
            "fri": false,
            "mon": false,
            "sun": false
         },
         "monthly": {
            "type": "day",
            "day": 1
         },
         "cronExpression": "0 */1 * * *"
      },
      "monitor_type": "cluster_metrics_monitor",
      "search": {
         "searchType": "clusterMetrics",
         "timeField": "",
         "aggregations": [],
         "groupBy": [],
         "bucketValue": 1,
         "bucketUnitOfTime": "h",
         "filters": []
      }
   }
}
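Before sending the definition to the cluster, a quick local check can catch gaps such as the empty destination_id. The sketch below is an assumption-laden helper, not part of OpenSearch: check_monitor is a hypothetical name, and its checks are illustrative choices rather than the Alerting plugin's own validation rules.

```python
import json
import sys
from typing import List


def check_monitor(monitor: dict) -> List[str]:
    """Return human-readable problems found in a monitor definition like the one above.

    These checks are illustrative, not validation rules of the Alerting plugin.
    """
    problems: List[str] = []
    period = monitor.get("schedule", {}).get("period", {})
    if not isinstance(period.get("interval"), int) or period["interval"] <= 0:
        problems.append("schedule.period.interval should be a positive integer")
    for inp in monitor.get("inputs", []):
        if not inp.get("uri", {}).get("path"):
            problems.append("input is missing uri.path")
    for trig in monitor.get("triggers", []):
        qlt = trig.get("query_level_trigger", {})
        if not qlt.get("condition", {}).get("script", {}).get("source"):
            problems.append(f"trigger '{qlt.get('name')}' has no condition script")
        for action in qlt.get("actions", []):
            # An empty destination_id means the action has no channel to notify.
            if not action.get("destination_id"):
                problems.append(f"action '{action.get('name')}' has an empty destination_id")
    return problems


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_monitor.py cpu_alert.json
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as f:
            for problem in check_monitor(json.load(f)):
                print("WARNING:", problem)
```

Run against the monitor JSON as published, this check should report only the empty destination_id on the “High CPU Usage” action.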

Save the JSON above to a file named cpu_alert.json.

Then use curl to create the monitor:

curl -X POST \
https://username:password@opensearch-project.example.com:9200/_plugins/_alerting/monitors \
-H 'Content-Type: application/json' --data-binary @cpu_alert.json
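If you prefer to script the same request, the following Python sketch uses only the standard library to send cpu_alert.json to the create-monitor endpoint used by the curl command. The host and credentials are the same placeholders as above, the function names are hypothetical, and it assumes the response carries the new monitor's ID in its top-level _id field.

```python
import base64
import json
import urllib.request

# Hypothetical placeholders: replace with your cluster URL and credentials.
OPENSEARCH_URL = "https://opensearch-project.example.com:9200"
USERNAME = "username"
PASSWORD = "password"


def build_create_monitor_request(base_url: str, user: str, password: str,
                                 monitor: dict) -> urllib.request.Request:
    """Build the same POST _plugins/_alerting/monitors request as the curl command."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/_plugins/_alerting/monitors",
        data=json.dumps(monitor).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


def create_monitor(path: str) -> str:
    """Send the monitor definition in `path` to OpenSearch and return the new monitor's _id."""
    with open(path, encoding="utf-8") as f:
        monitor = json.load(f)
    req = build_create_monitor_request(OPENSEARCH_URL, USERNAME, PASSWORD, monitor)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["_id"]
```

Calling create_monitor("cpu_alert.json") performs the request. urllib verifies TLS certificates by default, so a cluster with a self-signed certificate would need an ssl.SSLContext passed to urlopen through its context argument.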

