Send Alerts: Implementing a Custom Producer

As part of the MONIT infrastructure we offer the possibility to inject alert-type documents. Depending on the integration, these documents can either be treated as real alerts and forwarded to notification endpoints (GNI), or simply stored in the MONIT infrastructure so they can be visualized later like metrics.

Please refer to the first step section to know what you should do before using any of the integrations.

Supported Producers

There are several ways to integrate with MONIT/GNI depending on the producer. Here's a list of the supported ones:

JSON

As with other types of documents, we offer the possibility to submit alerts through an HTTP+JSON endpoint in order to make use of the GNI integration and the generation of notifications. This is, for example, the endpoint used to integrate Collectd alerts from Puppet.

Storages

Check the GNI storages section for more information.

Send data

Send your alarms to the HTTP endpoint listening at http://monit-alarms.cern.ch:10011. The available JSON fields are:

  • (mandatory) timestamp: the timestamp of the alarm (check #1 below)
  • (mandatory) source: the name of your dataset/producer
  • (mandatory) alarm_name: the name of the alarm
  • (mandatory) entities: the entity (or entities) being alarmed, such as hosts, services, etc.
  • (mandatory) status: the alarm state, possible values: OK, WARNING, FAILURE (check #2 below)
  • (optional) metric: the name of the metric that generated the alarm
  • (optional) summary: a short summary of the alarm; if provided it is used as the email/ticket subject
  • (optional) description: a more verbose description of the alarm
  • (optional) correlation: the condition that was met to generate the alarm
  • (optional) troubleshooting: A link providing troubleshooting details
  • (optional) targets: an array with information about where to send the alarm, possible values: snow, email and sms, e.g. ["snow"]
  • (optional) snow/service_element: the SNOW SE of the incident
  • (optional) snow/functional_element: the SNOW FE of the incident
  • (optional) snow/assignment_level: the SNOW assignment level of the incident (as an int)
  • (optional) snow/grouping: (deprecated, use hostgroup_grouping) if the incident should be grouped by hostgroup
  • (optional) snow/hostgroup_grouping: if the incident should be grouped by hostgroup
  • (optional) snow/auto_closing: if grouping is disabled, OK alerts automatically close tickets
  • (optional) snow/notification_type: the alarm type, possible values: app (default), hw, os
  • (optional) snow/watchlist: String representing a comma separated list of emails that will form the watchlist of the SNOW incident e.g: "email1,email2,..."
  • (optional) email/to: the recipient(s) of the email
  • (optional) email/cc: the address(es) to include as CC
  • (optional) email/bcc: the address(es) to include as BCC
  • (optional) email/reply_to: the email to reply to
  • (optional) email/send_ok: if enabled, emails will be sent upon receiving OK alarms; defaults to false
  • (optional) sms/to: the recipient of the text message in international format (restricted to CERN numbers, i.e. +4175411xxxx)
  • (optional) sms/send_ok: if enabled, text messages will be sent upon receiving OK alarms; defaults to false

Please pay attention to the following:

  1. All timestamps must be in UTC milliseconds or seconds, without any fractional part
  2. The FAILURE status can also be sent as CRITICAL or ERROR and will be converted internally
  3. Use double quotes, not single quotes (single quotes are not valid in JSON)
  4. Anything that is considered metadata for the infrastructure will be promoted to the metadata field in the produced JSON, and the rest will be put inside data
  5. The alarm must be a JSON document with the fields above.
  6. When possible please send multiple documents in the same batch grouping them in a JSON array.
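To illustrate the points above, here is a minimal sketch of building and batch-validating alarm documents before POSTing them to the endpoint. The helper names and payload values (dataset, host, email) are hypothetical, not part of the MONIT API:

```python
import json

# Mandatory fields from the list above; FAILURE may also arrive as CRITICAL/ERROR.
MANDATORY = {"timestamp", "source", "alarm_name", "entities", "status"}
VALID_STATUS = {"OK", "WARNING", "FAILURE", "CRITICAL", "ERROR"}

def validate_alarm(doc):
    """Check that an alarm document carries the mandatory fields and a valid status."""
    missing = MANDATORY - doc.keys()
    if missing:
        raise ValueError(f"missing mandatory fields: {sorted(missing)}")
    if doc["status"] not in VALID_STATUS:
        raise ValueError(f"invalid status: {doc['status']}")
    return doc

# One hypothetical alarm, batched in a JSON array (note: UTC ms, no fractional part)
batch = [validate_alarm({
    "timestamp": 1700000000000,
    "source": "mydataset",            # hypothetical producer name
    "alarm_name": "disk_full",
    "entities": "myhost.cern.ch",
    "status": "WARNING",
    "summary": "Disk usage above 90%",
    "targets": ["email"],
    "email/to": "admin@cern.ch",
})]

payload = json.dumps(batch)
# To actually send it (requires network access to the endpoint):
# import urllib.request
# req = urllib.request.Request("http://monit-alarms.cern.ch:10011",
#                              data=payload.encode(),
#                              headers={"Content-Type": "application/json"})
# urllib.request.urlopen(req)
```

Batching several documents in one array, as note #6 suggests, only means appending more validated dictionaries to the list before serializing.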

Prometheus Alertmanager

Alerting with Prometheus is separated into two parts. Alerting rules in Prometheus servers send alerts to an Alertmanager. The Alertmanager then manages those alerts, including silencing, inhibition, aggregation and sending out notifications via methods such as email, on-call notification systems, and chat platforms.

When sending Prometheus alerts, you can also choose not to integrate with GNI and just have the alert document stored in OpenSearch.

Storages

When sending data using Prometheus Alertmanager, documents will be stored in OpenSearch and translated into two different index patterns:

  • monit_prod_alertmanager_raw_alerts: contains the full JSON representation of the alert as provided by Prometheus Alertmanager
  • monit_prod_alarm_raw_gni: if the integration with GNI has been enabled, contains the GNI representation of your alert

Send data

The alerts produced by Prometheus and sent by the Alertmanager should contain the labels described by the schema. Please note that configuring gni_entities will trigger the GNI integration: on top of being inserted raw into OpenSearch, the alert will generate one or several GNI documents.

  • (mandatory/provided by alertmanager by default) timestamp: the timestamp of the alarm (check #1 below)
  • (mandatory/provided by alertmanager by default) alarm_name: the name of the alarm
  • (mandatory/provided by alertmanager by default) status: the alarm state, possible values:
    • "resolved" -> OK
    • "firing" -> FAILURE
    • "pending" -> WARNING
    • else -> UNKNOWN
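The status mapping above can be sketched as a small lookup (illustrative only; the actual conversion happens inside the MONIT pipeline):

```python
# Map Prometheus Alertmanager states to GNI alarm statuses,
# following the table above; anything else becomes UNKNOWN.
STATUS_MAP = {
    "resolved": "OK",
    "firing": "FAILURE",
    "pending": "WARNING",
}

def to_gni_status(am_status: str) -> str:
    return STATUS_MAP.get(am_status, "UNKNOWN")
```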

GNI integration

If you want your alerts to be handled by the GNI infrastructure, there's a list of otherwise-optional parameters that you can set.

  • (mandatory) gni_entities: String, comma separated list of entities triggering the alarm (e.g: "host1,host2") -> entities in GNI schema
  • (optional) submitter_environment: environment from where the alarm was fired
  • (optional) submitter_hostgroup: hostgroup from where the alarm was fired
  • (optional) dashboardURL: A link providing troubleshooting details -> troubleshooting in GNI schema
  • (optional) roger_alarmed: if the alarm is masked by roger or not

Since dynamic labels would create new instances of your alert, dynamic values like the correlation should instead be added as custom annotations:

  • (optional) gni_summary: a short summary of the alarm -> summary in GNI schema
  • (optional) gni_description: a more verbose description of the alarm -> description in GNI schema
  • (optional) gni_correlation: a representation of the threshold that triggered the alarm (i.e: "25 > 12") -> correlation in GNI schema
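Putting the labels and annotations together, a Prometheus alerting rule might look like the following sketch. The rule name, expression, threshold and hostgroup are hypothetical; only the gni_* label and annotation names come from the schema above:

```yaml
groups:
- name: example
  rules:
  - alert: HighLoad                              # becomes alarm_name
    expr: node_load5 > 12                        # hypothetical expression
    for: 5m
    labels:
      gni_entities: "{{ $labels.instance }}"     # triggers the GNI integration
      submitter_hostgroup: "myproject/main"      # hypothetical hostgroup
    annotations:                                 # dynamic values go here, not in labels
      gni_summary: "High load on {{ $labels.instance }}"
      gni_correlation: "{{ $value }} > 12"
```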
SNOW labels

In the GNI schema, all these labels will be nested under the "snow" field by removing the snow_ prefix.

  • (optional) snow_service_element: the SNOW SE of the incident
  • (optional) snow_functional_element: the SNOW FE of the incident
  • (optional) snow_assignment_level: the SNOW assignment level of the incident (as an int)
  • (optional) snow_hostgroup_grouping: if the incident should be grouped by hostgroup
  • (optional) snow_auto_closing: if grouping is disabled, OK alerts automatically close tickets
  • (optional) snow_notification_type: the alarm type, possible values: app (default), hw, os
  • (optional) snow_watchlist: list of emails that will form the watchlist of the SNOW incident
  • (optional) snow_troubleshooting: used historically by nocontacts; will be overridden by "troubleshooting" if specified

You can also provide the SNOW information using annotations, wrapping all of the fields in a single object. Please note that if the same fields are available as labels, the labels will be used instead:

annotations:
  snow:
    functional_element:
    service_element:
    ...
Email labels

In the GNI schema, all these labels will be nested under the "email" field by removing the email_ prefix.

  • (optional) email_to: the recipient(s) of the email
  • (optional) email_cc: the address(es) to include as CC
  • (optional) email_bcc: the address(es) to include as BCC
  • (optional) email_reply_to: the email to reply to
  • (optional) email_send_ok: if enabled, emails will be sent upon receiving OK alarms; defaults to false

You can also provide the Email information using annotations, wrapping all of the fields in a single object. Please note that if the same fields are available as labels, the labels will be used instead:

annotations:
  email:
    to:
    bcc:
    ...
SMS labels

In the GNI schema, all these labels will be nested under the "sms" field by removing the sms_ prefix.

  • (optional) sms_to: the recipient of the text message in international format (restricted to CERN numbers, i.e. +4175411xxxx)
  • (optional) sms_send_ok: if enabled, text messages will be sent upon receiving OK alarms; defaults to false

You can also provide the SMS information using annotations, wrapping all of the fields in a single object. Please note that if the same fields are available as labels, the labels will be used instead:

annotations:
  sms:
    to:
    send_ok:
    ...

Configuration

You will need to configure a new receiver with the endpoint that integrates alerts with the MONIT infrastructure (http://monit-alarms.cern.ch:10016) and specify your producer name in the HTTP header shown below. Here's a working example:

global:
  resolve_timeout: 5m
route:
  receiver: default-receiver
  group_by:
  - alertname
  - service
  - gni_entities
  routes:
  - match:
      gni_entities: ".+"
    receiver: gni
    continue: true
  group_wait: 1m
  group_interval: 5m
receivers:
- name: default-receiver
# MONIT GNI receiver
- name: "gni"
  webhook_configs:
    - url: "http://monit-alarms.cern.ch:10016"
      http_config:
        http_headers:
          X-Monit-Source:
            values: [<tenant>]
      send_resolved: true
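With this configuration, the MONIT endpoint receives standard Alertmanager webhook payloads. As a rough sketch of what such a payload looks like (label and annotation values are hypothetical; the envelope fields follow the Alertmanager webhook format):

```python
import json

# A minimal Alertmanager v4-style webhook payload, as the MONIT endpoint
# would receive it from the "gni" receiver above.
payload = {
    "version": "4",
    "status": "firing",
    "receiver": "gni",
    "alerts": [
        {
            "status": "firing",                    # -> FAILURE in GNI
            "labels": {
                "alertname": "HighLoad",           # -> alarm_name
                "gni_entities": "myhost.cern.ch",  # triggers GNI integration
            },
            "annotations": {
                "gni_summary": "High load on myhost",
            },
            "startsAt": "2024-01-01T00:00:00Z",
        }
    ],
}

body = json.dumps(payload)
```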