How to silence Prometheus Alertmanager using config files?

Question

I'm using the official stable/prometheus-operator chart to deploy Prometheus with helm.

It's working well so far, except for the annoying CPUThrottlingHigh alert that is firing for many pods (including Prometheus's own config-reloader containers). This alert is currently under discussion, and I want to silence its notifications for now.

The Alertmanager has a silence feature, but it is web-based:

Silences are a straightforward way to simply mute alerts for a given time. Silences are configured in the web interface of the Alertmanager.

Is there a way to mute notifications from CPUThrottlingHigh using a config file?

Answer 1

One option is to route alerts you want silenced to a "null" receiver. In alertmanager.yaml:

route:
  # Other settings...
  group_wait: 0s
  group_interval: 1m
  repeat_interval: 1h

  # Default receiver.
  receiver: "null"

  routes:
  # continue defaults to false, so the first match will end routing.
  - match:
      # This was previously named DeadMansSwitch
      alertname: Watchdog
    receiver: "null"
  - match:
      alertname: CPUThrottlingHigh
    receiver: "null"
  - receiver: "regular_alert_receiver"

receivers:
  - name: "null"
  - name: regular_alert_receiver
    <snip>
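
Since the question deploys through the stable/prometheus-operator chart, the same routing could probably be supplied through the chart's alertmanager.config value (the parameter Answer 2 mentions) instead of a hand-edited alertmanager.yaml. A minimal sketch of a values file, assuming the chart passes alertmanager.config through to Alertmanager unchanged; the file name is illustrative, and the real receiver's notification settings stay elided as in the answer above:

```yaml
# values.yaml (illustrative file name) for the stable/prometheus-operator chart.
alertmanager:
  config:
    route:
      # Default receiver.
      receiver: "null"
      routes:
        # continue defaults to false, so the first match ends routing.
        - match:
            alertname: Watchdog
          receiver: "null"
        - match:
            alertname: CPUThrottlingHigh
          receiver: "null"
        # No matcher: catches every alert not matched above.
        - receiver: "regular_alert_receiver"
    receivers:
      - name: "null"
      - name: regular_alert_receiver
        # Notification settings are elided in the original answer.
```

The alerts still appear in Prometheus and in the Alertmanager UI; only the notification is dropped, because the "null" receiver has no notification mechanism configured.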

Answer 2

Well, I managed to get it working by configuring a hackish inhibit_rule:

inhibit_rules:
- target_match:
    alertname: 'CPUThrottlingHigh'
  source_match:
    alertname: 'DeadMansSwitch'
  equal: ['prometheus']

The DeadMansSwitch is, by design, an "always firing" alert shipped with prometheus-operator, and the prometheus label is a common label for all alerts, so CPUThrottlingHigh ends up inhibited forever. It stinks, but it works.

Pros:

This can be done via the config file (using the alertmanager.config helm parameter).
The CPUThrottlingHigh alert is still present on Prometheus for analysis.
The CPUThrottlingHigh alert only shows up in the Alertmanager UI if the "Inhibited" box is checked.
No annoying notifications on my receivers.

Cons:

Any change to DeadMansSwitch or to the prometheus label design will break this (which only means the alerts start firing again).

Update: My Cons became real...

The DeadMansSwitch alertname just changed in stable/prometheus-operator 4.0.0. If using this version (or above), the new alertname is Watchdog.
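
For chart version 4.0.0 or above, the inhibit rule from this answer would presumably only need its source alertname swapped. A sketch under that assumption, otherwise unchanged from the rule shown earlier:

```yaml
inhibit_rules:
  - target_match:
      # The alert whose notifications should be suppressed.
      alertname: 'CPUThrottlingHigh'
    source_match:
      # Always-firing alert; renamed from DeadMansSwitch in stable/prometheus-operator 4.0.0.
      alertname: 'Watchdog'
    # Inhibit only when source and target carry the same value for the prometheus label.
    equal: ['prometheus']
```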

Answer 3

I doubt there exists a way to silence alerts via configuration (other than routing said alerts to a /dev/null receiver, i.e. one with no email or any other notification mechanism configured, but the alert would still show up in the Alertmanager UI).

You can apparently use the command line tool amtool that comes with Alertmanager to add a silence (although I can't see a way to set an expiration time for the silence).

Or you can use the API directly (even though it is not documented and in theory it may change). According to this prometheus-users thread, this should work:

curl https://alertmanager/api/v1/silences -d '{
  "matchers": [
    {
      "name": "alername1",
      "value": ".*",
      "isRegex": true
    }
  ],
  "startsAt": "2018-10-25T22:12:33.533330795Z",
  "endsAt": "2018-10-25T23:11:44.603Z",
  "createdBy": "api",
  "comment": "Silence",
  "status": {
    "state": "active"
  }
}'

Answer 4

You can silence it by sending your alerts through Robusta. (Disclaimer: I wrote Robusta.)

Here is an example:

- triggers:
  - on_prometheus_alert: {}
  actions:
  - name_silencer:
      names: ["Watchdog", "CPUThrottlingHigh"]

However, this is probably not what you want to do!

Some CPUThrottlingHigh alerts are spammy and can't be fixed, like the one for metrics-server on GKE.

However, in general the alert is meaningful and can indicate a real problem. Typically the best practice is to change or remove the pod's CPU limit.
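
As an illustration of removing the limit, a container can keep its CPU request while omitting the CPU limit, so no CPU quota is imposed on it. A minimal sketch; the pod name, container name, image, and resource values are all hypothetical, and it assumes no LimitRange in the namespace injects a default CPU limit:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app            # hypothetical name
spec:
  containers:
    - name: app                # hypothetical name
      image: example/app:1.0   # hypothetical image
      resources:
        requests:
          cpu: 100m            # still used by the scheduler for placement
          memory: 128Mi
        limits:
          memory: 128Mi        # memory limit kept; cpu limit deliberately omitted
```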

I've spent more hours of my life than I care to admit looking at CPUThrottlingHigh, as I wrote an automated playbook for Robusta that analyzes each CPUThrottlingHigh and recommends the best practice.

4 answers

Answer 1
14 2019-03-05 06:35:40

Answer 2
11 Accepted 2019-02-21 18:37:09

Answer 3
5 2019-02-21 15:17:19

Answer 4
0 2021-12-28 11:09:52
