How to silence Prometheus Alertmanager using config files?

I'm using the official stable/prometheus-operator chart to deploy Prometheus with helm.

It's working well so far, except for the annoying CPUThrottlingHigh alert that is firing for many pods (including Prometheus's own config-reloader containers). This alert is currently under discussion, and I want to silence its notifications for now.

Alertmanager has a silence feature, but it is web-based:

Silences are a straightforward way to simply mute alerts for a given time. Silences are configured in the web interface of the Alertmanager.

Is there a way to mute notifications from CPUThrottlingHigh using a config file?

One option is to route the alerts you want silenced to a "null" receiver. In alertmanager.yaml:

route:
  # Other settings...
  group_wait: 0s
  group_interval: 1m
  repeat_interval: 1h

  # Default receiver.
  receiver: "null"

  routes:
  # continue defaults to false, so the first match will end routing.
  - match:
      # This was previously named DeadMansSwitch
      alertname: Watchdog
    receiver: "null"
  - match:
      alertname: CPUThrottlingHigh
    receiver: "null"
  - receiver: "regular_alert_receiver"

receivers:
  - name: "null"
  - name: regular_alert_receiver
    # <snip>
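To see why the first matching route wins, here is a minimal Python model of the routing tree above. It is a simplification, not Alertmanager's actual code: it only handles equality `match` blocks, one level of child routes, and `continue: false`. ROUTES is a hypothetical in-memory copy of the YAML.

```python
from typing import Dict, List

Labels = Dict[str, str]

# Hypothetical in-memory copy of the `routes` list from alertmanager.yaml.
ROUTES: List[dict] = [
    {"match": {"alertname": "Watchdog"}, "receiver": "null"},
    {"match": {"alertname": "CPUThrottlingHigh"}, "receiver": "null"},
    {"match": {}, "receiver": "regular_alert_receiver"},  # no matchers: catch-all
]
DEFAULT_RECEIVER = "null"  # the top-level `receiver`


def route_alert(labels: Labels, routes: List[dict] = ROUTES,
                default: str = DEFAULT_RECEIVER) -> str:
    """Return the receiver for an alert, stopping at the first matching route."""
    for route in routes:
        # A route matches when every label in `match` has exactly that value.
        if all(labels.get(k) == v for k, v in route["match"].items()):
            return route["receiver"]  # continue: false, so routing stops here
    # No child route matched, so the parent route's receiver handles the alert.
    return default


if __name__ == "__main__":
    for name in ("CPUThrottlingHigh", "Watchdog", "KubePodCrashLooping"):
        print(name, "->", route_alert({"alertname": name}))
```

Because the catch-all route comes last, only alerts that escaped the two "null" routes reach regular_alert_receiver; moving it to the top would send everything there.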

Well, I managed to make it work by configuring a hackish inhibit_rule:

inhibit_rules:
  - target_match:
      alertname: 'CPUThrottlingHigh'
    source_match:
      alertname: 'DeadMansSwitch'
    equal: ['prometheus']

DeadMansSwitch is, by design, an "always firing" alert shipped with prometheus-operator, and the prometheus label is common to all alerts, so CPUThrottlingHigh ends up inhibited forever. It stinks, but it works.
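The inhibition logic this hack relies on can be sketched as follows. It is a simplified model (equality matchers only, and it ignores Alertmanager's extra safeguards against an alert inhibiting itself); the prometheus label value is made up for illustration. Following Alertmanager's documented behaviour, a label missing from an alert is treated like an empty value when comparing the `equal` labels.

```python
from typing import Dict, List

Labels = Dict[str, str]


def matches(labels: Labels, required: Labels) -> bool:
    """True when every required label is present with exactly that value."""
    return all(labels.get(k) == v for k, v in required.items())


def is_inhibited(target: Labels, firing: List[Labels], target_match: Labels,
                 source_match: Labels, equal: List[str]) -> bool:
    """Is `target` muted by some firing alert under one inhibit_rule?"""
    if not matches(target, target_match):
        return False
    return any(
        matches(source, source_match)
        # Labels listed in `equal` must have the same value on both alerts.
        and all(source.get(n, "") == target.get(n, "") for n in equal)
        for source in firing
    )


# Hypothetical label value; per the text above, every alert carries this label.
PROM = {"prometheus": "monitoring/prometheus-operator-prometheus"}
RULE = {"target_match": {"alertname": "CPUThrottlingHigh"},
        "source_match": {"alertname": "DeadMansSwitch"},
        "equal": ["prometheus"]}

dead_mans_switch = {"alertname": "DeadMansSwitch", **PROM}
watchdog = {"alertname": "Watchdog", **PROM}  # the renamed always-firing alert
throttling = {"alertname": "CPUThrottlingHigh", "pod": "prometheus-0", **PROM}

if __name__ == "__main__":
    print(is_inhibited(throttling, [dead_mans_switch], **RULE))  # True
    print(is_inhibited(throttling, [watchdog], **RULE))  # False
```

The second call shows the fragility: once the always-firing alert is renamed, source_match no longer matches anything and CPUThrottlingHigh fires again.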

Pros:

  • This can be done via the config file (using the alertmanager.config helm parameter).
  • The CPUThrottlingHigh alert is still present in Prometheus for analysis.
  • The CPUThrottlingHigh alert only shows up in the Alertmanager UI if the "Inhibited" box is checked.
  • No annoying notifications on my receivers.

Cons:

  • Any change to DeadMansSwitch or to the prometheus label design will break this (which would only mean the alerts start firing again).

Update: my cons became real...

The DeadMansSwitch alertname just changed in stable/prometheus-operator 4.0.0. If you use this version (or later), the new alertname is Watchdog, so source_match must reference Watchdog instead.

I doubt there is a way to silence alerts via configuration (other than routing them to a /dev/null receiver, i.e. one with no email or other notification mechanism configured; the alert would still show up in the Alertmanager UI, though).

You can apparently use amtool, the command-line tool that ships with Alertmanager, to add a silence (although I couldn't see a way to set an expiration time for the silence).
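For what it's worth, current amtool releases do accept an expiry: `amtool silence add` takes `--duration` (default 1h) or an absolute `--end` time. The sketch below only builds such a command line; the helper name and the Alertmanager URL are hypothetical, and older amtool versions may use different flags, so check `amtool silence add --help` for yours.

```python
from typing import List


def amtool_silence_cmd(alertname: str, duration: str, comment: str,
                       alertmanager_url: str) -> List[str]:
    """Build an `amtool silence add` command line with an explicit duration."""
    return [
        "amtool", "silence", "add",
        f"alertname={alertname}",                # matcher in label=value form
        f"--duration={duration}",                # e.g. "24h"; amtool's default is 1h
        f"--comment={comment}",
        f"--alertmanager.url={alertmanager_url}",
    ]


if __name__ == "__main__":
    cmd = amtool_silence_cmd("CPUThrottlingHigh", "24h",
                             "Noisy while upstream discussion is open",
                             "http://localhost:9093")  # hypothetical URL
    print(" ".join(cmd))
```

To actually create the silence, pass the list to `subprocess.run(cmd, check=True)` on a machine where amtool is installed and can reach Alertmanager.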

Or you can use the API directly (even though it is not documented and could, in theory, change). According to this prometheus-users thread, this should work:

curl -X POST https://alertmanager/api/v1/silences \
  -H 'Content-Type: application/json' \
  -d '{
      "matchers": [
        {
          "name": "alertname",
          "value": "CPUThrottlingHigh",
          "isRegex": false
        }
      ],
      "startsAt": "2018-10-25T22:12:33.533330795Z",
      "endsAt": "2018-10-25T23:11:44.603Z",
      "createdBy": "api",
      "comment": "Silence",
      "status": {
        "state": "active"
      }
    }'
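The same request can be built programmatically so that endsAt is computed from a duration instead of being hard-coded. This sketch targets the /api/v2/silences endpoint served by newer Alertmanager releases (the v1 API was deprecated and has since been removed); the helper names, base URL, and "api" author are assumptions. Nothing is sent until you call post_silence.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


def silence_payload(alertname: str, hours: float, author: str, comment: str,
                    now: Optional[datetime] = None) -> Dict:
    """Build a silence body whose endsAt is `hours` after startsAt (UTC)."""
    start = now or datetime.now(timezone.utc)
    end = start + timedelta(hours=hours)
    return {
        "matchers": [
            {"name": "alertname", "value": alertname, "isRegex": False},
        ],
        # RFC 3339 timestamps with a trailing "Z" for UTC.
        "startsAt": start.isoformat().replace("+00:00", "Z"),
        "endsAt": end.isoformat().replace("+00:00", "Z"),
        "createdBy": author,
        "comment": comment,
    }


def post_silence(base_url: str, payload: Dict) -> Dict:
    """POST the silence as JSON and return the decoded response body."""
    req = urllib.request.Request(
        f"{base_url}/api/v2/silences",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `post_silence("http://localhost:9093", silence_payload("CPUThrottlingHigh", 24, "api", "Silence"))` should return a JSON object containing the new silenceID.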

You can silence it by sending your alerts through Robusta. (Disclaimer: I wrote Robusta.)

Here is an example:

- triggers:
  - on_prometheus_alert: {}
  actions:
  - name_silencer:
      names: ["Watchdog", "CPUThrottlingHigh"]

However, this is probably not what you want to do!

Some CPUThrottlingHigh alerts are spammy and can't be fixed, like the one for metrics-server on GKE.

However, in general the alert is meaningful and can indicate a real problem. Typically, the best practice is to change or remove the pod's CPU limit.
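Why a CPU limit causes throttling can be shown with a little arithmetic. A limit of L cores gives the container L × cfs_period_us of CPU time per CFS period (100 ms by default); once that quota is used up, the container is throttled for the rest of the period. The chart's CPUThrottlingHigh rule compares, per container, the increase of container_cpu_cfs_throttled_periods_total with that of container_cpu_cfs_periods_total against a threshold. The 25% used below is an assumption (the kubernetes-mixin default at the time), so check the rule in your own install.

```python
THRESHOLD = 0.25  # assumption: kubernetes-mixin's default throttling percentage


def cfs_quota_us(cpu_limit_cores: float, period_us: int = 100_000) -> int:
    """CPU time (microseconds) a container may use per CFS period under its limit."""
    return int(cpu_limit_cores * period_us)


def throttled_ratio(throttled_periods: int, total_periods: int) -> float:
    """Fraction of CFS periods in which the container hit its quota."""
    if total_periods <= 0:
        return 0.0
    return throttled_periods / total_periods


def would_fire(throttled_periods: int, total_periods: int,
               threshold: float = THRESHOLD) -> bool:
    """Mirror of the alert's comparison for one container over one window."""
    return throttled_ratio(throttled_periods, total_periods) > threshold


if __name__ == "__main__":
    print(cfs_quota_us(0.5))   # 50000 us, i.e. 50 ms of CPU per 100 ms period
    print(would_fire(30, 100))  # True: 30% of periods were throttled
```

Raising or removing the limit raises (or removes) the per-period quota, which is why it is usually the real fix rather than silencing the alert.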

I've spent more hours of my life than I care to admit looking at CPUThrottlingHigh, because I wrote an automated Robusta playbook that analyzes each CPUThrottlingHigh alert and recommends the best practice.

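Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.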

 