Prometheus AlertManager - Send Alerts to different clients based on routes
I have 2 services, A and B, which I want to monitor. I also have 2 different notification channels, X and Y, in the form of receivers in the AlertManager config file.
I want to notify X if service A goes down, and notify Y if service B goes down. How can I achieve this in my configuration?

My AlertManager YAML file is:
route:
  receiver: X
receivers:
  - name: X
    email_configs:
  - name: Y
    email_configs:
And my alert.rule file is:
groups:
  - name: A
    rules:
      - alert: A_down
        expr: expression
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "A is down"
  - name: B
    rules:
      - alert: B_down
        expr: expression
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "B is down"
The config should roughly look like this (not tested):
route:
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 2h
  receiver: 'default-receiver'
  routes:
    - match:
        alertname: A_down
      receiver: X
    - match:
        alertname: B_down
      receiver: Y
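For completeness, the X and Y receivers then carry the actual notification settings. A minimal sketch, where the addresses and smarthost are placeholders I made up, not values from the question:

```yaml
receivers:
  - name: X
    email_configs:
      - to: 'teamX@example.org'              # placeholder address
        from: 'alertmanager@example.org'     # placeholder sender
        smarthost: 'smtp.example.org:587'    # placeholder SMTP relay
  - name: Y
    email_configs:
      - to: 'teamY@example.org'              # placeholder address
        from: 'alertmanager@example.org'
        smarthost: 'smtp.example.org:587'
```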
The idea is that each route field can have a routes field, where you can put a different config that gets enabled if the labels in match match the condition.
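Routes can also nest more than one level deep: a child route inherits the parent's settings and only overrides what it declares. A sketch with illustrative receiver and label names (not from the original question):

```yaml
route:
  receiver: default            # used when no child route matches
  routes:
    - match:
        team: backend          # all backend alerts go here...
      receiver: backend-mail
      routes:
        - match:
            severity: critical # ...unless they are also critical
          receiver: backend-pager
```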
For clarification, the general flow to handle an alert in Prometheus (Alertmanager and Prometheus integration) is like this:

SomeErrorHappensInYourConfiguredRule( Rule ) -> RouteToDestination( Route ) -> TriggeringAnEvent( Receiver ) -> GetAMessageInSlack/PagerDuty/Mail/etc...
For example: if my AWS machine cluster production-a1 is down, I want to trigger an event sending PagerDuty and Slack notifications to my team with the relevant error.
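A single receiver can fan out to several channels at once, so for the production-a1 scenario the receiver might look like the sketch below (the routing key, webhook URL, and channel name are placeholders):

```yaml
receivers:
  - name: 'teamPagerAndSlack'
    pagerduty_configs:
      - routing_key: <myPagerDutyToken>                    # placeholder token
        send_resolved: true
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/XXX'    # placeholder webhook
        channel: '#prod-alerts'                            # placeholder channel
        send_resolved: true
```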
There are 3 files important for configuring alerts on your Prometheus system: the Alertmanager config (alertmanager.yml), the rule file(s), and the Prometheus config (prometheus.yml).
I'm attaching a dummy example to demonstrate the idea. In this example I'll watch for overload on my machine (using the node exporter installed on it). In /var/data/prometheus-stack/alertmanager/alertmanager.yml:
global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: 'localhost:25'
  smtp_from: 'JohnDoe@gmail.com'
route:
  receiver: defaultTrigger
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 6h
  routes:
    - match_re:
        service: service_overload
        owner: ATeam
      receiver: pagerDutyTrigger
receivers:
  # the default receiver must also be declared, or the config fails
  # validation; with no notifier configs it silently drops alerts
  - name: 'defaultTrigger'
  - name: 'pagerDutyTrigger'
    pagerduty_configs:
      - send_resolved: true
        routing_key: <myPagerDutyToken>
Add a rule in /var/data/prometheus-stack/prometheus/yourRuleFile.yml:
groups:
  - name: alerts
    rules:
      - alert: service_overload_more_than_5000
        expr: (node_network_receive_bytes_total{job="someJobOrService"} / 1000) >= 5000
        for: 10m
        labels:
          service: service_overload
          severity: pager
          dev_team: myteam
        annotations:
          dev_team: myteam
          priority: Blocker
          identifier: '{{ $labels.name }}'
          description: 'service overflow'
          value: '{{ humanize $value }}%'
In /var/data/prometheus-stack/prometheus/prometheus.yml, add this snippet to integrate Alertmanager:
global:
  ...
alerting:
  alertmanagers:
    - scheme: http
      static_configs:
        - targets:
            - "alertmanager:9093"
rule_files:
  - "yourRuleFile.yml"
...
Pay attention: the key point of this example is the service_overload label, which connects and binds the rule to the right receiver.
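In other words, these are the two excerpts from the files above that must agree for the routing to work (shown side by side for illustration, not as one valid file):

```yaml
# yourRuleFile.yml - the rule attaches the label...
labels:
  service: service_overload

# alertmanager.yml - ...and the route matches on that label
- match_re:
    service: service_overload
  receiver: pagerDutyTrigger
```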
Reload the config (restart the service, or stop and start your docker containers) and test it. If it's configured well, you can watch the alerts at http://your-prometheus-url:9090/alerts