Prometheus AlertManager - Send Alerts to different clients based on routes
I have 2 services, A and B, which I want to monitor. I also have 2 different notification channels, X and Y, in the form of receivers in the AlertManager config file.
I want to notify X if service A goes down, and notify Y if service B goes down. How can I achieve this in my configuration?

My AlertManager YAML file is:
route:
  receiver: X
receivers:
  - name: X
    email_configs:
  - name: Y
    email_configs:
And my alert.rule file is:
groups:
  - name: A
    rules:
      - alert: A_down
        expr: expression
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "A is down"
  - name: B
    rules:
      - alert: B_down
        expr: expression
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "B is down"
The config should roughly look like this (not tested):
route:
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 2h
  receiver: 'default-receiver'
  routes:
    - match:
        alertname: A_down
      receiver: X
    - match:
        alertname: B_down
      receiver: Y
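For completeness, the X and Y receivers then carry the actual notification settings. A minimal sketch, where the addresses and smarthost are placeholders I made up, not values from the question:

```yaml
receivers:
  - name: X
    email_configs:
      - to: 'teamX@example.org'              # placeholder address
        from: 'alertmanager@example.org'     # placeholder sender
        smarthost: 'smtp.example.org:587'    # placeholder SMTP relay
  - name: Y
    email_configs:
      - to: 'teamY@example.org'              # placeholder address
        from: 'alertmanager@example.org'
        smarthost: 'smtp.example.org:587'
```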
The idea is that each route field can have a routes field, where you can put a different config that gets enabled if the labels in match match the condition.
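Routes can also nest more than one level deep: a child route inherits the parent's settings and only overrides what it declares. A sketch with illustrative receiver and label names (not from the original question):

```yaml
route:
  receiver: default            # used when no child route matches
  routes:
    - match:
        team: backend          # all backend alerts go here...
      receiver: backend-mail
      routes:
        - match:
            severity: critical # ...unless they are also critical
          receiver: backend-pager
```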
For clarification, the general flow to handle an alert in Prometheus (Alertmanager and Prometheus integration) is like this:

SomeErrorHappensInYourConfiguredRule( Rule ) -> RouteToDestination( Route ) -> TriggeringAnEvent( Receiver ) -> GetAMessageInSlack/PagerDuty/Mail/etc...
For example: if my AWS machine cluster production-a1 is down, I want to trigger an event sending PagerDuty and Slack notifications to my team with the relevant error.
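A single receiver can fan out to several channels at once, so for the production-a1 scenario the receiver might look like the sketch below (the routing key, webhook URL, and channel name are placeholders):

```yaml
receivers:
  - name: 'teamPagerAndSlack'
    pagerduty_configs:
      - routing_key: <myPagerDutyToken>                    # placeholder token
        send_resolved: true
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/XXX'    # placeholder webhook
        channel: '#prod-alerts'                            # placeholder channel
        send_resolved: true
```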
There are 3 files important for configuring alerts on your Prometheus system: the Alertmanager config (alertmanager.yml), the rule file(s), and the Prometheus config (prometheus.yml).
I'm attaching a dummy example to demonstrate the idea. In this example I'll watch for overload on my machine (using the node exporter installed on it). In /var/data/prometheus-stack/alertmanager/alertmanager.yml:
global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: 'localhost:25'
  smtp_from: 'JohnDoe@gmail.com'
route:
  receiver: defaultTrigger
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 6h
  routes:
    - match_re:
        service: service_overload
        owner: ATeam
      receiver: pagerDutyTrigger
receivers:
  # the default receiver must also be declared, or the config fails
  # validation; with no notifier configs it silently drops alerts
  - name: 'defaultTrigger'
  - name: 'pagerDutyTrigger'
    pagerduty_configs:
      - send_resolved: true
        routing_key: <myPagerDutyToken>
Add a rule in /var/data/prometheus-stack/prometheus/yourRuleFile.yml:
groups:
  - name: alerts
    rules:
      - alert: service_overload_more_than_5000
        expr: (node_network_receive_bytes_total{job="someJobOrService"} / 1000) >= 5000
        for: 10m
        labels:
          service: service_overload
          severity: pager
          dev_team: myteam
        annotations:
          dev_team: myteam
          priority: Blocker
          identifier: '{{ $labels.name }}'
          description: 'service overflow'
          value: '{{ humanize $value }}%'
In /var/data/prometheus-stack/prometheus/prometheus.yml, add this snippet to integrate Alertmanager:
global:
  ...
alerting:
  alertmanagers:
    - scheme: http
      static_configs:
        - targets:
            - "alertmanager:9093"
rule_files:
  - "yourRuleFile.yml"
...
Pay attention: the key point of this example is the service_overload label, which connects and binds the rule to the right receiver.
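In other words, these are the two excerpts from the files above that must agree for the routing to work (shown side by side for illustration, not as one valid file):

```yaml
# yourRuleFile.yml - the rule attaches the label...
labels:
  service: service_overload

# alertmanager.yml - ...and the route matches on that label
- match_re:
    service: service_overload
  receiver: pagerDutyTrigger
```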
Reload the config (restart the service, or stop and start your docker containers) and test it. If it's configured well, you can watch the alerts at http://your-prometheus-url:9090/alerts