Is Prometheus Alertmanager able to discern between event and condition?

We have a Kubernetes system that, among other activities, handles thousands of incoming inputs from sensors. Some sensors can stop reporting from time to time, so we would like an alert for the disconnection event. When a sensor comes back, we would like an event for that as well. Between these events (connection and disconnection), the status of a specific sensor is either OK or NOK, and we would like to see which sensors are currently disconnected without going through all the issued events each time.

Can we do that with Prometheus Alertmanager? If yes, could you point to the possible ways to accomplish this? If not, what would be your default way to handle this requirement?

This has to be managed on the Prometheus server side by adding self-monitoring alerts, and more precisely the PrometheusTargetMissing alert in your case:

  - alert: PrometheusTargetMissing
    expr: up == 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: "Prometheus target missing (instance {{ $labels.instance }})"
      description: "A Prometheus target has disappeared. An exporter might have crashed.\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

Reference: https://awesome-prometheus-alerts.grep.to/rules.html#rule-prometheus-self-monitoring-2
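
With a rule like the one above, the firing alert itself represents the condition (the sensor is currently down), while the notifications Alertmanager sends represent the events. To also get the "sensor is back" event, the receiver can ask for resolved notifications. The fragment below is only a minimal sketch of an alertmanager.yml, assuming each sensor is visible to Prometheus as its own scrape target with a distinct `instance` label; the receiver name and the webhook URL are hypothetical placeholders, not something from the original answer.

```yaml
# Sketch only: receiver name and URL are hypothetical.
route:
  receiver: sensor-events
  # Group per alert and per instance so each sensor gets its own notification.
  group_by: ['alertname', 'instance']

receivers:
  - name: sensor-events
    webhook_configs:
      - url: http://example.internal/sensor-events   # hypothetical endpoint
        # Also notify when the alert stops firing, i.e. the sensor reports again.
        send_resolved: true
```

The list of currently disconnected sensors is then simply the set of active PrometheusTargetMissing alerts, which can be seen in the Alertmanager UI or by running the rule's expression (`up == 0`) in Prometheus, without replaying past notifications.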
