Prometheus / Grafana: count the downtime of a service
I have a service metric that returns either some positive value, or 0 in case of failure. I want to count how many seconds my service was failing during some time period.
E.g. the expression:
service_metric_name == 0
gives me a dashed line in Grafana:
[image: line_of_downtime]
Is there any way to count how many seconds my service was down during the last 2 hours?
I assume the metric is either 0 when the service is down or 1 when it is up.
In this case you can calculate an average over a time range. If the result is 0.9, your service has been up for 90% of the time. If you calculated the average over an hour, this would be 6 minutes of downtime out of 60 minutes.
avg_over_time(service_metric_name[1h])
This will be a moving average, that is: when your service is down, the value will slowly decrease. When your service is up again, the value will slowly increase again.
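To get back to the original question (seconds of downtime over the last 2 hours), the uptime fraction can be turned into a duration by multiplying the downtime share by the length of the window. The following is only a sketch. The first query assumes the metric is strictly 0 or 1, as in the answer above. The second query covers the case from the question, where the metric can be any positive value. It first maps the metric to 0/1 with the `bool` modifier and uses a subquery, which assumes a Prometheus version with subquery support (2.7 or later). The `1m` resolution is an arbitrary choice for illustration.

```promql
# Assumes service_metric_name is exactly 0 (down) or 1 (up).
# (1 - uptime fraction) * window length in seconds = approximate seconds of downtime.
(1 - avg_over_time(service_metric_name[2h])) * 2 * 3600

# Assumes the metric is 0 when down and any positive value when up.
# "> bool 0" yields 1 for up and 0 for down; the subquery re-evaluates it every 1m over 2h.
(1 - avg_over_time((service_metric_name > bool 0)[2h:1m])) * 2 * 3600
```

Both results are approximations: the precision is limited by the scrape interval (or the subquery resolution), and periods with no samples at all are not counted as downtime, because `avg_over_time` only averages the samples that exist.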