Prometheus / Grafana: calculating service downtime

Question

I have a service metric that returns either some positive value, or 0 in case of failure. I want to count how many seconds my service was failing during some time period.

For example, the expression:

service_metric_name == 0

gives me a dashed line in Grafana:

[image: line_of_downtime]

Is there any way to count how many seconds my service was down during the last 2 hours?

Answer 1

I assume the metric is either 0 when the service is down or 1 when it is up.

In this case you can calculate an average over a time range. If the result is 0.9, your service has been up for 90% of the time. If you calculated the average over an hour, that corresponds to 6 minutes of downtime out of 60 minutes.

avg_over_time(service_metric_name[1h])
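The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `downtime_seconds` is made up for this example, and it assumes the metric really is 0 or 1 and is scraped at evenly spaced intervals, so that the average equals the fraction of time the service was up.

```python
def downtime_seconds(uptime_fraction: float, window_seconds: float) -> float:
    """Convert the average of a 0/1 metric over a window into seconds of downtime.

    Hypothetical helper: uptime_fraction is the value returned by
    avg_over_time(...), window_seconds is the length of the range.
    """
    if not 0.0 <= uptime_fraction <= 1.0:
        raise ValueError("uptime_fraction must be between 0 and 1")
    return (1.0 - uptime_fraction) * window_seconds


# An average of 0.9 over one hour is roughly 360 seconds (6 minutes) of downtime.
print(round(downtime_seconds(0.9, 3600), 3))

# For the 2-hour window in the question, use 2 * 3600 as the window length.
print(round(downtime_seconds(0.9, 2 * 3600), 3))
```

If the metric can take positive values other than 1, as in the question, it would first have to be reduced to 0/1 for this calculation to hold.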

This will be a moving average, that is: when your service goes down, the value slowly decreases. When your service comes back up, it slowly increases again.
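To see why the value moves gradually, here is a small Python sketch of a trailing average over made-up 0/1 samples. It is not how Prometheus is implemented; the function `trailing_average`, the sample data, and the window of 4 samples are all assumptions chosen for illustration.

```python
def trailing_average(samples, window):
    """Average of the last `window` samples at each position.

    Near the start, fewer than `window` samples are available,
    so the average is taken over what exists so far.
    """
    averages = []
    for i in range(len(samples)):
        start = max(0, i - window + 1)
        chunk = samples[start:i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


# Hypothetical scrape results: the service is down for two samples in the middle.
samples = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1]
print(trailing_average(samples, 4))
```

The average drops step by step while the zeros enter the window and only returns to 1 once they have left it, which is the slow decrease and increase described above.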
