
How to sum Prometheus counters when k8s pods restart

I'm running Prometheus in a Kubernetes cluster. Everything is running fine and my UI pods are counting visitors.

[image]

Please ignore the title; what you see here is the query at the bottom of the image. It's a counter. The gaps in the graph are due to pods restarting. I have two pods running simultaneously!

Now suppose I would like to count the total number of visitors, so I need to sum over all the pods.

[image]

This is what I expect considering the first image, right?

However, I don't want the graph to drop when a pod restarts. I would like to have something cumulative over a specified amount of time (somehow ignoring pod restarts). I hope this makes sense. Any suggestions?

UPDATE

It was suggested below to do the following:

[image]

It's a bit hard to see because I've plotted everything there, but the suggested answer sum(rate(NumberOfVisitors[1h])) * 3600 is the continuous green line. What I don't understand now is why it has the value 3. Also, why does the value only increase after 21:55, when I can see some values before that?

As the approach seems to be OK, I noticed that the actual increase is indeed 3, going from 1 to 4. In the graph below I've used just one time series to reduce noise.

[image]

Rate, then sum, then multiply by the time range in seconds. That will handle rollovers on counters too.
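To make the "rate, then sum, then multiply" recipe concrete, here is a rough Python sketch of the arithmetic behind it. It is not Prometheus code: the sample values, the helper name counter_increase, and the one-hour window are made up for illustration, and the extrapolation that Prometheus applies in rate() is deliberately left out.

```python
def counter_increase(samples):
    """Total increase of one counter series; any drop is treated as a reset to 0."""
    total = 0.0
    prev = None
    for value in samples:
        if prev is not None:
            # After a restart the counter begins again at 0, so the whole
            # new value counts as increase since the reset.
            total += value - prev if value >= prev else value
        prev = value
    return total


# Hypothetical samples for two pods inside a one-hour window.
pod_a = [1, 2, 4, 0, 1, 3]  # restarts after reaching 4
pod_b = [5, 5, 6, 7, 7, 9]  # never restarts

window_seconds = 3600

# "Rate" per pod, then "sum" across pods, then multiply by the window length.
per_pod_rate = [counter_increase(s) / window_seconds for s in (pod_a, pod_b)]
total_visitors = sum(per_pod_rate) * window_seconds

print(round(total_visitors, 6))
```

Because the reset is handled per series before summing, the restart of pod_a does not pull the total down, which is the behaviour the question asks for.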

Prometheus doesn't provide a way to sum counters that may be reset. Additionally, the increase() function in Prometheus has some issues, which may prevent using it to query the counter increase over a specified time range:

  • It may return fractional values over integer counters because of extrapolation. See this issue for details.
  • It may miss the counter increase between the raw sample just before the lookbehind window in square brackets and the first raw sample inside the lookbehind window. For example, increase(NumberOfVisitors[1m]) at timestamp t may miss the counter increase between the last raw sample just before the t-1m time and the first raw sample in the (t-1m ... t] time range. See more details here and here.
  • It may miss the increase for the first raw sample in a time series. For example, if the NumberOfVisitors counter is increased to 10 just before the first scrape of this counter by Prometheus, then increase() over the time range with the first sample would under-count the counter increase by 10.
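The second point in the list above can be illustrated with a small Python sketch. It is a simplified model, not Prometheus's actual algorithm: the function names and sample data are hypothetical, counter resets are not handled, and the extrapolation step mentioned in the first point is left out.

```python
def increase_inside_window(samples, start, end):
    """Increase computed only from samples with start < t <= end."""
    inside = [v for t, v in samples if start < t <= end]
    return inside[-1] - inside[0] if len(inside) >= 2 else 0


def increase_with_previous_sample(samples, start, end):
    """Same window, but the last sample at or before `start` is the baseline."""
    before = [v for t, v in samples if t <= start]
    inside = [v for t, v in samples if start < t <= end]
    if not inside:
        return 0
    baseline = before[-1] if before else inside[0]
    return inside[-1] - baseline


# Hypothetical (timestamp_seconds, value) pairs, scraped every 30 seconds.
samples = [(0, 0), (30, 2), (60, 2), (90, 5), (120, 6)]

# Lookbehind window (60 ... 120], i.e. [1m] evaluated at t=120.
print(increase_inside_window(samples, 60, 120))         # 6 - 5
print(increase_with_previous_sample(samples, 60, 120))  # 6 - 2
```

In this toy data the jump from 2 to 5 happens between the sample at t=60 (just outside the window) and the sample at t=90 (first inside it), so a calculation that only looks inside the window never sees it.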

Prometheus developers are going to fix these issues; see this design doc. In the meantime, it is possible to use VictoriaMetrics: its increase() function is free from these issues.

Returning to the original question: the sum of multiple counters that may be reset can be returned with the following MetricsQL query in VictoriaMetrics:

running_sum(sum(increase(NumberOfVisitors)))

It uses the following functions:

  • increase() for calculating the increase of each counter between adjacent points on the graph.
  • sum() for summing the calculated increases at each point on the graph.
  • running_sum() for calculating the running sum over the per-point increases on the graph.
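The three steps above can be mimicked in plain Python to see why the result never drops on a restart. This is only a sketch of the idea, not how VictoriaMetrics implements it: the sample values and the helper per_point_increase are hypothetical, and the first point of each series is assumed to contribute 0, which is a simplification.

```python
from itertools import accumulate


def per_point_increase(values):
    """Increase between adjacent points; a drop is treated as a reset to 0."""
    out = []
    prev = None
    for v in values:
        if prev is None:
            out.append(0)  # assumption: nothing is attributed to the first point
        else:
            out.append(v - prev if v >= prev else v)
        prev = v
    return out


# Hypothetical per-pod counter values at the same graph points.
pod_a = [1, 2, 4, 0, 1, 3]  # restarts after reaching 4
pod_b = [5, 5, 6, 7, 7, 9]  # never restarts

increases = [per_point_increase(s) for s in (pod_a, pod_b)]  # increase()
summed = [sum(step) for step in zip(*increases)]             # sum()
running_total = list(accumulate(summed))                     # running_sum()

print(running_total)
```

Each per-point increase is non-negative, so the running total can only stay flat or grow, regardless of how many pods restart.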
