
How can rate() function average request duration?

Quote from Prometheus Count and sum of observations doc:

To calculate the average request duration during the last 5 minutes from a histogram or summary called http_request_duration_seconds, use the following expression: rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])

I should mention that I understand:

  1. what rate function does
  2. how instant vector is different from range vector
  3. that if I wanted to obtain average request duration increase rate I would definitely use that expression

However I'm not interested in the increase rate of request duration, but rather in the request duration itself!

Can someone explain why everybody while looking for average count/value in any given moment of time has to use a rate function, when it doesn't provide that?

PS: there is seemingly a duplicate question with an accepted answer; however, all the answers there explain what the rate function is, how it does what it does, etc. I already understand what the rate function does. I just don't understand why we are supposed to use it in the first place, especially when the result it provides has nothing to do with what we're looking for.

Let's show that the formula quoted from the Prometheus manual, which uses the rate() function, computes the exact value you are looking for.

According to the way a counter works, we know that each time the counter named http_request_duration_seconds_sum takes into account a new value, that is the sum of durations of all the requests that happened from the last time, it adds this sum to its previous value. Therefore, rate(http_request_duration_seconds_sum[5m]) is the sum of the durations of the requests that occurred during 5 minutes, divided by 5 minutes.

And each time the counter http_request_duration_seconds_count takes into account a new value, that is the number of requests that happened from the last time, this counter adds this number of requests to its previous value. Therefore, rate(http_request_duration_seconds_count[5m]) is the number of requests that occurred during 5 minutes, divided by 5 minutes.

So, let's inject the formulas discovered in the two previous paragraphs into the following fraction:

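rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])

equals:

((sum of the durations of the requests that occurred during 5 minutes) / 5 minutes) / ((number of requests that occurred during 5 minutes) / 5 minutes)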

You can simplify this formula by removing "5 minutes", because it appears in both the numerator and the denominator and therefore cancels out.

Finally, the following formula:

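rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])

is equal to the following one:

(sum of the durations of the requests that occurred during 5 minutes) / (number of requests that occurred during 5 minutes)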

The second part of this equality is the value you want to compute: the average duration of requests during 5 minutes. This is why it is computed using the first part of this equality.
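The cancellation can be checked numerically. The following is a minimal sketch in Python, not how Prometheus itself is implemented: the helper simple_rate is hypothetical and only divides the counter increase by the window length, ignoring the extrapolation and counter-reset handling that the real rate() performs. The durations and starting counter values are made up for illustration.

```python
def simple_rate(samples, window_seconds):
    """Hypothetical simplified rate(): counter increase over the window
    divided by the window length in seconds (no extrapolation, no resets)."""
    return (samples[-1] - samples[0]) / window_seconds

# Assumed durations (seconds) of the requests served during the window.
durations = [0.5, 1.5, 1.0, 2.0]

# Arbitrary counter values at the start of the window.
sum_samples = [100.0]
count_samples = [40]
for d in durations:
    sum_samples.append(sum_samples[-1] + d)      # _sum grows by the duration
    count_samples.append(count_samples[-1] + 1)  # _count grows by one

window = 300  # 5 minutes, in seconds

avg_from_rates = simple_rate(sum_samples, window) / simple_rate(count_samples, window)
avg_direct = sum(durations) / len(durations)

print(avg_from_rates, avg_direct)
```

Both values agree up to floating-point rounding: the window length divides out, and what remains is the plain average duration of the requests in the window.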

Prometheus summary and histogram metric types expose two additional counters:

  • The total count of the particular measurements since the service start. This counter is constructed by adding the _count suffix to the original metric name. For example, if the original histogram or summary metric name is http_request_duration_seconds (see the docs for the metric naming convention in Prometheus), then the http_request_duration_seconds_count counter is generated, which counts the total number of HTTP requests since the service start.
  • The total sum of all the measurements since the service start. This metric is constructed by adding the _sum suffix to the original metric name. For example, if the original metric name is http_request_duration_seconds, then the http_request_duration_seconds_sum counter contains the total sum of all the HTTP request durations since the service start.
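To make the two counters concrete, here is a toy sketch in Python of how they accumulate. The class DurationCounters and its observe method are hypothetical names used only for illustration; they are not part of any Prometheus client library.

```python
class DurationCounters:
    """Toy stand-in for the _sum and _count series of a histogram or summary."""

    def __init__(self):
        self.sum = 0.0   # plays the role of http_request_duration_seconds_sum
        self.count = 0   # plays the role of http_request_duration_seconds_count

    def observe(self, seconds):
        # Each observed request adds its duration to the sum
        # and increments the count by one. Neither value ever decreases.
        self.sum += seconds
        self.count += 1


counters = DurationCounters()
for duration in [0.25, 0.5, 0.75]:
    counters.observe(duration)

print(counters.sum, counters.count)  # 1.5 3
```

Both values only ever grow while the service runs, which is why they behave like counters.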

How to calculate the average request duration from these two metrics? An obvious solution is to divide sum of all the request durations by the number of requests:

http_request_duration_seconds_sum / http_request_duration_seconds_count

But this solution shows the average request duration since the last restart of the service. Usually users are interested in the average request duration over some lookbehind interval, for example over the last 5 minutes. In that case we need to divide the sum of all the request durations during the last 5 minutes by the number of requests served during the last 5 minutes. This can be done with the increase function:

increase(http_request_duration_seconds_sum[5m])
  /
increase(http_request_duration_seconds_count[5m])
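The difference between the lifetime average and the windowed average can be large. The sketch below uses made-up counter values (all numbers are assumptions chosen for illustration) and plain subtraction as a stand-in for increase(), without the extrapolation the real function applies.

```python
# Hypothetical counter values: sum of durations (seconds) and request count.
sum_5m_ago, count_5m_ago = 9000.0, 1000   # scraped 5 minutes ago
sum_now, count_now = 9030.0, 1100         # scraped now

# Average since the service started: dominated by old, slow requests.
lifetime_avg = sum_now / count_now

# Average over the last 5 minutes: increase of the sum / increase of the count.
window_avg = (sum_now - sum_5m_ago) / (count_now - count_5m_ago)

print(lifetime_avg)
print(window_avg)
```

Here the lifetime average stays above 8 seconds, while the 100 requests served in the last 5 minutes averaged about 0.3 seconds each, which is the number a dashboard usually needs.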

The rate function in Prometheus is calculated as rate(m[d]) = increase(m[d])/d, i.e. it is increase() divided by the lookbehind window d expressed in seconds. Let's replace increase with rate in the query above:

rate(http_request_duration_seconds_sum[5m])
  /
rate(http_request_duration_seconds_count[5m])

Now let's substitute rate(m[d]) with increase(m[d])/d according to the formula above:

(increase(http_request_duration_seconds_sum[5m])/5m)
  /
(increase(http_request_duration_seconds_count[5m])/5m)

The 5m denominators cancel out, so we end up with the initial query with increase(). So it is OK to use either rate or increase in the query above; the result is the same.
