Since server_request_duration_seconds_bucket is just a counter, why do we need to use rate to calculate the duration? This does not seem meaningful:
histogram_quantile(0.99, sum by (le) (rate(server_request_duration_seconds_bucket[1m])))
such as the example from: https://robert-scherbarth.medium.com/measure-request-duration-with-prometheus-and-golang-adc6f4ca05fe
As stated by the Prometheus documentation:
rate(v range-vector) calculates the per-second average rate of increase of the time series in the range vector.
[...]
rate should only be used with counters. It is best suited for alerting, and for graphing of slow-moving counters.
It is only useful for getting the "pace", or frequency of evolution, of a counter.
Example use case: get the requests-per-second rate based on an incoming request counter.
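As a small illustration (with made-up counter values), the requests-per-second figure is just the counter delta divided by the lookbehind window, which is roughly what rate() computes per series:

```python
# Hypothetical values of a monotonically increasing request counter,
# scraped 60 seconds apart. rate() approximates (last - first) / window.
window = 60  # seconds
t0_value, t1_value = 1000, 1300  # made-up counter samples

requests_per_second = (t1_value - t0_value) / window
print(requests_per_second)  # 5.0
```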
server_request_duration_seconds is a histogram. It consists of multiple buckets named server_request_duration_seconds_bucket (the _bucket suffix is added to the original histogram name), with the upper boundary encoded in the le label. Each such bucket is a counter that counts the number of samples with values up to le. For example, server_request_duration_seconds_bucket{le="0.5"} counts the number of requests with a duration of up to 0.5 seconds.
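The cumulative nature of these buckets can be sketched in a few lines of Python. The durations and boundaries below are hypothetical; the point is that each le bucket counts all observations at or below its boundary, so le="+Inf" always equals the total observation count:

```python
# Hypothetical observed request durations (seconds) and bucket boundaries.
durations = [0.05, 0.2, 0.3, 0.7, 1.2]
boundaries = [0.1, 0.5, 1.0, float("inf")]

# Each bucket counter is cumulative: it counts ALL samples with value <= le.
buckets = {le: sum(1 for d in durations if d <= le) for le in boundaries}
print(buckets)  # {0.1: 1, 0.5: 3, 1.0: 4, inf: 5}
```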
rate(server_request_duration_seconds_bucket[1m]) calculates the average per-second increase rate over the last minute individually for each bucket of the server_request_duration_seconds histogram, i.e. the end result of rate(...) is the distribution of the increase rates of all the buckets over the last minute. This histogram can be exposed by multiple instances (aka replicas or shards) of a single service, so if you want to calculate the aggregate quantile over all these instances, you need to wrap the rate() into sum() by (le) before passing it to histogram_quantile.
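A minimal sketch of what these two steps do, using made-up counter samples for one bucket pair on two instances: rate() turns each raw counter series into a per-second rate, and sum() by (le) then adds those rates across instances while keeping the le label:

```python
# Hypothetical raw bucket-counter values on two instances ("a" and "b"),
# sampled 60 seconds apart: (instance, le) -> (value at t0, value at t0+60s).
window = 60  # seconds
samples = {
    ("a", 0.5): (100, 160),
    ("b", 0.5): (200, 230),
    ("a", 1.0): (150, 240),
    ("b", 1.0): (250, 310),
}

# rate(): per-second increase, computed per series.
rates = {k: (v1 - v0) / window for k, (v0, v1) in samples.items()}

# sum() by (le): drop the instance label, add rates per bucket boundary.
sum_by_le = {}
for (_, le), r in rates.items():
    sum_by_le[le] = sum_by_le.get(le, 0.0) + r
print(sum_by_le)  # {0.5: 1.5, 1.0: 2.5}
```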
The end result of histogram_quantile(0.99, sum(rate(server_request_duration_seconds_bucket[1m])) by (le))
is an estimated 99th percentile of the server_request_duration_seconds
histogram over the last minute, i.e. the maximum time in seconds needed to serve 99% of the registered requests over the last minute.
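The estimation itself works by finding the bucket containing the target rank and linearly interpolating inside it. The following is a simplified sketch of that interpolation (it ignores edge cases Prometheus handles, such as empty histograms or negative boundaries), with hypothetical bucket values:

```python
def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (le, count) pairs, using linear
    interpolation inside the bucket where the target rank falls.
    Simplified sketch of Prometheus' histogram_quantile, not the real code."""
    buckets = sorted(buckets, key=lambda b: b[0])
    total = buckets[-1][1]  # the +Inf bucket holds the total count
    rank = q * total
    prev_le, prev_count = 0.0, 0.0
    for le, count in buckets:
        if count >= rank:
            if le == float("inf"):
                # Rank falls into +Inf: return the last finite boundary.
                return prev_le
            # Interpolate linearly between the bucket's boundaries.
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count

# Hypothetical aggregated per-second bucket rates.
buckets = [(0.1, 10.0), (0.5, 90.0), (1.0, 100.0), (float("inf"), 100.0)]
print(histogram_quantile(0.99, buckets))  # 0.95
```

Note that the result is an estimate: within the (0.5, 1.0] bucket the real request durations are unknown, so the 99th percentile is assumed to lie 90% of the way through it.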
Note that it is OK to use increase instead of rate when calculating histogram_quantile - this shouldn't change the result, since increase returns the same distribution shape across buckets as rate.
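The reason is that histogram_quantile depends only on the ratios between bucket values, and switching from rate() to increase() over the same window multiplies every bucket by the same constant. A small sketch (with made-up bucket values) of the interpolation step demonstrating this scale invariance:

```python
def interpolate(q, lo, hi, count_lo, count_hi, total):
    """Linear interpolation inside the bucket (lo, hi] containing rank q*total
    (simplified sketch of the histogram_quantile interpolation step)."""
    rank = q * total
    return lo + (hi - lo) * (rank - count_lo) / (count_hi - count_lo)

# Hypothetical cumulative per-second rates for the bucket pair around p99,
# and the corresponding increase() values over a 60-second window.
rate_vals = (90.0, 100.0, 100.0)
inc_vals = tuple(v * 60 for v in rate_vals)

q_rate = interpolate(0.99, 0.5, 1.0, *rate_vals)
q_inc = interpolate(0.99, 0.5, 1.0, *inc_vals)
print(q_rate, q_inc)  # 0.95 0.95 - identical
```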
PS: the rate and increase functions in Prometheus may return unexpected results because of extrapolation - see this issue. This may lead to less accurate results from histogram_quantile. If you experience this issue, then try VictoriaMetrics - a Prometheus-like monitoring system which supports PromQL functionality via MetricsQL. Contrary to Prometheus, it doesn't use extrapolation for increase and rate calculations, so it is free from extrapolation-related issues. Prometheus developers are going to fix these issues too - see this design doc.
PPS I'm the core developer of VictoriaMetrics.