
VictoriaMetrics returns different results for a sum_over_time query and a sum computed from samples

I have a metric from Grafana Loki named logs_bytes_over_time with two labels:

  • interval - the interval over which the average value is computed (1m)
  • service - the name of the service the logs belong to

All services have a retention period; app1 and app2 have a retention of 336 hours (14 days). When I compute the sum over time, the result shows 8gb for app1 and 5gb for app2:

curl http://victoria-metrics:8428/prometheus/api/v1/query -d 'query=sum_over_time(logs_bytes_over_time{interval="1m", service=~"app.*"}[14d])' | jq .
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "interval": "1m",
          "service": "app1"
        },
        "value": [
          1672315519,
          "8593470346"
        ]
      },
      {
        "metric": {
          "interval": "1m",
          "service": "app2"
        },
        "value": [
          1672315519,
          "5498422093"
        ]
      }
    ]
  }
}

When I fetch the metric values from the VictoriaMetrics API for the last 336 hours, I get different values:

  • app1 - 42.33gb
  • app2 - 20.87gb

These results come from samples downloaded from the VictoriaMetrics API:

curl http://victoria-metrics:8428/prometheus/api/v1/query_range -d 'query=logs_bytes_over_time{interval="1m", service=~"app.*"}' -d 'start=Xh' -d 'stop=Yh' -d 'step=1m'

Here X and Y iterate over 24h intervals to make the requests easier for VictoriaMetrics. These are the pairs of X and Y I iterate over:

[('-336h', '-312h'), ('-312h', '-288h'), ('-288h', '-264h'), ('-264h', '-240h'), ('-240h', '-216h'), ('-216h', '-192h'), ('-192h', '-168h'), ('-168h', '-144h'), ('-144h', '-120h'), ('-120h', '-96h'), ('-96h', '-72h'), ('-72h', '-48h'), ('-48h', '-24h'), ('-24h', '0')]

I simply sum all the values I receive. I computed the sum in both Python and bash to make sure I did not make a mistake in the script; the results are the same.

Why are the sum of the values from the VictoriaMetrics API and the result of the sum_over_time query so different? I would expect them to be the same, or at least much closer to each other.

The /api/v1/query_range endpoint doesn't return the raw samples stored in VictoriaMetrics. It returns calculated values at timestamps t=[start, start+step, start+2*step, ..., end]. More specifically, for each timestamp t from this list it returns the last raw sample value on the time range (t-scrape_interval ... t], where scrape_interval is the median interval between raw samples. Note that t-scrape_interval isn't included in the time range, while t is included. See these docs for more details.

The sum_over_time(m[d]) function returns the sum of raw samples on the time range (t-d ... t] when queried at the timestamp t. See these docs for more details.

It is likely that the interval between raw samples in the queried time series exceeds the step value passed to /api/v1/query_range. This results in duplicate output values for each raw sample stored in VictoriaMetrics.
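The duplication described above can be illustrated with a small simulation. This is only a simplified model of the behaviour explained in this answer, not VictoriaMetrics code: the function names, the 300-second sample interval and the sample value of 10 are all made up for the illustration.

```python
from bisect import bisect_right

def sum_over_time(samples, t, d):
    """Sum the raw sample values whose timestamps fall in (t - d, t]."""
    return sum(v for ts, v in samples if t - d < ts <= t)

def query_range(samples, start, end, step, lookbehind):
    """Simplified model of a range query: for every t = start, start+step, ..., end
    return the last raw sample in (t - lookbehind, t], if there is one."""
    timestamps = [ts for ts, _ in samples]
    points = []
    t = start
    while t <= end:
        i = bisect_right(timestamps, t) - 1  # index of the last sample with ts <= t
        if i >= 0 and timestamps[i] > t - lookbehind:
            points.append((t, samples[i][1]))
        t += step
    return points

# Hypothetical series: one raw sample every 300 s with value 10, covering one hour.
raw = [(ts, 10) for ts in range(300, 3601, 300)]

raw_sum = sum_over_time(raw, t=3600, d=3600)

# Query with step=60 s, which is smaller than the 300 s interval between raw samples.
range_points = query_range(raw, start=60, end=3600, step=60, lookbehind=300)
range_sum = sum(v for _, v in range_points)

print(raw_sum, len(range_points), range_sum)
```

In this model the 12 raw samples sum to 120, while the range query returns 56 points summing to 560, because most raw samples are repeated at five consecutive steps.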

VictoriaMetrics provides export APIs, which can be used for exporting the raw samples of the given time series; see these docs and this article for details. Try exporting the raw samples with these APIs and verifying whether their sum matches the value returned by sum_over_time(m[d]).
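The verification suggested above could be sketched as follows. It assumes the raw samples were exported in JSON line format, one series per line with "metric", "values" and "timestamps" fields, and saved as text; the function name and the sample data are hypothetical, so check the export docs for the exact format returned by your version.

```python
import json
from collections import defaultdict

def sum_exported_samples(export_text):
    """Sum raw sample values per service from JSON-lines export output.

    Assumes each non-empty line is a JSON object with "metric" (labels)
    and "values" (raw sample values) fields.
    """
    totals = defaultdict(float)
    for line in export_text.splitlines():
        line = line.strip()
        if not line:
            continue
        series = json.loads(line)
        service = series["metric"].get("service", "")
        totals[service] += sum(series["values"])
    return dict(totals)

# Made-up export output for illustration; real output would come from the export API.
sample_export = "\n".join([
    '{"metric":{"__name__":"logs_bytes_over_time","interval":"1m","service":"app1"},'
    '"values":[100,200],"timestamps":[1672315400000,1672315460000]}',
    '{"metric":{"__name__":"logs_bytes_over_time","interval":"1m","service":"app1"},'
    '"values":[300],"timestamps":[1672315520000]}',
    '{"metric":{"__name__":"logs_bytes_over_time","interval":"1m","service":"app2"},'
    '"values":[50],"timestamps":[1672315400000]}',
])

totals = sum_exported_samples(sample_export)
print(totals)
```

The per-service totals can then be compared with the values returned by the sum_over_time query over the same time range.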
