
Prometheus blackbox probe helpful metrics

I have around 1000 targets that are probed using HTTP.

job="http_2xx", env="prod", instance="x.x.x.x"
job="http_2xx", env="test", instance="y.y.y.y"
job="http_2xx", env="dev", instance="z.z.z.z"

I want to know for the targets:

  1. Rate of failure by env in last 10 minutes.
  2. Increase in rate of failure by env in last 10 minutes.
  3. Curious what the following does:
sum(increase(probe_success{job="http_2xx"}[10m]))

rate(probe_success{job="http_2xx", env="prod"}[5m]) * 100

The closest I have come is the following, which gives the percentage of operational targets for one env over the last 10 minutes:

avg(avg_over_time(probe_success{job="http_2xx", env="prod"}[10m]) * 100)
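
  1. Rate of failure by env in last 10 minutes. probe_success is a gauge that is 1 when a probe succeeds and 0 when it fails, so rate() and increase(), which expect counters, do not give meaningful results on it. Average the gauge over the window instead:

    avg(avg_over_time(probe_success{job="http_2xx"}[10m])) by (env) * 100

    This returns the percentage of successful probes per env. To get the failure percentage, subtract it from 100:

    100 - avg(avg_over_time(probe_success{job="http_2xx"}[10m])) by (env) * 100

  2. Increase in rate of failure by env in last 10 minutes. Wrapping the query above in increase() does not work, for the same reason. To see how the failure rate has changed, compare its value for the last 10 minutes with its value for the preceding 10 minutes, which the offset modifier can select.

  3. Your first query applies increase() to a gauge. Prometheus treats every drop from 1 to 0 as a counter reset, so the result is not the number of successful probes. To count failed probes per env over 10 minutes, subtract the number of successful samples from the total number of samples:

    sum(count_over_time(probe_success{job="http_2xx"}[10m]) - sum_over_time(probe_success{job="http_2xx"}[10m])) by (env)

    Your second query has the same problem: rate() of probe_success is not a success percentage. The avg_over_time query you already have is the right way to get the percentage of successful probes for prod.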
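
For the second point, one possible sketch of the change in failure rate by env compares the last 10 minutes with the 10 minutes before it. It assumes probe_success is read as a gauge (1 on success, 0 on failure); the 10m offset is an illustrative choice, not something from the question. A positive result means the failure percentage went up by that many points:

    (
        avg(avg_over_time(probe_success{job="http_2xx"}[10m] offset 10m)) by (env)
      -
        avg(avg_over_time(probe_success{job="http_2xx"}[10m])) by (env)
    ) * 100

The subtraction is written as previous success minus current success because that equals current failure minus previous failure.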


 