
How to divide 2 metrics in Prometheus PromQL

I am building my dashboard in Grafana using Prometheus. I have 2 metrics (total calls to a service and total timeout errors).

The first is the PromQL for total calls to the service:

increase(Fetching_RESPONSE_TIME_seconds_count{instance="${server}:8080"}[1h])

The other is the PromQL for total timeouts:

(increase(dp_errors_total{code=~"12345",instance="${server}:8080"}[1h]))

I want one more column in my dashboard that shows the timeout percentage, which would be (total timeouts * 100 / total calls to the service).

When I run this PromQL:

increase(dp_errors_total{code=~"12345",instance="${server}:8080"}[1h]) * 100
/
increase(Fetching_RESPONSE_TIME_seconds_count{instance="${server}:8080"}[1h])

It does not show anything in my dashboard.

How can I add one more column to my dashboard that shows the percentage of timeouts?

When you evaluate an arithmetic expression, Prometheus tries to match the time series on the left and right sides. It does so by their labels: both sides must have the same labels (names and values). I don't know all the labels your time series have, but I can guess that, for example, the code label is present only on dp_errors_total and not on the second metric. I'd typically aggregate both operands first (by whatever is needed), for example:

sum by (server) ( ... dp_errors_total query ) 
/
sum by (server) ( ... Fetching_RESPONSE_TIME_seconds_count query ...)

Or, if there is only one server in $server, drop the by (server) part.
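As a minimal sketch of that aggregation approach, the two queries from the question can be plugged in as shown here. It assumes that instance is the label to group by (the question only confirms that both metrics carry an instance label; a label literally named server is not shown anywhere) and that ${server} is a Grafana dashboard variable:

```promql
# Aggregate both sides down to the instance label only, so the label sets
# on the left and right of "/" are identical and one-to-one matching works.
# Assumption: "instance" is the grouping label you want.
100 * sum by (instance) (
  increase(dp_errors_total{code=~"12345",instance="${server}:8080"}[1h])
)
/
sum by (instance) (
  increase(Fetching_RESPONSE_TIME_seconds_count{instance="${server}:8080"}[1h])
)
```

Because sum by (instance) drops every other label, including code, the result has one series per instance, which is usually what a single table column needs.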

By default, Prometheus performs the division for pairs of time series with identical sets of labels on the left and right sides of the / operator. In this case, the time series on the left side of / contain code and instance labels, while the time series on the right side contain only the instance label. Prometheus cannot find matching pairs of time series, so it returns nothing, according to its vector matching rules. This behavior can be changed with the on() and group_left() modifiers:

  • the on() modifier limits the set of labels that are taken into account when searching for matching time series pairs
  • the group_left() modifier allows multiple time series on the left side of the / operator to be matched to a single time series on the right side. See the Prometheus documentation on vector matching for more details.

So the resulting query should look like the following:

100 * increase(dp_errors_total{code=~"12345",instance="${server}:8080"}[1h])
  / on(instance) group_left()
increase(Fetching_RESPONSE_TIME_seconds_count{instance="${server}:8080"}[1h])
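A possible alternative, sketched here under the assumption that code is the only label present on dp_errors_total but missing from Fetching_RESPONSE_TIME_seconds_count, is to exclude that label from matching with ignoring() instead of listing the labels to match with on():

```promql
# ignoring(code) matches series on every label except "code".
# Assumption: "code" is the only label that differs between the two metrics;
# if other labels differ as well, the on(instance) form is the safer choice.
100 * increase(dp_errors_total{code=~"12345",instance="${server}:8080"}[1h])
  / ignoring(code) group_left()
increase(Fetching_RESPONSE_TIME_seconds_count{instance="${server}:8080"}[1h])
```

As with the on(instance) version, group_left() keeps the labels of the left-hand series in the result, so the code label still appears in the output.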
