How to divide 2 metrics in Prometheus PromQL

Question

I am building a dashboard in Grafana using Prometheus. I have 2 metrics (total calls to a service and total timeout errors).

The first is the PromQL for the total number of calls to the service:

increase(Fetching_RESPONSE_TIME_seconds_count{instance="${server}:8080"}[1h])

The other is the PromQL for the total number of timeouts:

(increase(dp_errors_total{code=~"12345",instance="${server}:8080"}[1h]))

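I want to add one more column to my dashboard that shows the timeout percentage, which would be (total timeouts * 100 / total calls to the service).

When I run this PromQL: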

increase(dp_errors_total{code=~"12345",instance="${server}:8080"}[1h]) * 100
/
increase(Fetching_RESPONSE_TIME_seconds_count{instance="${server}:8080"}[1h])

it shows nothing on my dashboard.

How can I add one more column to the dashboard that shows the timeout percentage?

Answer 1

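When you evaluate an arithmetic expression, Prometheus tries to match the time series on the left-hand side with those on the right-hand side. It does this by their labels: both sides must have the same labels (names and values). I don't know all the labels your time series have, but I would guess that, for example, the code label exists only on dp_errors_total and not on the second metric. I usually aggregate both operands first (as needed), for example: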

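sum by (instance) (increase(dp_errors_total{code=~"12345",instance="${server}:8080"}[1h]))
/
sum by (instance) (increase(Fetching_RESPONSE_TIME_seconds_count{instance="${server}:8080"}[1h]))

Or, if $server selects only a single server, drop the by (instance) part.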
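
The aggregate-then-divide idea can be sketched as a toy model in Python. This is not how Prometheus is implemented; the sample values, host names, and the helper names sum_by and divide are made up for illustration. It groups both operands on instance, the only label the two selectors in the question share, so the label sets line up before the division.

```python
from collections import defaultdict

# Toy instant vectors: each sample is (labels, value). All values are invented.
errors = [
    ({"instance": "host1:8080", "code": "12345"}, 5.0),
    ({"instance": "host2:8080", "code": "12345"}, 1.0),
]
calls = [
    ({"instance": "host1:8080"}, 200.0),
    ({"instance": "host2:8080"}, 50.0),
]


def sum_by(vector, keep):
    """Toy model of `sum by (<keep>) (vector)`: group on the kept labels, drop the rest."""
    totals = defaultdict(float)
    for labels, value in vector:
        key = tuple((name, labels.get(name, "")) for name in keep)
        totals[key] += value
    return dict(totals)


def divide(left, right):
    """Toy model of `/` between two vectors: only identical label sets pair up.

    Zero divisors are not modelled here; the sample data contains none.
    """
    return {key: left[key] / right[key] for key in left if key in right}


ratio = divide(sum_by(errors, ["instance"]), sum_by(calls, ["instance"]))
percent = {key: 100 * value for key, value in ratio.items()}
for key, value in percent.items():
    print(dict(key), round(value, 2))
```

After aggregation the code label is gone from the left-hand side, so each instance on the left finds exactly one partner on the right.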

Answer 2

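By default, Prometheus performs division on pairs of time series that have identical label sets on the left and right sides of the / operator. In this case, the time series on the left side of / carry the code and instance labels, while the time series on the right side carry only the instance label. Prometheus cannot find any matching pairs, so under these rules it returns nothing. This behavior can be changed with the on() and group_left() modifiers: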

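The on() modifier restricts the set of labels that are considered when searching for matching pairs of time series.
The group_left() modifier allows multiple time series on the left side of / to be matched with a single time series on the right side. See the Prometheus documentation on vector matching for more details.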

The resulting query should therefore look like this:

100 * increase(dp_errors_total{code=~"12345",instance="${server}:8080"}[1h])
  / on(instance) group_left()
increase(Fetching_RESPONSE_TIME_seconds_count{instance="${server}:8080"}[1h])
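
To make the matching rule concrete, here is a small Python sketch of it. It is a simplified model under stated assumptions, not Prometheus's actual code: the sample values and the helpers match_key and divide are hypothetical, and the on argument stands in for `on(...) group_left()`.

```python
def match_key(labels, on=None):
    """Labels used for pairing: all of them by default, or only those listed in `on`."""
    if on is None:
        return tuple(sorted(labels.items()))
    return tuple((name, labels.get(name, "")) for name in sorted(on))


def divide(left, right, on=None):
    """Toy `left / right` (on=None) or `left / on(...) group_left() right`.

    Assumes each right-hand match key is unique, since the right side is
    the "one" side of a many-to-one match.
    """
    right_by_key = {match_key(labels, on): value for labels, value in right}
    result = []
    for labels, value in left:
        key = match_key(labels, on)
        if key in right_by_key:
            # Left-hand labels are kept on the result, as with group_left().
            result.append((labels, value / right_by_key[key]))
    return result


# Invented sample values for one server.
errors = [({"instance": "host1:8080", "code": "12345"}, 5.0)]
calls = [({"instance": "host1:8080"}, 200.0)]

default_result = divide(errors, calls)                   # full label sets differ
matched_result = divide(errors, calls, on=["instance"])  # pair on instance only

print(default_result)
print([(labels, round(100 * value, 2)) for labels, value in matched_result])
```

With full label sets the left-hand series has no partner because of its extra code label, so the first result is empty; restricting the match to instance pairs the two series.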
