
Summarize collectd CPU stats for multiple servers with different CPU counts

I'm trying to build a graph that shows worst-case CPU usage across a variable set of servers. I'm getting the data from collectd, which reports statistics for each CPU core separately. The problem is that servers within the set may have different numbers of CPU cores.

What I had so far (one series for each cpu-foo property):

sumSeriesWithWildcards(sumSeriesWithWildcards(summarize(servers.$foo.$bar.*.collectd.cpu-*.cpu-system.value, '$timeframe', 'max', false), 5), 3)
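To make the grouping in this query concrete, here is a minimal Python sketch of what sumSeriesWithWildcards does with node positions, leaving out the summarize step: it drops the listed nodes from each dotted name and sums, point by point, every series that ends up with the same name. Node 5 is the per-core cpu-N and node 3 is the server, so the inner call sums cores and the outer call sums servers. The host names and the dc1/web values standing in for $foo/$bar are hypothetical, and sum_series_with_wildcards is an illustrative helper, not Graphite's own code.

```python
from collections import defaultdict


def sum_series_with_wildcards(series, *positions):
    """Sum series whose names match once the given node positions are removed.

    Works on an in-memory dict {dotted_name: [values...]}; None gaps are not handled.
    """
    grouped = defaultdict(list)
    for name, values in series.items():
        nodes = name.split(".")
        key = ".".join(n for i, n in enumerate(nodes) if i not in positions)
        grouped[key].append(values)
    # Pointwise sum of every series that collapsed onto the same name.
    return {key: [sum(points) for points in zip(*group)]
            for key, group in grouped.items()}


# Hypothetical cpu-system series: host-a has two cores, host-b has one.
series = {
    "servers.dc1.web.host-a.collectd.cpu-0.cpu-system.value": [10, 20],
    "servers.dc1.web.host-a.collectd.cpu-1.cpu-system.value": [30, 40],
    "servers.dc1.web.host-b.collectd.cpu-0.cpu-system.value": [5, 5],
}

per_server = sum_series_with_wildcards(series, 5)  # drop node 5 (cpu-N): sum cores
total = sum_series_with_wildcards(per_server, 3)   # drop node 3 (server): sum servers
print(total)  # {'servers.dc1.web.collectd.cpu-system.value': [45, 65]}
```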

This obviously skews the graph towards cpu-idle: because the servers are mostly evenly loaded, servers with more CPU cores show a higher idle ratio than servers with fewer cores.
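A worked example of the skew, using hypothetical per-core percentages: a 2-core server running at 90% system and an 8-core server at 10% system. Summing every core of every server buries the busy server under the idle cores of the larger one.

```python
# Hypothetical per-core percentages for two servers with different core counts.
cores = {
    "host-a": {"system": [90] * 2, "idle": [10] * 2},  # 2 busy cores
    "host-b": {"system": [10] * 8, "idle": [90] * 8},  # 8 mostly idle cores
}

# Summing all cores of all servers, as the nested sumSeriesWithWildcards calls do.
summed = {state: sum(sum(c[state]) for c in cores.values()) for state in ("system", "idle")}
print(summed)  # {'system': 260, 'idle': 740}

# Per-server average busy share: the number the graph is supposed to surface.
busiest = max(sum(c["system"]) / len(c["system"]) for c in cores.values())
print(busiest)  # 90.0
```

Idle accounts for 740 of the 1000 summed percentage points (74%), even though one of the two servers is 90% busy.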

To clarify: I'd like to take the sum of all cpu-* series for each server and summarize it to the max across all servers, except for idle, which I'd like to summarize to the min. For that I need a way to normalize each server's sums to 100% before summarizing them.
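One way to express this normalization and the max/min rule, sketched in Python with hypothetical numbers: divide each server's per-state sums by that server's total across all states, then take the max across servers for busy states and the min for idle. Dividing by the total across states is an assumption on my part; it has the advantage of working in whatever unit collectd reports the CPU states. On real series, Graphite's maxSeries and minSeries would perform the final cross-server step.

```python
# Hypothetical per-server sums over all cores, one value per CPU state.
per_server = {
    "host-a": {"user": 120, "system": 60, "idle": 20},   # 2 cores, total 200
    "host-b": {"user": 40, "system": 40, "idle": 720},   # 8 cores, total 800
}


def normalize(states):
    """Scale one server's state sums so that they add up to 100%."""
    total = sum(states.values())
    return {state: 100.0 * value / total for state, value in states.items()}


normalized = {host: normalize(states) for host, states in per_server.items()}

# Worst case: max across servers for busy states, min across servers for idle.
worst = {
    state: (min if state == "idle" else max)(n[state] for n in normalized.values())
    for state in ("user", "system", "idle")
}
print(worst)  # {'user': 60.0, 'system': 30.0, 'idle': 10.0}
```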

So far I have come up with this, which is a little bit better:

divideSeries(sumSeriesWithWildcards(sumSeriesWithWildcards(summarize(servers.$foo.$bar.*.collectd.cpu-*.cpu-system.value, '$timeframe', 'max', false), 5), 3), #L)

However, this still isn't satisfactory. It's less skewed, but it still doesn't fulfill the purpose of this graph: to show worst-case CPU usage across servers.

What I need to do, but can't figure out how, is the following:

  1. for each server (segment 3), count the cpu-* series, then
  2. sum each cpu-*.foo for this server and divide it by the count from step 1, then
  3. sum the results of step 2 across all servers and summarize.
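The three steps, written out in Python over hypothetical cpu-system values for a 2-core and a 4-core server (all names and numbers are made up for illustration):

```python
from collections import defaultdict

# Hypothetical cpu-system values per (server, core) at three timestamps.
raw = {
    ("host-a", "cpu-0"): [80, 90, 70],
    ("host-a", "cpu-1"): [60, 90, 50],
    ("host-b", "cpu-0"): [10, 20, 10],
    ("host-b", "cpu-1"): [10, 20, 30],
    ("host-b", "cpu-2"): [10, 20, 20],
    ("host-b", "cpu-3"): [10, 20, 20],
}

# Group each server's per-core series together.
by_server = defaultdict(list)
for (server, _core), values in raw.items():
    by_server[server].append(values)

normalized = {}
for server, series in by_server.items():
    count = len(series)                              # step 1: count cpu-* for this server
    sums = [sum(points) for points in zip(*series)]  # step 2: sum cpu-*.foo per timestamp...
    normalized[server] = [s / count for s in sums]   # ...and divide by the count

# Step 3: sum the normalized series across servers (summarize would follow).
combined = [sum(points) for points in zip(*normalized.values())]
print(combined)  # [80.0, 110.0, 80.0]
```

Summing a server's cores and dividing by its core count is simply the per-server average, which is what Graphite's averageSeriesWithWildcards(…, 5) computes.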

What's missing for me is step 2. Basically, I need a way to normalize the CPU values of each server before summing them across all servers.

Is there any way to do this?

Edit: Of course, this would also be useful for other metrics that are not uniform across servers, e.g. RAM.
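The same normalization applies to RAM. A small sketch with hypothetical byte values, assuming per-state readings such as used, buffered, cached and free (as collectd's memory plugin reports on Linux): express used memory as a percentage of each server's own total before comparing servers.

```python
# Hypothetical collectd memory readings in bytes for two servers with different RAM sizes.
memory = {
    "host-a": {"used": 6e9, "buffered": 0.5e9, "cached": 1e9, "free": 0.5e9},  # 8 GB
    "host-b": {"used": 8e9, "buffered": 2e9, "cached": 10e9, "free": 12e9},    # 32 GB
}

# Normalize "used" to a percentage of each server's own total before comparing.
used_pct = {host: 100 * m["used"] / sum(m.values()) for host, m in memory.items()}
worst_host = max(used_pct, key=used_pct.get)
print(worst_host, used_pct[worst_host])  # host-a 75.0
```

host-b uses more memory in absolute terms, yet host-a is the worst case once both servers are normalized.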

Try this:

summarize(sumSeries(averageSeriesWithWildcards(servers.$foo.$bar.*.collectd.cpu-*.cpu-system.value, 5)), '$timeframe', 'max', false)

I'm not sure it will work, but I believe it follows the steps you outlined, and perhaps you can tune it to make it work. :) See the Graphite functions documentation.
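The outer summarize(…, '$timeframe', 'max', false) reduces the combined series to one value per interval. A minimal sketch of that bucketing in Python, assuming 60-second samples and a 180-second interval in place of $timeframe; summarize_max is a hypothetical helper modeled on how summarize aligns buckets to multiples of the interval when alignToFrom is false. Unlike Graphite, it simply omits buckets that contain only None.

```python
from collections import defaultdict


def summarize_max(points, interval):
    """Bucket (timestamp, value) pairs into fixed intervals and keep each bucket's max.

    Buckets start at multiples of `interval` seconds; None values are skipped,
    so a bucket holding only None values does not appear in the result.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        if value is not None:
            buckets[ts - ts % interval].append(value)
    return {start: max(values) for start, values in sorted(buckets.items())}


# Hypothetical 60-second samples of the summed per-server averages.
points = [(0, 80.0), (60, 110.0), (120, None), (180, 95.0), (240, 70.0)]
print(summarize_max(points, 180))  # {0: 110.0, 180: 95.0}
```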
