
Aggregating Codahale Metrics counts across ECS task instances in CloudWatch

Original post: 2021-07-05 13:43:10. Tags: monitoring, amazon-ecs, amazon-cloudwatch, metrics, codahale-metrics

I've got an ECS service reporting metrics to CloudWatch, collected with Codahale Metrics. Some of the metrics are counts, e.g. the count of requests made to an external service. Each service instance maintains its own count and reports it to CloudWatch. To my understanding, this means the count values in CloudWatch are the individual per-instance counts, with no possibility of seeing e.g. the total. If each of three instances made 300 requests, the value reported would be 300, with no way to sum it up to 900.

What is the best way to fix it? Is adding an additional dimension, e.g. the ECS task ID, to the reported CloudWatch metric the way to go?

I'm graphing the results in Grafana, but that's likely not the important part.

1 Answer

Metrics are already aggregated in CloudWatch, assuming they have the same namespace and name. If these service request metrics measure the same thing, they should be the same metric; you can then add dimensions to them, such as TaskId, RequestedService, or whatever you want to aggregate by.
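To make the arithmetic from the question concrete, here is a minimal local sketch of what summing data points with the same namespace and name looks like. It does not call CloudWatch or the AWS SDK; the DataPoint class, the "MyService" namespace, and the "ExternalRequests" metric name are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SumAcrossTasks {
    // Hypothetical stand-in for one data point a task publishes; not an AWS SDK type.
    static final class DataPoint {
        final String namespace;
        final String metricName;
        final double value;

        DataPoint(String namespace, String metricName, double value) {
            this.namespace = namespace;
            this.metricName = metricName;
            this.value = value;
        }
    }

    // Mimics a Sum statistic: adds every data point with the given namespace and name.
    static double sum(List<DataPoint> points, String namespace, String metricName) {
        double total = 0;
        for (DataPoint p : points) {
            if (p.namespace.equals(namespace) && p.metricName.equals(metricName)) {
                total += p.value;
            }
        }
        return total;
    }

    // Three task instances, each reporting 300 requests under the same metric (assumed values).
    static List<DataPoint> samplePoints() {
        List<DataPoint> points = new ArrayList<>();
        for (int task = 0; task < 3; task++) {
            points.add(new DataPoint("MyService", "ExternalRequests", 300));
        }
        return points;
    }

    public static void main(String[] args) {
        System.out.println(sum(samplePoints(), "MyService", "ExternalRequests")); // 900.0
    }
}
```

The point of the sketch is only that the total comes from choosing a sum over the data points rather than looking at a single reported value.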

Typically you have the opposite challenge in CloudWatch Metrics to the one you are describing. Metrics are already aggregated together, and you then want to drill down to particular values to debug some issue. For example, if you had a problem with a particular container task you would set the dimension TaskId=todo1, or if you suspected a service was down you'd set RequestedService=todo2.
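The drill-down described above can be sketched locally as filtering data points by one dimension before summing. This is a simulation, not the CloudWatch API: the DataPoint class, the task IDs, and the service names "billing" and "search" are hypothetical. One caveat worth checking against the CloudWatch documentation: as far as I know, CloudWatch stores each unique combination of dimension values as a separate metric, so a total across all TaskId values may need metric math in the query or an extra data point published without that dimension.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DrillDownByDimension {
    // Hypothetical stand-in for a published data point; not an AWS SDK type.
    static final class DataPoint {
        final Map<String, String> dimensions = new HashMap<>();
        final double value;

        DataPoint(String taskId, String requestedService, double value) {
            dimensions.put("TaskId", taskId);
            dimensions.put("RequestedService", requestedService);
            this.value = value;
        }
    }

    // Sums the points whose dimension `name` equals `wanted`; a null name means "no filter".
    static double sumWhere(List<DataPoint> points, String name, String wanted) {
        double total = 0;
        for (DataPoint p : points) {
            if (name == null || wanted.equals(p.dimensions.get(name))) {
                total += p.value;
            }
        }
        return total;
    }

    // Three tasks, each making 200 "billing" and 100 "search" requests (assumed values).
    static List<DataPoint> samplePoints() {
        List<DataPoint> points = new ArrayList<>();
        for (String taskId : new String[] {"task-a", "task-b", "task-c"}) {
            points.add(new DataPoint(taskId, "billing", 200));
            points.add(new DataPoint(taskId, "search", 100));
        }
        return points;
    }

    public static void main(String[] args) {
        List<DataPoint> points = samplePoints();
        System.out.println(sumWhere(points, null, null));                   // 900.0
        System.out.println(sumWhere(points, "TaskId", "task-a"));           // 300.0
        System.out.println(sumWhere(points, "RequestedService", "search")); // 300.0
    }
}
```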

I suspect you are creating a separate metric for each service you make requests to. Instead, you want only one metric, with dimensions added to it as described earlier.

Also, for this particular use case you might want to consider OpenTelemetry/X-Ray, which will create a service graph for you and handles the specific case of tracing requests through different services. That does take a bit of effort to set up, though.