
Inconsistent reads in AWS CloudWatch vs JVM logs and high memory usage

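I am having trouble monitoring memory usage in my AWS ECS (Fargate) cluster. In the CloudWatch dashboard, memory does not seem to be garbage collected properly, so the auto scaling policy keeps adding new instances.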

So I added these logs, which are written every time I press a button in the web application (the one deployed in AWS ECS):

 public static void memoryCheck() {
    final long maxHeapSize = Runtime.getRuntime().maxMemory();
    final long currentHeapSize = Runtime.getRuntime().totalMemory();
    final long freeMemory = Runtime.getRuntime().freeMemory();
    final long usedMemory = currentHeapSize - freeMemory;
    final float usagePercentage = 100f * ((float)usedMemory / (float)maxHeapSize);

    log.info("max heap size: {}", maxHeapSize);
    log.info("current heap size: {}", currentHeapSize);
    log.info("free memory: {}", freeMemory);
    log.info("used memory: {}", usedMemory);
    log.info("usage percentage: {}%", usagePercentage);
}

I did some calculations, and at the same time I ran this query in CloudWatch:

fields MemoryUtilized, TaskId
| stats avg(100 * (MemoryUtilized / MemoryReserved)) as usedMemoryPercentage by bin(1m)
| limit 200

[Figure: ECS CloudWatch graph]

I compared these results with what I saw in the application logs:

[2022-07-31 15:05:45,409] usage percentage: 4.680836%   
[2022-07-31 15:07:15,255] usage percentage: 28.25417%   
[2022-07-31 15:10:35,612] usage percentage: 9.401683%   
[2022-07-31 15:10:37,186] usage percentage: 12.7543%
[2022-07-31 15:11:45,724] usage percentage: 13.398229%   
[2022-07-31 15:11:46,983] usage percentage: 17.017136%   
[2022-07-31 15:12:02,926] usage percentage: 17.581753%   
[2022-07-31 15:12:04,195] usage percentage: 20.910671%   
[2022-07-31 15:12:05,142] usage percentage: 21.470863%   
[2022-07-31 15:12:06,411] usage percentage: 25.008553%   
[2022-07-31 15:12:07,426] usage percentage: 25.564808%   
[2022-07-31 15:12:08,733] usage percentage: 28.876656%   
[2022-07-31 15:12:09,682] usage percentage: 29.434296%   
[2022-07-31 15:12:11,009] usage percentage: 32.670258%   
[2022-07-31 15:12:11,884] usage percentage: 33.084824%   
[2022-07-31 15:12:13,361] usage percentage: 5.8098726%   
[2022-07-31 15:12:14,170] usage percentage: 6.2343636%   
[2022-07-31 15:12:15,369] usage percentage: 9.761842%   
[2022-07-31 15:12:16,261] usage percentage: 10.529714%   
[2022-07-31 15:12:17,516] usage percentage: 13.84009%   
[2022-07-31 15:12:18,319] usage percentage: 14.395365%   
[2022-07-31 15:12:19,648] usage percentage: 17.555946%   
[2022-07-31 15:12:38,523] usage percentage: 18.108969%   
[2022-07-31 15:12:39,829] usage percentage: 21.39381%   
[2022-07-31 15:15:00,594] usage percentage: 22.144833%   
[2022-07-31 15:15:02,086] usage percentage: 25.362867%   
[2022-07-31 15:20:40,696] usage percentage: 26.10088%   
[2022-07-31 15:20:41,987] usage percentage: 29.37069%   
[2022-07-31 15:20:53,282] usage percentage: 29.936934%   
[2022-07-31 15:20:54,626] usage percentage: 5.3704033%   
[2022-07-31 15:20:55,472] usage percentage: 5.906469%   
[2022-07-31 15:20:56,626] usage percentage: 9.137836%   
[2022-07-31 15:20:57,673] usage percentage: 9.693291%   
[2022-07-31 15:20:58,844] usage percentage: 13.019199%   
[2022-07-31 15:20:59,950] usage percentage: 13.584228%   
[2022-07-31 15:21:01,196] usage percentage: 16.747267%   
[2022-07-31 15:21:02,111] usage percentage: 17.29407%   
[2022-07-31 15:21:03,343] usage percentage: 20.61859%   
[2022-07-31 15:21:04,242] usage percentage: 21.172577%   
[2022-07-31 15:21:05,502] usage percentage: 24.336498%   
[2022-07-31 15:27:46,785] usage percentage: 25.272879%   
[2022-07-31 15:27:48,120] usage percentage: 28.3611%
[2022-07-31 15:27:50,839] usage percentage: 28.914537%   
[2022-07-31 15:27:52,085] usage percentage: 32.04773%   
[2022-07-31 15:27:54,059] usage percentage: 4.9811816%   
[2022-07-31 15:27:55,317] usage percentage: 8.339294%

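As you can see, things look fine in the application logs, but in CloudWatch the memory does not seem to be garbage collected properly, and in general the two do not match (for example, look at 15:12:13).

What do you think the problem could be? Am I missing something?

P.S. The "experiment" was run in ECS with 1 service and 1 task, with auto scaling disabled, and the Java process is started with these parameters: "-XX:+UseParallelOldGC", "-Xms11G", "-Xmx11G"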

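First, you are comparing apples and oranges - the ratio of used memory to max heap does not really tell you much about the other ratio, even though the memory consumption of the Java process is mostly due to the heap.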
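To make the difference between the two ratios concrete, here is a minimal sketch that prints the used-heap ratio from the question next to the committed-heap ratio and the committed non-heap size, using the standard `java.lang.management` API. The class name `HeapVsProcessView` and its methods are hypothetical and not part of the original post. The assumption behind it is that the container metric tracks memory the process holds from the operating system, which is closer to committed memory than to used heap. With `-Xms11G -Xmx11G` the committed heap stays at the maximum, so it would not be expected to drop after a collection even when the used figure does.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper for illustration; not part of the original post.
class HeapVsProcessView {

    /** Percentage of part relative to whole; returns -1 when whole is undefined (<= 0). */
    static double percent(long part, long whole) {
        return whole > 0 ? 100.0 * part / whole : -1.0;
    }

    /** Builds a short report contrasting used heap with committed heap and non-heap memory. */
    static String report() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();

        StringBuilder sb = new StringBuilder();
        // Same idea as the question's "usage percentage": live objects plus uncollected garbage.
        sb.append(String.format("heap used / max:      %.2f%%%n",
                percent(heap.getUsed(), heap.getMax())));
        // Memory the JVM has reserved from the OS for the heap, whether or not it is in use.
        sb.append(String.format("heap committed / max: %.2f%%%n",
                percent(heap.getCommitted(), heap.getMax())));
        // Metaspace, code cache and similar areas that live outside the heap.
        sb.append(String.format("non-heap committed:   %d bytes%n", nonHeap.getCommitted()));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Logging `report()` alongside the existing `memoryCheck()` output would show whether the sawtooth in the application logs appears only in the used figure.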

See apangin's comment - his answer in particular is great: Java using much more memory than heap size (or size correctly Docker memory limit)

Second, you should add some reporting for off-heap memory, for example Native Memory Tracking (NMT): https://docs.oracle.com/en/java/javase/18/vm/native-memory-tracking.html#GUID-39676837-DA61-4F814D-9C50B-9DB1
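NMT itself is enabled with the JVM flag `-XX:NativeMemoryTracking=summary` and read with `jcmd <pid> VM.native_memory summary`. As a lighter complement that can be logged from inside the application, the sketch below lists the non-heap memory pools and NIO buffer pools exposed by the standard `java.lang.management` MXBeans. The class name `OffHeapReport` and the key prefixes are hypothetical. This is not a replacement for NMT: these beans do not cover thread stacks, GC internal structures, or native allocations made by libraries.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the original post.
class OffHeapReport {

    /** Returns a byte count per off-heap area, keyed by "pool:<name>" or "buffers:<name>". */
    static Map<String, Long> snapshot() {
        Map<String, Long> result = new LinkedHashMap<>();

        // Non-heap pools such as Metaspace and the code cache (names vary by JVM and version).
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is no longer valid
            if (pool.getType() == MemoryType.NON_HEAP && usage != null) {
                result.put("pool:" + pool.getName(), usage.getCommitted());
            }
        }

        // Direct and mapped byte buffers allocated through java.nio.
        for (BufferPoolMXBean buffers
                : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            result.put("buffers:" + buffers.getName(), buffers.getMemoryUsed());
        }
        return result;
    }

    public static void main(String[] args) {
        snapshot().forEach((name, bytes) -> System.out.println(name + " = " + bytes + " bytes"));
    }
}
```

If the sum of these areas plus the committed heap is still well below what CloudWatch reports, NMT is the next place to look.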

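Third, you said:

In the CloudWatch dashboard, memory does not seem to be garbage collected properly

But what makes you think so? Memory usage naturally grows with service utilization, so it is not necessarily a memory leak.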
