Inconsistent reads in AWS Cloudwatch vs JVM logs and high memory usage
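I'm having trouble monitoring memory usage in my AWS ECS (Fargate) cluster. I see in the CloudWatch dashboard that memory does not appear to be garbage collected properly, so the autoscaling policy keeps adding new instances.
So I added these logs, which are written every time I press a button in the web application (the one deployed in AWS ECS):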
public static void memoryCheck() {
    // Upper bound the heap may grow to (set by -Xmx).
    final long maxHeapSize = Runtime.getRuntime().maxMemory();
    // Heap currently committed by the JVM.
    final long currentHeapSize = Runtime.getRuntime().totalMemory();
    // Unused part of the committed heap.
    final long freeMemory = Runtime.getRuntime().freeMemory();
    final long usedMemory = currentHeapSize - freeMemory;
    // Used heap as a percentage of the maximum heap, not of the container memory.
    final float usagePercentage = 100f * ((float) usedMemory / (float) maxHeapSize);
    log.info("max heap size: {}", maxHeapSize);
    log.info("current heap size: {}", currentHeapSize);
    log.info("free memory: {}", freeMemory);
    log.info("used memory: {}", usedMemory);
    log.info("usage percentage: {}%", usagePercentage);
}
I ran some computations, and at the same time I used this query in CloudWatch:
fields MemoryUtilized, TaskId
| stats avg(100 * (MemoryUtilized / MemoryReserved)) as usedMemoryPercentage by bin(1m)
| limit 200
[2022-07-31 15:05:45,409] usage percentage: 4.680836%
[2022-07-31 15:07:15,255] usage percentage: 28.25417%
[2022-07-31 15:10:35,612] usage percentage: 9.401683%
[2022-07-31 15:10:37,186] usage percentage: 12.7543%
[2022-07-31 15:11:45,724] usage percentage: 13.398229%
[2022-07-31 15:11:46,983] usage percentage: 17.017136%
[2022-07-31 15:12:02,926] usage percentage: 17.581753%
[2022-07-31 15:12:04,195] usage percentage: 20.910671%
[2022-07-31 15:12:05,142] usage percentage: 21.470863%
[2022-07-31 15:12:06,411] usage percentage: 25.008553%
[2022-07-31 15:12:07,426] usage percentage: 25.564808%
[2022-07-31 15:12:08,733] usage percentage: 28.876656%
[2022-07-31 15:12:09,682] usage percentage: 29.434296%
[2022-07-31 15:12:11,009] usage percentage: 32.670258%
[2022-07-31 15:12:11,884] usage percentage: 33.084824%
[2022-07-31 15:12:13,361] usage percentage: 5.8098726%
[2022-07-31 15:12:14,170] usage percentage: 6.2343636%
[2022-07-31 15:12:15,369] usage percentage: 9.761842%
[2022-07-31 15:12:16,261] usage percentage: 10.529714%
[2022-07-31 15:12:17,516] usage percentage: 13.84009%
[2022-07-31 15:12:18,319] usage percentage: 14.395365%
[2022-07-31 15:12:19,648] usage percentage: 17.555946%
[2022-07-31 15:12:38,523] usage percentage: 18.108969%
[2022-07-31 15:12:39,829] usage percentage: 21.39381%
[2022-07-31 15:15:00,594] usage percentage: 22.144833%
[2022-07-31 15:15:02,086] usage percentage: 25.362867%
[2022-07-31 15:20:40,696] usage percentage: 26.10088%
[2022-07-31 15:20:41,987] usage percentage: 29.37069%
[2022-07-31 15:20:53,282] usage percentage: 29.936934%
[2022-07-31 15:20:54,626] usage percentage: 5.3704033%
[2022-07-31 15:20:55,472] usage percentage: 5.906469%
[2022-07-31 15:20:56,626] usage percentage: 9.137836%
[2022-07-31 15:20:57,673] usage percentage: 9.693291%
[2022-07-31 15:20:58,844] usage percentage: 13.019199%
[2022-07-31 15:20:59,950] usage percentage: 13.584228%
[2022-07-31 15:21:01,196] usage percentage: 16.747267%
[2022-07-31 15:21:02,111] usage percentage: 17.29407%
[2022-07-31 15:21:03,343] usage percentage: 20.61859%
[2022-07-31 15:21:04,242] usage percentage: 21.172577%
[2022-07-31 15:21:05,502] usage percentage: 24.336498%
[2022-07-31 15:27:46,785] usage percentage: 25.272879%
[2022-07-31 15:27:48,120] usage percentage: 28.3611%
[2022-07-31 15:27:50,839] usage percentage: 28.914537%
[2022-07-31 15:27:52,085] usage percentage: 32.04773%
[2022-07-31 15:27:54,059] usage percentage: 4.9811816%
[2022-07-31 15:27:55,317] usage percentage: 8.339294%
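As you can see in the application logs, things look fine, but in CloudWatch the memory does not appear to be garbage collected properly, and in general there is a mismatch (for example, look at 15:12:13).
What do you think the problem could be? Am I missing something?
P.S. The "experiment" was run in ECS with 1 service and 1 task, with autoscaling disabled, and the Java process was started with the following parameters: "-XX:+UseParallelOldGC", "-Xms11G", "-Xmx11G"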
First, you are comparing apples and oranges - the ratio of used memory to max heap doesn't really tell you the other ratio, even if the Java process's memory consumption is mostly due to the heap.
See apangin's comment - his answer in particular is great: Java using much more memory than heap size (or size correctly Docker memory limit)
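To make the apples-and-oranges point concrete, here is a minimal sketch that prints the two ratios side by side. The class name MemoryRatios, the helper percent, and the 16 GiB task memory reservation are assumptions for illustration only; the question does not state the real task size. The container-level metric cannot tell which part of the committed heap is free, and with -Xms equal to -Xmx the heap does not shrink after a collection, so the container figure need not drop when the heap figure does.

```java
public class MemoryRatios {

    // Hypothetical helper: part as a percentage of whole.
    static double percent(long part, long whole) {
        return 100.0 * part / whole;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long maxHeap = rt.maxMemory();                    // upper bound set by -Xmx
        long committedHeap = rt.totalMemory();            // heap currently committed by the JVM
        long usedHeap = committedHeap - rt.freeMemory();  // live objects plus uncollected garbage

        // ASSUMPTION: hypothetical task memory reservation; replace with the real value.
        long taskReservedBytes = 16L * 1024 * 1024 * 1024;

        // The ratio the application log in the question reports.
        System.out.printf("used heap / max heap:              %.2f%%%n",
                percent(usedHeap, maxHeap));
        // A different ratio: the heap's committed size against the task reservation.
        // The real container metric also includes non-heap memory.
        System.out.printf("committed heap / task reservation: %.2f%%%n",
                percent(committedHeap, taskReservedBytes));
    }
}
```

The first number falls after every collection; the second is based on the committed heap and stays roughly flat, which is closer to the shape a container-level graph tends to have.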
Second, you should add some reporting for off-heap memory, for example NMT (Native Memory Tracking): https://docs.oracle.com/en/java/javase/18/vm/native-memory-tracking.html#GUID-39676837-DA61-4F814D-9C50B-9DB1
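As a lightweight complement to NMT, the standard java.lang.management beans can log some off-heap figures from inside the application. This is only a sketch: the class name OffHeapReport and the method report are hypothetical, and these beans cover just part of what NMT tracks (for example, they do not include thread stacks).

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.util.List;

public class OffHeapReport {

    // Hypothetical helper: builds a plain-text summary of heap, non-heap and buffer pool usage.
    static String report() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage(); // metaspace, code cache, etc.

        StringBuilder sb = new StringBuilder();
        sb.append(String.format("heap used=%d committed=%d%n",
                heap.getUsed(), heap.getCommitted()));
        sb.append(String.format("non-heap used=%d committed=%d%n",
                nonHeap.getUsed(), nonHeap.getCommitted()));

        // Direct and mapped NIO buffers live outside the Java heap.
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            sb.append(String.format("buffer pool %s used=%d capacity=%d%n",
                    pool.getName(), pool.getMemoryUsed(), pool.getTotalCapacity()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

For the full picture, NMT itself is enabled with the JVM flag -XX:NativeMemoryTracking=summary and queried with jcmd <pid> VM.native_memory summary.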
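Third, you say:
"I see in the CloudWatch dashboard that memory does not appear to be garbage collected properly"
But what makes you think so? Memory usage naturally grows with service utilization; it is not necessarily a memory leak.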