
Low throughput of service due to frequent Garbage Collection

I have a service running on a system with 16 GB RAM with the following configuration:

-Xms6144M
-Xmx6144M
-XX:+UseG1GC
-XX:NewSize=1500M
-XX:NewSize=1800M
-XX:MaxNewSize=2100M
-XX:NewRatio=2
-XX:SurvivorRatio=12
-XX:MaxGCPauseMillis=100
-XX:MaxGCPauseMillis=1000
-XX:GCTimeRatio=9
-XX:-UseAdaptiveSizePolicy
-XX:+PrintAdaptiveSizePolicy

It has around 20 pollers running, each with a ThreadPoolExecutor of size 30 for processing messages. Initially, for around 5-6 hours, it was able to process around 130 messages per second. Thereafter it was only able to process around 40 messages per second.
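A minimal sketch of how such a poller/worker setup is typically wired (the message source and class names here are illustrative assumptions, not the service's actual code); note that Executors.newFixedThreadPool is backed by an unbounded LinkedBlockingQueue:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: one poller feeding a fixed-size worker pool.
public class Poller implements Runnable {
    // Stand-in for the real message source (broker, queue, etc.; an assumption).
    private final BlockingQueue<String> inbound = new LinkedBlockingQueue<>();

    // Executors.newFixedThreadPool(30) uses an unbounded LinkedBlockingQueue internally,
    // so submitted tasks can accumulate without limit when the workers fall behind.
    private final ExecutorService workers = Executors.newFixedThreadPool(30);

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                String msg = inbound.take();          // wait for the next message
                workers.submit(() -> process(msg));   // hand it off to the worker pool
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void process(String msg) {
        // business logic would go here
    }
}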

I analyzed the GC logs and found that Full GC had become very frequent and more than 1000 MB of data was being promoted from the Young to the Old Generation:

[GC charts: Young Generation, Old Generation, Promoted Young, GC Pauses]
Looking at the heap dump, I see lots of threads in a waiting state similar to this: WAITING at sun.misc.Unsafe.park(Native Method), and the following classes' objects account for most of the retained size:

[Heap dump screenshot]

I think there may be a small memory leak in the service or its associated libraries that accumulates over time, so increasing the heap size would only postpone the problem. Or maybe, because Full GCs have become very frequent, all other threads are being stopped very frequently ("stop the world" pauses). I need help figuring out the root cause of this behaviour.

The GC pattern looks like a memory leak.

Looking at your heap dump stats, I can see 3M tasks waiting for execution in thread pools.

I can speculate that you are using thread pools with an unbounded task queue. Your inbound message rate is greater than the processing capacity of the system, so the backlog keeps growing, consuming more and more memory and eventually leading to death by GC.

Depending on your case, you may either limit the queue size for the thread pool or try to optimize the memory footprint of the queued tasks.

Limiting the queue size would create back pressure on the previous processing stage. If the producer for the thread pool is a simple timer-driven poller, the effect would be a reduced polling rate (as the poller would block waiting for room in the queue); see the sketch below.
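This is a minimal sketch of such a bounded pool, assuming CallerRunsPolicy is an acceptable way to apply that back pressure (the thread count and queue capacity are illustrative, not recommendations):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPoolFactory {
    // A bounded worker pool: at most 'capacity' tasks may wait in the queue.
    // When the queue is full, CallerRunsPolicy makes the submitting (poller)
    // thread run the task itself, which naturally throttles the poller.
    public static ExecutorService create(int threads, int capacity) {
        return new ThreadPoolExecutor(
                threads, threads,                          // fixed-size pool
                0L, TimeUnit.MILLISECONDS,                 // no keep-alive for core threads
                new ArrayBlockingQueue<>(capacity),        // bounded task queue
                new ThreadPoolExecutor.CallerRunsPolicy()  // back pressure instead of unbounded growth
        );
    }
}

With this in place, a poller that submits into a full queue ends up running the task itself, so the effective polling rate drops to what the workers can sustain instead of the backlog piling up on the heap.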

Optimizing the task memory footprint would only work if your average processing capacity is greater than the inbound task rate and the problem is caused by a temporary surge.
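As an illustration of the footprint idea only (hypothetical; whether the payload can be re-read at execution time depends on the service), a queued task can capture just a small message identifier instead of the full payload:

// Hypothetical illustration: keep queued tasks small by capturing only an ID
// and loading the heavy payload when the task actually runs.
public class LightweightTask implements Runnable {
    private final long messageId;        // a few bytes retained per queued task
    private final MessageStore store;    // assumed lookup facility, not from the question

    public LightweightTask(long messageId, MessageStore store) {
        this.messageId = messageId;
        this.store = store;
    }

    @Override
    public void run() {
        byte[] payload = store.load(messageId);  // fetch the heavy payload only now
        // process the payload here
    }

    // Stand-in interface so the sketch is self-contained.
    public interface MessageStore {
        byte[] load(long id);
    }
}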
