

G1GC: very high GC count and CPU usage, with very frequent GCs that kill performance

I've recently switched my Java application from CMS + ParNew to G1GC. After the switch, CPU usage went up, and the GC count and pause times went up as well. My JVM flags before the switch were:

 java -Xmx22467m -Xms22467m -XX:NewSize=11233m -XX:+UseConcMarkSweepGC -XX:AutoBoxCacheMax=1048576 -jar my-application.jar

After the switch, my flags are:

java -Xmx22467m -Xms22467m -XX:+UseG1GC -XX:AutoBoxCacheMax=1048576 -XX:MaxGCPauseMillis=30 -jar my-application.jar
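To compare the two configurations on equal terms, it can help to read the GC count and accumulated GC time from inside the JVM rather than from a dashboard. The sketch below uses the standard `java.lang.management` MXBeans; the class name `GcSummary` is made up for illustration, and it is meant to be called from inside the application (or adapted to a remote JMX connection). Treat the ratio it prints as rough: which pauses each bean counts (for example, whether mixed collections land in "G1 Young Generation") depends on the collector and JDK version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcSummary {
    /** Fraction of wall-clock time not spent in collections reported by the MXBeans. */
    static double throughput(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            throw new IllegalArgumentException("uptime must be positive");
        }
        return 1.0 - (double) gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long totalCount = 0;
        long totalMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Under G1 the beans are typically "G1 Young Generation" and "G1 Old Generation";
            // under CMS + ParNew they are "ParNew" and "ConcurrentMarkSweep".
            System.out.printf("%-25s count=%d time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            // Both getters return -1 when the value is undefined, so clamp at zero.
            totalCount += Math.max(0, gc.getCollectionCount());
            totalMillis += Math.max(0, gc.getCollectionTime());
        }
        long uptime = Math.max(1, ManagementFactory.getRuntimeMXBean().getUptime());
        System.out.printf("total: %d collections, %d ms of GC in %d ms uptime (%.1f%% outside GC)%n",
                totalCount, totalMillis, uptime, 100 * throughput(totalMillis, uptime));
    }
}
```

Sampling this periodically (for example, once per minute) under both collectors gives a GC rate and GC-time share that can be compared directly.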

I followed Oracle's best practices (http://www.oracle.com/technetwork/tutorials/tutorials-1876574.html):

Do not Set Young Generation Size

So I did not set the young generation size. However, I suspect that the young generation size is the problem here. What I see is that heap usage fluctuates between ~6 and 8 GB:

[image: heap usage under G1GC]

Whereas before, with CMS and ParNew, memory usage grew from about 4 GB to 16 GB, and only then did I see a GC:

[image: heap usage under CMS + ParNew]
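A back-of-the-envelope way to connect the two graphs: the time between young collections is roughly the young space consumed per cycle divided by the allocation rate. The sketch below assumes a constant allocation rate, treats the heap swing in each graph as the young space filled per cycle, and takes the ~2 s G1 interval reported from the logs in the answer below; all of these are simplifying assumptions, and `GcIntervalEstimate` is a hypothetical name.

```java
public class GcIntervalEstimate {
    /** Approximate seconds between young GCs: young space filled per cycle / allocation rate. */
    static double intervalSeconds(double youngGbPerCycle, double allocGbPerSecond) {
        if (allocGbPerSecond <= 0) {
            throw new IllegalArgumentException("allocation rate must be positive");
        }
        return youngGbPerCycle / allocGbPerSecond;
    }

    /** Allocation rate implied by a heap swing that repeats every intervalSeconds. */
    static double allocationRate(double heapSwingGb, double intervalSeconds) {
        return heapSwingGb / intervalSeconds;
    }

    public static void main(String[] args) {
        // Assumption: under G1 the heap drops from ~8 GB to ~6 GB at each young GC, about every 2 s.
        double rate = allocationRate(8 - 6, 2.0);
        // Assumption: under CMS + ParNew the heap grew from ~4 GB to ~16 GB between young GCs.
        double cmsInterval = intervalSeconds(16 - 4, rate);
        System.out.printf("allocation rate ~%.1f GB/s, G1 interval ~%.1f s, CMS-era interval ~%.1f s%n",
                rate, intervalSeconds(8 - 6, rate), cmsInterval);
    }
}
```

Under these assumptions the same allocation rate (~1 GB/s) fills a ~2 GB young generation every ~2 s but a ~12 GB one only every ~12 s, which is roughly in line with the explicit `-XX:NewSize=11233m` used with CMS.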

I don't understand why GC is so frequent with G1GC, and I'm not sure what I'm missing when it comes to tuning G1GC.

I'm using Java 8:

java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)

I appreciate your help.

UPDATE: A bit more information about those pauses:

[image: GC pauses under G1GC]

As you can see, all of these pauses are G1New, and they seem to be about as long as my target pause time of 30 ms. When I look at the ParNew pauses before the switch to G1GC, this is what they looked like:

[image: ParNew pauses under CMS + ParNew]

So they are also all young-gen collections (ParNew), but they are less frequent and shorter, because they happen only when heap usage gets to around 14 GB (according to the graph).

I still don't understand why the G1New collections happen so early (in terms of heap usage).

Update 2: I also noticed that NewRatio=2. I don't know whether G1GC respects that, but if it does, my new generation would be capped at about 7 GB. Could that be the reason?
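The ~7 GB figure follows from NewRatio's definition (old/young = NewRatio, so young = heap / (NewRatio + 1)). Whether it applies is a separate question: my understanding is that G1 only sizes the young generation from NewRatio when that flag is set explicitly, which is worth verifying rather than taking on trust. The sketch below (hypothetical class name `NewRatioCheck`) does the arithmetic and uses `com.sun.management.HotSpotDiagnosticMXBean` to show whether `NewRatio` came from the command line or is just the default.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

public class NewRatioCheck {
    /** Young size implied by NewRatio: old/young = newRatio, so young = heap / (newRatio + 1). */
    static long youngSizeMb(long heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    public static void main(String[] args) {
        // 22467 MB is the -Xmx/-Xms value from the question.
        System.out.println("NewRatio=2 would imply a young gen of ~" + youngSizeMb(22467, 2) + " MB");

        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        for (String name : new String[] {"NewRatio", "UseG1GC"}) {
            try {
                VMOption opt = hotspot.getVMOption(name);
                // Origin DEFAULT means the value was neither given on the command line nor set ergonomically.
                System.out.println(name + " = " + opt.getValue() + " (origin: " + opt.getOrigin() + ")");
            } catch (IllegalArgumentException e) {
                System.out.println(name + " is not available on this JVM");
            }
        }
    }
}
```

Running this inside the application (or checking with `-XX:+PrintFlagsFinal`) shows whether `NewRatio=2` is merely the default value reported for every JVM.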

Update 3: Adding the G1GC GC logs: https://drive.google.com/file/d/1iWqZCbB-nU6k_0-AQdvb6vaBSYbkQcqn/view?usp=sharing

Your GC log shows an average interval of 2 seconds between GC pauses, each taking around 30-40 ms, which amounts to an application throughput of around 95%. That is not "killing performance" territory, at least not due to GC pauses.
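A quick check of that arithmetic, using only the interval and pause figures quoted and assuming the 2 s interval is measured from the start of one pause to the start of the next: throughput is 1 - pause / interval. With 30-40 ms pauses this gives roughly 98%, so the ~95% figure is, if anything, conservative. The class name `PauseThroughput` is hypothetical.

```java
public class PauseThroughput {
    /**
     * Fraction of wall-clock time left to the application, given the average
     * start-to-start interval between pauses and the average pause length (both in ms).
     */
    static double throughput(double intervalMs, double pauseMs) {
        if (intervalMs <= 0 || pauseMs < 0 || pauseMs > intervalMs) {
            throw new IllegalArgumentException("need 0 <= pause <= interval and interval > 0");
        }
        return 1.0 - pauseMs / intervalMs;
    }

    public static void main(String[] args) {
        for (double pauseMs : new double[] {30, 40}) {
            System.out.printf("pause %.0f ms every 2000 ms -> %.1f%% throughput%n",
                    pauseMs, 100 * throughput(2000, pauseMs));
        }
    }
}
```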

G1 does more concurrent work, though, e.g. for remembered set refinement, and your pauses seem to spend some time in Update RS / Scan RS, so I assume the concurrent GC threads are busy too. In other words, G1 may need additional CPU cycles outside GC pauses, which the logs do not cover by default; you need -XX:+G1SummarizeRSetStats for that. If latency is more important, you might want to allocate more cores to the machine; if throughput is more important, you could tune G1 to perform more of the RS updates during the pauses (at the cost of longer pause times).
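To see how many threads G1 has available for the parallel and concurrent work described above, the relevant flag values can be read at runtime through `com.sun.management.HotSpotDiagnosticMXBean`. This is a sketch: the class `G1ThreadFlags` and its `describe` helper are hypothetical, and my reading that raising `G1RSetUpdatingPauseTimePercent` is the knob for moving more RS update work into the pauses should be checked against the documentation for your JDK version.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class G1ThreadFlags {
    private static final HotSpotDiagnosticMXBean HOTSPOT =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);

    /** Returns the current value of a HotSpot flag, or "n/a" if this JVM has no such flag. */
    static String describe(String flag) {
        try {
            return HOTSPOT.getVMOption(flag).getValue();
        } catch (IllegalArgumentException e) {
            return "n/a";
        }
    }

    public static void main(String[] args) {
        String[] flags = {
            "UseG1GC",
            "ParallelGCThreads",              // GC worker threads used inside stop-the-world pauses
            "ConcGCThreads",                  // threads for concurrent marking
            "G1ConcRefinementThreads",        // threads for concurrent remembered-set refinement
            "G1RSetUpdatingPauseTimePercent"  // share of the pause target budgeted for Update RS
        };
        for (String flag : flags) {
            System.out.println(flag + " = " + describe(flag));
        }
        System.out.println("availableProcessors = " + Runtime.getRuntime().availableProcessors());
    }
}
```

Comparing these thread counts with the number of cores the machine actually has is a quick way to judge whether concurrent refinement is competing with application threads for CPU.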

I was able to see that the time spent copying objects is very significant. It looks like, by default, G1GC lets an object survive up to 15 young collections (its tenuring threshold) before promoting it to the tenured generation. I reduced that to 1 (-XX:MaxTenuringThreshold=1).
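Why the threshold affects copy time can be shown with a deliberately simplified model: each young GC an object survives copies it once, into a survivor region while its age is below the threshold and into the old generation once it reaches it, after which young GCs no longer copy it. The model ignores survivor-space overflow and the adaptive tenuring threshold, and `TenuringCopies` is a hypothetical name.

```java
public class TenuringCopies {
    /**
     * Simplified model: number of times an object is copied by young GCs.
     *
     * @param survivedGcs          young GCs the object survives before it dies (or the run ends)
     * @param maxTenuringThreshold the -XX:MaxTenuringThreshold value
     */
    static int copies(int survivedGcs, int maxTenuringThreshold) {
        // maxTenuringThreshold copies into survivor space, then one copy into the old generation.
        return Math.min(survivedGcs, maxTenuringThreshold + 1);
    }

    public static void main(String[] args) {
        int[] lifetimes = {1, 5, 100};  // young GCs survived: short-, medium- and long-lived objects
        for (int lifetime : lifetimes) {
            System.out.printf("survives %3d GCs: %2d copies with threshold 15, %d with threshold 1%n",
                    lifetime, copies(lifetime, 15), copies(lifetime, 1));
        }
    }
}
```

The trade-off is that with a threshold of 1, medium-lived objects reach the old generation sooner, where only mixed (or full) collections can reclaim them.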

Also, I don't know how to confirm it in the logs, but when I visualized the GC log I saw that the young generation is constantly being resized, from its minimum size to its maximum size. I narrowed down the range, and that also improved performance.
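The post does not say which flags were used to narrow the range. One option in JDK 8 (an assumption on my part, not necessarily what was done here) is the experimental pair `-XX:G1NewSizePercent` / `-XX:G1MaxNewSizePercent`, whose defaults I understand to be 5 and 60 percent of the heap. The sketch below (hypothetical class `G1YoungRange`) converts those percentages into sizes for the 22467 MB heap from the question; the narrowed 30-50% range is a made-up example.

```java
public class G1YoungRange {
    /** Young-generation size in MB for a given percentage of the heap. */
    static long percentOfHeapMb(long heapMb, int percent) {
        return heapMb * percent / 100;
    }

    public static void main(String[] args) {
        long heapMb = 22467;  // -Xmx/-Xms from the question
        // Assumed JDK 8 defaults: G1NewSizePercent=5, G1MaxNewSizePercent=60.
        System.out.println("default young range:  "
                + percentOfHeapMb(heapMb, 5) + " - " + percentOfHeapMb(heapMb, 60) + " MB");
        // Hypothetical narrower range: G1NewSizePercent=30, G1MaxNewSizePercent=50.
        System.out.println("narrowed young range: "
                + percentOfHeapMb(heapMb, 30) + " - " + percentOfHeapMb(heapMb, 50) + " MB");
    }
}
```

Because both flags are experimental in JDK 8, they would need to be preceded by `-XX:+UnlockExperimentalVMOptions` on the command line, e.g. `-XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=30 -XX:G1MaxNewSizePercent=50`.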

Looking at https://docs.oracle.com/javase/9/gctuning/garbage-first-garbage-collector-tuning.htm#JSGCT-GUID-70E3F150-B68E-4787-BBF1-F91315AC9AB9, I was trying to figure out whether coarsening is indeed an issue. But it simply says to set gc+remset=trace, and I don't understand how to pass that to java on the command line, or whether it is even available in JDK 8. I increased -XX:G1RSetRegionEntries a bit, just in case.
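`gc+remset=trace` is unified-logging syntax, which arrived in JDK 9, where it is passed as `-Xlog:gc+remset=trace`. On JDK 8 the closest equivalent I know of is `-XX:+G1SummarizeRSetStats` together with `-XX:G1SummarizeRSetStatsPeriod=<n>`; as far as I know both are diagnostic flags there and need `-XX:+UnlockDiagnosticVMOptions`, which is worth confirming with `-XX:+PrintFlagsFinal`. The sketch below (hypothetical class `RemsetLoggingFlags`) just picks the flag set from the `java.specification.version` property.

```java
import java.util.Arrays;
import java.util.List;

public class RemsetLoggingFlags {
    /**
     * Returns JVM flags for remembered-set statistics, given "java.specification.version"
     * ("1.8" on JDK 8; "9", "11", ... on later releases).
     */
    static List<String> flagsFor(String specVersion) {
        if (specVersion.startsWith("1.")) {
            // JDK 8 and earlier: no unified logging; assumed to be diagnostic flags.
            return Arrays.asList(
                    "-XX:+UnlockDiagnosticVMOptions",
                    "-XX:+G1SummarizeRSetStats",
                    "-XX:G1SummarizeRSetStatsPeriod=1");  // summary after every GC
        }
        // JDK 9+: unified logging with the gc+remset tag set at trace level.
        return Arrays.asList("-Xlog:gc+remset=trace");
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.specification.version");
        System.out.println("java " + String.join(" ", flagsFor(version)) + " -jar my-application.jar");
    }
}
```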

I hope this helps future G1GC tuners, and if anyone else has more suggestions, that would be great.

What I still see is that [Processed Buffers] still takes a very long time in young evacuations, and [Scan RS] is very long in mixed collections. I'm not sure why.
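To put numbers on observations like these, the per-phase lines of a JDK 8 `-XX:+PrintGCDetails` G1 log can be aggregated separately for young and mixed pauses. This is a sketch under assumptions: the regexes assume the JDK 8 line shapes `[GC pause (...) (young)` / `(mixed)` and `[Scan RS (ms): Min: ..., Avg: ..., Max: ...`, other JDK versions format their logs differently, and `G1PhaseSummary` is a hypothetical name. Note that `Processed Buffers` values are buffer counts, not milliseconds.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class G1PhaseSummary {
    // Pause header, e.g. "[GC pause (G1 Evacuation Pause) (young), 0.031 secs]".
    private static final Pattern PAUSE = Pattern.compile("GC pause \\(.*?\\) \\((young|mixed)\\)");
    // Per-phase line, e.g. "[Scan RS (ms): Min: 0.1, Avg: 0.3, Max: 0.5, Diff: 0.4, Sum: 2.4]".
    private static final Pattern PHASE = Pattern.compile(
            "\\[([A-Za-z ]+?)(?: \\(ms\\))?: Min: ([0-9.]+), Avg: ([0-9.]+), Max: ([0-9.]+)");

    static final class Stat {
        int samples;
        double sumOfAvg;
        double worstMax;
    }

    final Map<String, Stat> stats = new TreeMap<>();
    private String currentKind = "unknown";

    /** Feeds one log line; returns true if it was a per-phase line that got recorded. */
    boolean accept(String line) {
        Matcher pause = PAUSE.matcher(line);
        if (pause.find()) {
            currentKind = pause.group(1);  // phases that follow belong to this pause type
            return false;
        }
        Matcher m = PHASE.matcher(line);
        if (!m.find()) {
            return false;
        }
        Stat s = stats.computeIfAbsent(currentKind + " / " + m.group(1), k -> new Stat());
        s.samples++;
        s.sumOfAvg += Double.parseDouble(m.group(3));
        s.worstMax = Math.max(s.worstMax, Double.parseDouble(m.group(4)));
        return true;
    }

    public static void main(String[] args) throws IOException {
        G1PhaseSummary summary = new G1PhaseSummary();
        try (BufferedReader in = args.length > 0
                ? Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)
                : new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                summary.accept(line);
            }
        }
        for (Map.Entry<String, Stat> e : summary.stats.entrySet()) {
            Stat s = e.getValue();
            System.out.printf("%-32s n=%-6d mean(Avg)=%9.2f  max(Max)=%9.2f%n",
                    e.getKey(), s.samples, s.sumOfAvg / s.samples, s.worstMax);
        }
    }
}
```

Usage: `java G1PhaseSummary gc.log` (or pipe the log on stdin) prints one line per pause type and phase, which makes it easy to see whether, say, `mixed / Scan RS` really dominates.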

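Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.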

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM