Java ConcurrentMarkSweep garbage collector not removing all garbage

Short form: The CMS garbage collector appears to be failing to collect an ever-increasing amount of garbage; eventually, our JVM fills up, and the application becomes unresponsive. Forcing a GC via an external tool (JConsole or jmap -histo:live) cleans it up once.

UPDATE: The problem appears to be related to the JTop plugin for JConsole; if we don't run JConsole, or run it without the JTop plugin, the behavior goes away.

(Technical notes: we're running Sun JDK 1.6.0_07, 32-bit, on a Linux 2.6.9 box. Upgrading the JDK version is not really an option, unless there's an unavoidable, major reason. Also, our system is not hooked up to an Internet-accessible machine, so screenshots of JConsole, etc. aren't an option.)

We're currently running our JVM with the following flags:

-server -Xms3072m -Xmx3072m -XX:NewSize=512m -XX:MaxNewSize=512m 
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled 
-XX:CMSInitiatingOccupancyFraction=70 
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps 
-XX:+DisableExplicitGC

Observing the memory graph in JConsole, there's a full GC that runs every ~15 minutes or so during the first several hours of our application's lifespan; after each full GC, there's more and more memory still in use. After a few hours, the system hits a steady state where there's approximately 2GB of used memory in the CMS old gen.

Which sounds like a classic memory leak, except that if we use any tool that forces a full GC (hitting the "collect garbage" button in JConsole, or running jmap -histo:live, etc.), the old gen suddenly drops to ~500MB used, and our application becomes responsive again for the next several hours (during which time the same pattern continues: after each full GC, more and more of the old gen is full).

One thing of note: in JConsole, the reported ConcurrentMarkSweep GC count will stay at 0 until we force a GC with jconsole/jmap/etc.
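Since screenshots are not an option, the same counters can be dumped as text from inside the application. The sketch below reads the standard `java.lang.management` MXBeans, which, as far as I know, are also what JConsole displays. The class name `GcSnapshot` is made up for illustration, and the collector and pool names printed depend on the JVM and flags in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper: prints GC counters and heap pool usage as plain text.
class GcSnapshot {
    // One line per collector: cumulative collection count and time.
    static String describe(GarbageCollectorMXBean gc) {
        return gc.getName() + ": count=" + gc.getCollectionCount()
                + ", timeMs=" + gc.getCollectionTime();
    }

    // One line per heap pool: current usage and usage right after the last collection.
    static String describe(MemoryPoolMXBean pool) {
        MemoryUsage now = pool.getUsage();               // null if the pool is no longer valid
        MemoryUsage afterGc = pool.getCollectionUsage(); // null if the pool does not support it
        String used = (now == null) ? "n/a" : (now.getUsed() / 1024) + "K";
        String after = (afterGc == null) ? "n/a" : (afterGc.getUsed() / 1024) + "K";
        return pool.getName() + ": used=" + used + ", afterLastGc=" + after;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(describe(gc));
        }
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                System.out.println(describe(pool));
            }
        }
    }
}
```

Logging these lines periodically would show whether the old gen's after-GC usage keeps climbing while the collector's count stays flat.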

Using jmap -histo and jmap -histo:live in sequence, I am able to determine that the apparently uncollected objects consist of:

  • several million HashMaps and arrays of HashMap$Entry (in a 1:1 ratio)
  • several million Vectors and Object arrays (1:1 ratio, and about the same as the number of HashMaps)
  • several million HashSets, Hashtables, and com.sun.jmx.remote.util.OrderClassLoaders, as well as arrays of Hashtable$Entry (about the same number of each; about half as many as the HashMaps and Vectors)

There are some excerpts from the GC output below; my interpretation is that the CMS GC is getting aborted without failing over to the stop-the-world GC. Am I misinterpreting this output somehow? Is there something that would cause that?

During the normal runtime, the CMS GC output blocks look about like this:

36301.827: [GC [1 CMS-initial-mark: 1856321K(2621330K)] 1879456K(3093312K), 1.7634200 secs] [Times: user=0.17 sys=0.00, real=0.18 secs]
36303.638: [CMS-concurrent-mark-start]
36314.903: [CMS-concurrent-mark: 7.804/11.264 secs] [Times: user=2.13 sys=0.06, real=1.13 secs]
36314.903: [CMS-concurrent-preclean-start]
36314.963: [CMS-concurrent-preclean: 0.037/0.060 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
36314.963: [CMS-concurrent-abortable-preclean-start]
36315.195: [GC 36315.195: [ParNew: 428092K->40832K(471872K), 1.1705760 secs] 2284414K->1897153K(3093312K), 1.1710560 secs] [Times: user=0.13 sys=0.02, real=0.12 secs]
CMS: abort preclean due to time 36320.059: [CMS-concurrent-abortable-preclean: 0.844/5.095 secs] [Times: user=0.74 sys=0.05, real=0.51 secs]
36320.062: [GC[YG occupancy: 146166 K (471872 K)]36320.062: [Rescan (parallel), 1.54078550 secs]36321.603: [weak refs processing, 0.0042640 secs] [1 CMS-remark: 1856321K(2621440K)] 2002488K(3093312K), 1.5456150 secs] [Times: user=0.18 sys=0.03, real=0.15 secs]
36321.608: [CMS-concurrent-sweep-start]
36324.650: [CMS-concurrent-sweep: 2.686/3.042 secs] [Times: user=0.66 sys=0.02, real=0.30 secs]
36324.651: [CMS-concurrent-reset-start]
36324.700: [CMS-concurrent-reset: 0.050/0.050 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]

and that's it; the next line will be the next ParNew GC.
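To relate the excerpt to -XX:CMSInitiatingOccupancyFraction=70, the old gen figures on the initial-mark and remark lines can be turned into percentages. This is a minimal sketch that assumes the log format shown above; the class name `CmsOccupancy` and the regular expression are mine, not part of any JDK tool.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts old-gen occupancy from CMS phase lines.
class CmsOccupancy {
    // Matches the "usedK(capacityK)" pair after a CMS-initial-mark or CMS-remark tag.
    private static final Pattern OLD_GEN =
            Pattern.compile("CMS-(?:initial-mark|remark): (\\d+)K\\((\\d+)K\\)");

    // Returns old-gen occupancy as a percentage, or -1 if the line has no such entry.
    static double occupancyPercent(String logLine) {
        Matcher m = OLD_GEN.matcher(logLine);
        if (!m.find()) {
            return -1.0;
        }
        long used = Long.parseLong(m.group(1));
        long capacity = Long.parseLong(m.group(2));
        return 100.0 * used / capacity;
    }

    public static void main(String[] args) {
        String initialMark = "36301.827: [GC [1 CMS-initial-mark: 1856321K(2621330K)] "
                + "1879456K(3093312K), 1.7634200 secs]";
        String remark = "[1 CMS-remark: 1856321K(2621440K)] 2002488K(3093312K), 1.5456150 secs]";
        System.out.printf("initial-mark: %.1f%%%n", occupancyPercent(initialMark));
        System.out.printf("remark:       %.1f%%%n", occupancyPercent(remark));
    }
}
```

On the lines quoted here, both phases report the same 1856321K of old gen in use, which is just over the 70% threshold.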

When we force a GC using jmap -histo:live, we instead get:

48004.088: [CMS-concurrent-mark: 8.012/8.647 secs] [Times: user=1.15 sys=0.02, real=0.87 secs]
(concurrent mode interrupted)

followed by ~125 lines of the form below (some GeneratedMethodAccessor, some GeneratedSerializationConstructorAccessor, some GeneratedConstructorAccessor, etc.):

[Unloading class sun.reflect.GeneratedMethodAccessor3]

followed by:

: 1911295K->562232K(2621440K), 15.6886180 secs] 2366440K->562232K(3093312K), [CMS Perm: 52729K->51864K(65536K)], 15.6892270 secs] [Times: user=1.55 sys=0.01, real=1.57 secs]
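For comparison with the concurrent cycles, the forced collection's summary line can be broken into its before->after(capacity) transitions. The sketch below assumes, from the layout of the line, that the three transitions are old gen, whole heap, and perm gen in that order; the class name `GcTransitions` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls "beforeK->afterK(capacityK)" triples out of a GC log line.
class GcTransitions {
    private static final Pattern TRANSITION =
            Pattern.compile("(\\d+)K->(\\d+)K\\((\\d+)K\\)");

    // Returns {before, after, capacity} in KB for each transition, in log order.
    static List<long[]> parse(String logLine) {
        List<long[]> out = new ArrayList<long[]>();
        Matcher m = TRANSITION.matcher(logLine);
        while (m.find()) {
            out.add(new long[] {
                Long.parseLong(m.group(1)),
                Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3))
            });
        }
        return out;
    }

    public static void main(String[] args) {
        String line = ": 1911295K->562232K(2621440K), 15.6886180 secs] "
                + "2366440K->562232K(3093312K), "
                + "[CMS Perm: 52729K->51864K(65536K)], 15.6892270 secs]";
        String[] labels = { "old gen", "whole heap", "perm gen" }; // assumed order
        List<long[]> transitions = parse(line);
        for (int i = 0; i < transitions.size() && i < labels.length; i++) {
            long[] t = transitions.get(i);
            System.out.println(labels[i] + ": reclaimed " + (t[0] - t[1]) + "K of " + t[2] + "K");
        }
    }
}
```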

Thanks in advance!

com.sun.jmx.remote.util.OrderClassLoader is used in the remoting layer for JMX, and a quick review of the code suggests they're created as part of the unmarshalling process for remote requests inside of the JVM. The lifetime of those classloaders will be directly related to the lifetime of the thing that was unmarshalled, such that once there are no longer any references to that thing the classloader could be released.

I wouldn't be surprised if in this case the presence of these instances was a direct result of you using JConsole to examine the goings-on in the JVM. And it looks like they'd just be cleaned up by GC as part of normal operation.

I guess it's possible there's a bug in the JMX implementation (seems unlikely in a relatively up-to-date JVM) or perhaps you have some custom MBeans or are using some custom JMX tools that are causing the problem. But ultimately, I'm suspecting the OrderClassLoader is probably a red herring and the issue lies elsewhere (broken GC or some other leak).

Technical notes: we're running Sun JDK 1.6.0_07, 32-bit, on a Linux 2.6.9 box. Upgrading the JDK version is not really an option, unless there's an unavoidable, major reason.

Several newer Java versions have had updates to the CMS garbage collector. Notably 6u12, 6u14, and 6u18.

I'm not an expert with GC stuff, but I'm guessing the preclean fixes in 6u14 may fix the issue you're seeing. Of course, I could say the same thing about 6u18's class unloading bugs. Like I said, I'm not an expert at GC stuff.

There are fixes for:

  • 6u10: (affects 6u4+) CMS never clears referents when -XX:+ParallelRefProcEnabled
  • 6u12: CMS: Incorrect encoding of overflown object arrays during concurrent precleaning
  • 6u12: CMS: Incorrect overflow handling when using parallel concurrent marking
  • 6u14: CMS: assertion failure "is_cms_thread == Thread::current()->is_ConcurrentGC_thread()"
  • 6u14: CMS: Need CMSInitiatingPermOccupancyFraction for perm, divorcing from CMSInitiatingOccupancyFraction
  • 6u14: CMS assert: _concurrent_iteration_safe_limit update missed
  • 6u14: CMS: Incorrect overflow handling during precleaning of Reference lists
  • 6u14: SIGSEGV or (!is_null(v),"oop value can never be zero") assertion when running with CMS and COOPs
  • 6u14: CMS: Livelock in CompactibleFreeListSpace::block_size().
  • 6u14: Make CMS work with compressed oops
  • 6u18: CMS: core dump with -XX:+UseCompressedOops
  • 6u18: CMS: bugs related to class unloading
  • 6u18: CMS: ReduceInitialCardMarks unsafe in the presence of cms precleaning
  • 6u18: [Regression] -XX:NewRatio with -XX:+UseConcMarkSweepGC causes fatal error
  • 6u20: card marks can be deferred too long

In addition to all of the above, 6u14 also introduced the G1 garbage collector, although it is still in testing. G1 is intended to replace CMS in Java 7.

G1 can be used in Java 6u14 and newer with the following command-line switches:

-XX:+UnlockExperimentalVMOptions -XX:+UseG1GC

I would start with something far simpler, like:

-server -Xms3072m -Xmx3072m -XX:+UseParallelOldGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps 

And see if this meets your needs.
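When swapping flag sets like this, it can help to confirm what the running JVM actually received, since launcher scripts sometimes add or override switches. A minimal sketch using the standard `RuntimeMXBean.getInputArguments()` call follows; the class name `JvmFlags` and the prefix filter are illustrative assumptions, not an exhaustive list of GC options.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the heap and GC related switches the JVM was started with.
class JvmFlags {
    // Keeps -XX: options, -Xms/-Xmx/-Xmn sizing, and -verbose:gc; drops everything else.
    static List<String> gcRelated(List<String> jvmArgs) {
        List<String> out = new ArrayList<String>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-XX:") || arg.startsWith("-Xm") || arg.startsWith("-verbose:gc")) {
                out.add(arg);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> actual = ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String flag : gcRelated(actual)) {
            System.out.println(flag);
        }
    }
}
```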

It looks like you are building objects which point back to their owners (A points to B points to A). This results in the reference counts remaining greater than zero, so the garbage collector can't clean them up. You need to break the cycle when you release them. Nullifying the reference in either A or B will solve the problem. This works even in larger reference loops like (A -> B -> C -> D -> A). Vectors and object arrays may be used by your HashMaps.
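A caveat worth checking before restructuring code around cycles: HotSpot's collectors, CMS included, trace reachability from GC roots rather than counting references, so a cycle that nothing else points to is still collectable. The sketch below illustrates this with a `WeakReference`; the class and method names are hypothetical, and `System.gc()` is only a request (it is ignored under -XX:+DisableExplicitGC, which the question's flags enable), so treat the result as indicative.

```java
import java.lang.ref.WeakReference;

// Hypothetical demo: an unreachable A -> B -> A cycle can still be collected.
class CycleDemo {
    static class Node {
        Node other;                       // reference that closes the cycle
        byte[] payload = new byte[1024];  // a little weight so the objects are not trivial
    }

    // Builds A -> B -> A and returns only a weak handle to A, so no strong reference escapes.
    static WeakReference<Node> makeUnreachableCycle() {
        Node a = new Node();
        Node b = new Node();
        a.other = b;
        b.other = a;
        return new WeakReference<Node>(a);
    }

    // Requests a GC a few times; returns true once the weak handle has been cleared.
    static boolean collected(WeakReference<Node> ref) {
        for (int i = 0; i < 20 && ref.get() != null; i++) {
            System.gc();
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        WeakReference<Node> ref = makeUnreachableCycle();
        System.out.println("cycle collected: " + collected(ref));
    }
}
```

If the cycle is retained in a real application, the more likely cause is some still-reachable object (a static map, a listener list, a cache) pointing into it, which is what a heap histogram or dump would help locate.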

The presence of the remote loaders may indicate a failure to clean up and close references to objects loaded via JNDI or other remote access method.

EDIT: I took a second look at your last line. You may want to increase the perm allocation.
