
Memory leak in multiple apps

Original post: 2012-04-25 21:49:21 · Tags: java / mysql / hibernate / solr / c3p0

I have a memory leak in two apps on a Tomcat 6.0.35 server that appeared "out of nowhere". One app is Solr and the other is our own software. I'm hoping someone has seen this before, as it's been happening for the last few weeks and I have to keep restarting Tomcat in a production environment.

It appeared on our original server even though none of the code related to thread or DB connection handling has been touched. As the old server this app runs on was due to be retired, I migrated the site to a new server and a "cleaner" environment, hoping that would clear out any legacy stuff. But it continues to happen.

Just before Tomcat shuts down, the catalina.out log fills with errors like:

2012-04-25 21:46:00,300 [main] ERROR org.apache.catalina.loader.WebappClassLoader- The web application [/AppName] appears to have started a thread named [MultiThreadedHttpConnectionManager cleanup] but has failed to stop it. This is very likely to create a memory leak.

2012-04-25 21:46:00,339 [main] ERROR org.apache.catalina.loader.WebappClassLoader- The web application [/AppName] appears to have started a thread named [com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#2] but has failed to stop it. This is very likely to create a memory leak.

2012-04-25 21:46:00,470 [main] ERROR org.apache.catalina.loader.WebappClassLoader- The web application [/AppName] is still processing a request that has yet to finish. This is very likely to create a memory leak. You can control the time allowed for requests to finish by using the unloadDelay attribute of the standard Context implementation.

During that migration we went from Solr 1.4 to Solr 3.6 in an attempt to fix the problem. When the errors above start filling the log, the Solr error below follows right behind, repeated 10-15 times; then Tomcat stops working and I have to shut it down and start it up again to get it to respond.

2012-04-25 21:46:00,527 [main] ERROR org.apache.catalina.loader.WebappClassLoader- The web application [/solr] created a ThreadLocal with key of type [org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value [org.apache.solr.schema.DateField$ThreadLocalDateFormat@1f1e90ac]) and a value of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat] (value [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.

My research has turned up a lot of suggestions about changing the code that manages threads to make sure it kills off pooled DB connections etc., but this code has not been changed in nearly 12 months. Also, the Solr application is crashing and that's 3rd party, so my thinking is that this is environmental (jar conflict, versioning, a fat-fingered config?).

My last change was updating the MySQL connector for Java to the latest version, as some memory leak bugs existed around pooling in earlier releases, but the server crashed again only a few hours later.

One thing I just noticed is that I'm seeing thousands of sessions in the Tomcat web manager, but that could be a red herring.

If anyone has seen this, any help is very much appreciated.

[Edit]

I think I found the source of the problem. It wasn't a memory leak after all. I've taken over an application from another development team that uses c3p0 for database pooling via Hibernate. c3p0 has a bug/feature: if you don't release DB connections, c3p0 can go into a waiting state once all the connections (capped by MaxPoolSize: default is 15) are in use. It will wait indefinitely for a connection to become available. Hence my stall.
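To make the stall concrete, here is a minimal Java sketch of the failure mode. It does not use c3p0 at all: ToyPool, Lease, leakyQuery and safeQuery are hypothetical names invented for the illustration, and a Semaphore stands in for the pool's connection limit. Unlike the indefinite wait described above, the toy checkout gives up after 50 ms so that the demo terminates instead of hanging.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

class PoolExhaustionDemo {

    /** Toy stand-in for a connection pool (NOT the c3p0 API): one permit per "connection". */
    static final class ToyPool {
        private final Semaphore permits;

        ToyPool(int maxPoolSize) {
            permits = new Semaphore(maxPoolSize);
        }

        /** Checks out a connection; returns null if none is free within timeoutMillis. */
        Lease checkout(long timeoutMillis) {
            try {
                return permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)
                        ? new Lease(permits)
                        : null;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }

        int available() {
            return permits.availablePermits();
        }
    }

    /** A checked-out connection; close() hands it back to the pool. */
    static final class Lease implements AutoCloseable {
        private final Semaphore permits;
        private boolean closed;

        Lease(Semaphore permits) {
            this.permits = permits;
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                permits.release();
            }
        }
    }

    /** Leaky caller: checks a connection out and never returns it. */
    static boolean leakyQuery(ToyPool pool) {
        Lease lease = pool.checkout(50);
        return lease != null; // the lease is never closed
    }

    /** Well-behaved caller: try-with-resources returns the connection on every path. */
    static boolean safeQuery(ToyPool pool) {
        try (Lease lease = pool.checkout(50)) {
            return lease != null; // the real query would run here
        }
    }

    public static void main(String[] args) {
        ToyPool leaky = new ToyPool(3);
        for (int i = 1; i <= 4; i++) {
            System.out.println("leaky call " + i + " got a connection: " + leakyQuery(leaky));
        }
        ToyPool safe = new ToyPool(3);
        for (int i = 1; i <= 4; i++) {
            System.out.println("safe call " + i + " got a connection: " + safeQuery(safe));
        }
    }
}
```

With a pool of 3, the leaky caller is refused on its fourth call, while the safe caller can keep going. A real application would presumably hit the same wall through any code path that opens a Hibernate session or JDBC connection and skips close() on some branch.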

I first raised MaxPoolSize from 25 to 100 and my application ran for several days without a hang, then from 100 to 1000, and it's been running steadily ever since (over 2 weeks).

This isn't the complete solution, as I still need to find out why it's running out of pooled connections. So I also set c3p0's unreturnedConnectionTimeout to 4hrs, which enforces a 4hr time limit on every checked-out connection, regardless of whether it's still in active use. If a connection is still checked out after that, c3p0 destroys it and opens a replacement.

Not pretty, and c3p0 doesn't recommend it, but it gives me some breathing space to track down the source of the problem.

Note: when using c3p0 with Hibernate, the settings are stored in your persistence.xml file, but not all settings can be put there. Some settings (e.g. unreturnedConnectionTimeout) must go in c3p0.properties.
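For reference, a c3p0.properties along these lines would express the settings discussed above. This is a sketch rather than the poster's actual file: the 4-hour value comes from the post, while the debugUnreturnedConnectionStackTraces and checkoutTimeout entries are optional extras assumed here because they help locate the leaking code path. The file is expected at the top level of the application's classpath.

```properties
# c3p0.properties (top level of the classpath). Values are illustrative, not recommendations.

# Destroy any connection that stays checked out longer than this many seconds (4 hours).
c3p0.unreturnedConnectionTimeout=14400

# Assumption: capture the stack trace at checkout and log it when the timeout above fires,
# which points at the code that failed to return the connection. Adds overhead to checkouts.
c3p0.debugUnreturnedConnectionStackTraces=true

# Assumption: give up on a checkout after 30 s (value in milliseconds) and throw an
# SQLException, instead of the default 0, which waits forever on an exhausted pool.
c3p0.checkoutTimeout=30000
```

With the stack-trace option on, the log entry written when a connection is reclaimed should show where it was checked out, which is usually the quickest route to the missing close().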

1 Answer

You state that the sequence of events is:

errors appear
Tomcat stops responding
restart is required

However, the memory leak error messages are only reported when the web application is stopped. Therefore, something is triggering the web applications to stop (or reload). You need to figure out what is triggering this and stop it.

Regarding the actual leaks, you may find this useful:

http://people.apache.org/~markt/presentations/2010-11-04-Memory-Leaks-60mins.pdf

It looks like both your app and Solr have some leaks that need to be fixed. The presentation will provide you with some pointers. I would also consider an upgrade to the latest 7.0.x. The memory leak detection has been improved and not all improvements have made it into 6.0.x yet.
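As a rough illustration of the two kinds of leak in those log messages (a thread that is started but never stopped, and a ThreadLocal that is set but never removed), here is a minimal, stdlib-only Java sketch. LifecycleCleanupDemo, start, stop, handleRequest and BUFFER are hypothetical names, not code from your app or from Solr. In a real web application the stop() call would normally live in ServletContextListener.contextDestroyed, and for the threads named in your log the owning libraries have to be shut down there as well (for Commons HttpClient 3.x that is MultiThreadedHttpConnectionManager.shutdownAll(); for c3p0, closing the pooled DataSource or the Hibernate factory that owns it).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class LifecycleCleanupDemo {

    /** Hypothetical per-thread scratch buffer, set and cleared around each request. */
    static final ThreadLocal<StringBuilder> BUFFER = new ThreadLocal<>();

    /** Hypothetical application-owned worker threads. */
    private final ExecutorService workers = Executors.newFixedThreadPool(2);

    /** Called on application start: launches a background task. */
    void start() {
        workers.execute(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    Thread.sleep(100); // placeholder for periodic cleanup work
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit promptly on shutdown
            }
        });
    }

    /** Called on application stop: interrupts the workers and waits for them to finish. */
    boolean stop() {
        workers.shutdownNow();
        try {
            return workers.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Uses the ThreadLocal, then removes it so nothing stays attached to a pooled thread. */
    static String handleRequest(String name) {
        BUFFER.set(new StringBuilder());
        try {
            return BUFFER.get().append("hello ").append(name).toString();
        } finally {
            BUFFER.remove();
        }
    }

    public static void main(String[] args) {
        LifecycleCleanupDemo app = new LifecycleCleanupDemo();
        app.start();
        System.out.println(handleRequest("tomcat"));
        System.out.println("workers stopped cleanly: " + app.stop());
    }
}
```

The point of both halves is the same: whatever the application attaches to a thread, or starts as a thread, has to be undone before the webapp class loader is discarded, otherwise that class loader stays reachable and Tomcat logs the warnings you are seeing.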