
Java web app in tomcat periodically freezes up

My Java web app running in Tomcat (7.0.28) periodically becomes unresponsive. I'm hoping for some suggestions of possible culprits (synchronization?), as well as recommended tools for gathering more information about what's occurring during a crash. Some facts that I have accumulated:

  • When the web app freezes up, Tomcat continues to feed request threads into the app, but the app does not release them. The thread pool fills up to the maximum (currently 250), and then subsequent requests immediately fail. During normal operation, there are never more than 2 or 3 active threads.

  • There are no errors or exceptions of any kind logged to any of our Tomcat or web app logs when the problem occurs.

  • Doing a "Stop" and then a "Start" on our application via the Tomcat management web app immediately fixes this problem (until today).

  • Lately the frequency has been two or three times a day, though today was much worse, probably 20 times, and sometimes not coming back to life immediately.

  • The problem occurs only during business hours.

  • The problem does not occur on our staging system.

  • When the problem occurs, processor and memory usage on the server remain flat (and fairly low). Tomcat reports plenty of free memory.

  • Tomcat itself continues to be responsive when the problem occurs. The management web app works perfectly well, and Tomcat continues sending requests into our app until all threads in the pool are filled.

  • Our database server remains responsive when the problem occurs. We use the Spring Framework for data access and injection.

  • The problem generally occurs when usage is high, but there is never an unusually high spike in usage.

  • Problem history: something similar occurred about a year and a half ago. After many server config and code changes, the problem disappeared until about a month ago. Within the past few weeks it has occurred much more frequently, an average of 2 or 3 times a day, sometimes several times in a row.

  • I identified some server code today that may not have been threadsafe, and I put a fix in for that, but the problem is still happening (though less frequently). Is this the sort of problem that un-threadsafe code can cause?

UPDATE: With several posts suggesting database connection pool exhaustion, I did some searching in that direction and found another Stack Overflow question which explains almost all of the problems I'm experiencing.

Apparently, the default values for maxActive and maxIdle connections in Apache's BasicDataSource implementation are each 8. Also, maxWait is set to -1, so when the pool is exhausted and a new request for a connection comes in, it will wait forever without logging any sort of exception. I'm still going to wait for this problem to happen again and perform a jstack dump on the JVM so that I can analyze that information, but it's looking like this is the problem. The only thing it doesn't explain is why the app sometimes doesn't recover from this problem. I suppose the requests just pile up sometimes and once it gets behind it can never catch up.
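For reference, a Spring bean definition along these lines makes those limits explicit instead of relying on the defaults. This is an illustrative sketch: the property names are the DBCP 1.x ones (maxActive, maxIdle, maxWait), while the driver class, URL, credentials, and the specific limit values are placeholders, not taken from the question.

```xml
<!-- Illustrative only: explicit pool sizing for a commons-dbcp BasicDataSource.
     driverClassName/url/username/password are placeholders. -->
<bean id="dataSource" class="org.apache.commons.dbcp.BasicDataSource"
      destroy-method="close">
    <property name="driverClassName" value="com.mysql.jdbc.Driver"/>
    <property name="url" value="jdbc:mysql://localhost:3306/mydb"/>
    <property name="username" value="user"/>
    <property name="password" value="secret"/>
    <!-- defaults are maxActive=8, maxIdle=8, maxWait=-1 (block forever) -->
    <property name="maxActive" value="50"/>
    <property name="maxIdle" value="10"/>
    <!-- fail fast with an exception after 3 seconds instead of hanging -->
    <property name="maxWait" value="3000"/>
</bean>
```

With maxWait bounded, an exhausted pool surfaces as a logged exception rather than a silent hang, which is exactly the visibility that was missing here.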

UPDATE II: I ran jstack during a crash and found about 250 (the thread-pool maximum) of the following:

"http-nio-443-exec-294" daemon prio=10 tid=0x00002aaabd4ed800 nid=0x5a5d in Object.wait() [0x00000000579e2000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1118)
        - locked <0x0000000743116b30> (a org.apache.commons.pool.impl.GenericObjectPool$Latch)
        at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106)
        at org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044)
        at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111)
        at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77)
        at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:573)
        at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:637)
        at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:666)
        at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:674)
        at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:718)

To my untrained eye, this looks fairly conclusive. It looks like the database connection pool has hit its cap. I configured a maxWait of three seconds without modifying maxActive and maxIdle, just to ensure that we begin to see exceptions logged when the pool fills up. Then I'll set those values to something appropriate and monitor.
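As an aside, a quick way to quantify how many request threads are stuck in the pool is to grep the dump for the borrowObject frame. The snippet below fabricates a two-line sample file in place of a real capture, purely for illustration; in practice the dump would come from `jstack -l <tomcat-pid> > tomcat.dump`.

```shell
# Illustration: count threads parked in DBCP's GenericObjectPool.borrowObject.
# (Sample data stands in for a real jstack capture.)
cat > tomcat.dump <<'EOF'
"http-nio-443-exec-1" at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1118)
"http-nio-443-exec-2" at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1118)
EOF
grep -c 'GenericObjectPool.borrowObject' tomcat.dump   # prints 2
```

If that count is at or near the connector's maxThreads, as in the trace above, pool exhaustion is the likely cause.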

UPDATE III: After configuring maxWait, I began to see these in my logs, as expected:

 org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
        at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:114)
        at org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044)
        at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111)
        at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77)

I've set maxActive to -1 (infinite) and maxIdle to 10. I will monitor for a while, but my guess is that this is the end of the problem.

From experience, you may want to look at your database connection pool implementation. It could be that your database has plenty of capacity, but the connection pool in your application is limited to a small number of connections. I can't remember the details, but I seem to recall having a similar problem, which was one of the reasons I switched to using BoneCP, which I've found to be very fast and reliable under load tests.

After trying the debugging suggested below, try increasing the number of connections available in the pool and see if that has any impact.

I identified some server code today that may not have been threadsafe, and I put a fix in for that, but the problem is still happening (though less frequently). Is this the sort of problem that un-threadsafe code can cause?

It depends what you mean by thread-safe. It sounds to me as though your application is causing threads to deadlock. You might want to run your production environment with the JVM configured to allow a debugger to connect, and then use JVisualVM, JConsole or another profiling tool (YourKit is excellent, IMO) to have a peek at what threads you've got and what they're waiting on.
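For example, the standard JMX options below let JConsole or JVisualVM attach to the running Tomcat JVM remotely; they would typically go in CATALINA_OPTS. The port number is illustrative, and authentication/SSL are disabled here only for brevity, which is appropriate solely on a trusted network.

```
# Illustrative JMX options for CATALINA_OPTS (port is a placeholder;
# auth/SSL disabled for brevity -- trusted networks only):
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=9010
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
```

With these in place, the tools above can show live thread states and detect deadlocked monitors without restarting the app.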
