

Tomcat stops responding to JK requests

I have a nasty issue with load-balanced Tomcat servers that are hanging up. Any help would be greatly appreciated.

The system

I'm running Tomcat 6.0.26 on HotSpot Server 14.3-b01 (Java 1.6.0_17-b04) on three servers sitting behind another server that acts as a load balancer. The load balancer runs Apache (2.2.8-1) + MOD_JK (1.2.25). All of the servers are running Ubuntu 8.04.
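
For reference, a minimal workers.properties sketch of this kind of mod_jk load-balancer setup; the worker names and hostnames below are made up for illustration, not taken from the actual configuration:

# workers.properties on the load balancer (names and hosts are hypothetical)
worker.list=loadbalancer,jkstatus

worker.tomcat1.type=ajp13
worker.tomcat1.host=tomcat1.example.com
worker.tomcat1.port=8009
worker.tomcat1.lbfactor=1
# tomcat2 and tomcat3 are defined the same way

worker.loadbalancer.type=lb
worker.loadbalancer.balance_workers=tomcat1,tomcat2,tomcat3

worker.jkstatus.type=status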

Each Tomcat has 2 connectors configured: an AJP one and an HTTP one. The AJP connector is used by the load balancer, while the HTTP connector is used by the dev team to directly connect to a chosen server (if we have a reason to do so).
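
A minimal server.xml sketch of what those two connectors typically look like on Tomcat 6; the port numbers are just the defaults, not necessarily the ones used here:

<!-- AJP connector, used by mod_jk on the load balancer -->
<Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />

<!-- HTTP connector, used by the dev team for direct access -->
<Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" />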

I have Lambda Probe 1.7b installed on the Tomcat servers to help me diagnose and fix the problem described below.

The problem

Here's the problem: after the application servers have been up for about 1 day, JK Status Manager starts reporting status ERR for, say, Tomcat2. It simply gets stuck in this state, and the only fix I've found so far is to ssh into the box and restart Tomcat.

I must also mention that JK Status Manager takes a lot longer to refresh when there's a Tomcat server in this state.

Finally, the "Busy" count of the stuck Tomcat on JK Status Manager is always high, and won't go down on its own; I must restart the Tomcat server, wait, then reset the worker on JK.

Analysis

Since I have 2 connectors on each Tomcat (AJP and HTTP), I can still connect to the application through the HTTP one. The application works just fine like this, very, very fast. That is perfectly normal, since I'm the only one using this server (as JK stopped delegating requests to this Tomcat).

To try to better understand the problem, I've taken a thread dump from a Tomcat which is not responding anymore, and from another one that has been restarted recently (say, 1 hour before).
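
For the record, a typical way to take such a dump on a Sun JDK 6 box (assuming jps and jstack are on the path; the output file name is just an example):

jps -l                             # find the pid of the Tomcat JVM
jstack <pid> > tomcat-threads.txt  # write all thread stacks to a file
kill -3 <pid>                      # alternative: SIGQUIT makes the JVM dump threads to catalina.out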

The instance that is responding normally to JK shows most of the TP-ProcessorXXX threads in "Runnable" state, with the following stack trace:

java.net.SocketInputStream.socketRead0 ( native code )
java.net.SocketInputStream.read ( SocketInputStream.java:129 )
java.io.BufferedInputStream.fill ( BufferedInputStream.java:218 )
java.io.BufferedInputStream.read1 ( BufferedInputStream.java:258 )
java.io.BufferedInputStream.read ( BufferedInputStream.java:317 )
org.apache.jk.common.ChannelSocket.read ( ChannelSocket.java:621 )
org.apache.jk.common.ChannelSocket.receive ( ChannelSocket.java:559 )
org.apache.jk.common.ChannelSocket.processConnection ( ChannelSocket.java:686 )
org.apache.jk.common.ChannelSocket$SocketConnection.runIt ( ChannelSocket.java:891 )
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run ( ThreadPool.java:690 )
java.lang.Thread.run ( Thread.java:619 )

The instance that is stuck shows most (all?) of the TP-ProcessorXXX threads in "Waiting" state. These have the following stack trace:

java.lang.Object.wait ( native code )
java.lang.Object.wait ( Object.java:485 )
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run ( ThreadPool.java:662 )
java.lang.Thread.run ( Thread.java:619 ) 

I don't know the internals of Tomcat, but I would infer that the "Waiting" threads are simply threads sitting in a thread pool. So, if they are threads waiting inside a thread pool, why wouldn't Tomcat put them to work on processing requests from JK?

EDIT: I don't know if this is normal, but Lambda Probe shows me, in the Status section, that there are lots of threads in KeepAlive state. Is this somehow related to the problem I'm experiencing?

Solution?

So, as I've stated before, the only fix I've found is to stop the Tomcat instance, stop the JK worker, wait for the latter's busy count to slowly go down, start Tomcat again, and enable the JK worker once again.

What is causing this problem? How should I further investigate it? What can I do to solve it?

Thanks in advance.

Do you have JVM memory settings and garbage collection configured? You would do this where you set your CATALINA_OPTS.

Examples:

CATALINA_OPTS="$CATALINA_OPTS -server -Xnoclassgc -Djava.awt.headless=true"   # server VM, no class unloading, headless AWT
CATALINA_OPTS="$CATALINA_OPTS -Xms1024M -Xmx5120M -XX:MaxPermSize=256m"       # heap and PermGen sizing
CATALINA_OPTS="$CATALINA_OPTS -XX:-UseParallelGC"                             # disable the parallel collector
CATALINA_OPTS="$CATALINA_OPTS -Xnoclassgc"                                    # (repeats -Xnoclassgc from the first line)

There are multiple philosophies on which GC setting is best. It depends on the kind of code that you are executing. The config above worked best for a JSP-intensive environment (taglibs instead of an MVC framework).

Check your keepalive time setting. It seems you are getting threads into keepalive state, and they don't time out. It appears your server is not detecting client disconnects within a reasonable time. There are several timeout and count variables involved.
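
A hedged sketch of the attributes that control this on a Tomcat 6 HTTP/1.1 connector; the values are purely illustrative, not recommendations:

<!-- connectionTimeout: ms to wait for the request line after a connection is accepted;
     keepAliveTimeout: ms an idle keep-alive connection is held open (defaults to connectionTimeout);
     maxKeepAliveRequests: requests served per connection before it is closed -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           keepAliveTimeout="15000"
           maxKeepAliveRequests="100" />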

Check your log file first.

I think the default log file is located at /var/log/daemon.log. (Note that this file does not contain only the logs from Tomcat.)

I've had a similar problem with Weblogic. The cause was that too many threads were waiting for network responses and Weblogic was running out of memory. Tomcat probably behaves the same way. Things you can try are:

  • Decrease the timeout value of your connections.
  • Decrease the total amount of simultaneous connections, so that Tomcat doesn't start new threads when that amount is reached (see the sketch after this list).
  • Easy fix, but it doesn't correct the root cause: it might be that Tomcat is in an out-of-memory state, even though it's not showing up in the logs yet. Increase Tomcat's memory as previously described.
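
A hedged sketch of how the first two suggestions could map onto the Tomcat side; the numbers are placeholders, and whether these attributes are honored depends on which AJP implementation backs the connector (org.apache.coyote.ajp.AjpProtocol honors them, the older org.apache.jk handler may not):

<!-- illustrative values only: cap the thread pool and stop holding idle connections forever -->
<Connector port="8009" protocol="AJP/1.3"
           maxThreads="150"
           connectionTimeout="600000" />

On the mod_jk side, the matching worker setting is usually connection_pool_timeout (in seconds), which is typically kept roughly in sync with Tomcat's connectionTimeout (in milliseconds).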

