
Performance issue in clustered Production Servers

Our production users complain of performance issues at least two to three times every month. We run IBM WAS 8 servers in production. The application uses two SOAP-based services, H and T. H is deployed on INTERNET clustered servers (X, Y). T is deployed on INTRANET servers (U, V). The client connects directly to H, and H connects to T on the INTRANET. Both SOAP-based services, H and T, connect to a database. There is also a service for authenticating users. We are not seeing any errors in the logs of servers U and V, but the logs of H on servers X and Y show the following errors, with different errors at different times:

1. java.net.SocketTimeoutException: Socket operation timed out before it could be completed
2. java.io.IOException: Connection close: Read failed.  Possible end of stream encountered.  
java.lang.OutOfMemoryError: GC overhead limit exceeded
3. Exception - User fault processing is not supported. The @WebFault faultbean is missing for java.rmi.RemoteException
4. Authentication failed

We are thinking of increasing the heap size. But before doing that, what performance parameters should we collect from the servers to narrow down the root cause of the issue?

As a first step, you should always monitor the key performance resources of the underlying system (hardware server, VM, container): CPU utilization, free memory, network usage, etc. If your box is running out of CPU cycles or free RAM, app server performance will suffer.

As the next layer, there are various performance metrics provided by Java and by WAS which can help diagnose an issue like yours. A useful guide to WAS performance investigation is the WebSphere Application Server Performance Cookbook: https://publib.boulder.ibm.com/httpserv/cookbook/
In your case, this section is probably the most applicable: https://publib.boulder.ibm.com/httpserv/cookbook/Recipes-WAS_Traditional_Recipes-General_WAS_Traditional_Performance_Problem.html

One of the errors in your list is an OOM thrown due to "GC overhead limit exceeded". This means that the server JVM ran critically low on free space in the Java heap, so it was spending almost all its time running Java garbage collection to free space instead of doing real work. This type of problem can cause the other problems you listed, such as timeouts and communication failures.
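To get a rough feel for how much time a JVM spends in garbage collection, the standard `java.lang.management` beans can be polled from inside the process. The following is only a minimal sketch: the class and method names (`GcOverhead`, `overheadPercent`, `totalGcMillis`) are made up for illustration, and the figure it prints is an average since JVM start, which can hide short spikes.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of WAS or the JDK.
class GcOverhead {
    /** Percentage of elapsed time spent in GC; returns 0 when no time has elapsed. */
    static double overheadPercent(long gcMillis, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            return 0.0;
        }
        return 100.0 * gcMillis / elapsedMillis;
    }

    /** Accumulated collection time, in milliseconds, summed over all collectors in this JVM. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report a time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC overhead since JVM start: %.2f%%%n",
                overheadPercent(totalGcMillis(), uptime));
    }
}
```

Sampling `totalGcMillis()` at fixed intervals and feeding the differences into `overheadPercent` would give a per-interval figure, which is usually more telling than the lifetime average.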

To diagnose an excessive GC issue, you need verbose GC logging. Enabling verbose GC is step #2 in the second link above (also explained at http://www-01.ibm.com/support/docview.wss?uid=swg21114927 ). Verbose GC logging has very low overhead and very high diagnostic value, so it should be enabled at all times, including in production environments.
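One way to confirm that a running JVM was actually started with GC logging is to inspect its startup arguments. This is a sketch under assumptions: the class name `VerboseGcCheck` is hypothetical, and the list of flag prefixes it looks for (`-verbose:gc`, `-Xverbosegclog`, `-Xloggc`, `-Xlog:gc`) is a heuristic covering common IBM and HotSpot options, not an exhaustive list.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper, not part of WAS or the JDK.
class VerboseGcCheck {
    /** True if any JVM argument starts with a commonly used GC logging option. */
    static boolean hasGcLoggingFlag(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-verbose:gc")        // generic verbose GC switch
                    || arg.startsWith("-Xverbosegclog") // IBM J9: write verbose GC to a file
                    || arg.startsWith("-Xloggc")        // HotSpot, Java 8 and earlier
                    || arg.startsWith("-Xlog:gc")) {    // HotSpot unified logging, Java 9+
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments the current JVM was launched with
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("GC logging flag present: " + hasGcLoggingFlag(jvmArgs));
    }
}
```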

The most critical information from the GC log is how much free tenure heap is available after each global GC. This should be at least 30% of the total tenure heap size, or the JVM will have to do too much GC work to clear space for the 'real work' you want your server to perform. The "GC overhead limit exceeded" error typically arises in configurations where there is less than 10% free tenure space on a busy server.

If a server is consistently running at less than 30% free tenure space after global GC, you need to either increase the heap size or shift some workload off the server.
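The 30% and 10% rules of thumb can be turned into a small check. The sketch below is illustrative only: `TenureHeadroom`, `freePercent`, and `classify` are hypothetical names, the pool-name match on "old" or "tenured" is a heuristic because pool names differ between JVMs, and `getCollectionUsage()` reports usage after the most recent collection of that pool, which may not be a global GC. The verbose GC log remains the authoritative source.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of WAS or the JDK.
class TenureHeadroom {
    /** Free space as a percentage of capacity, given the bytes still used after GC. */
    static double freePercent(long usedAfterGc, long capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        return 100.0 * (capacity - usedAfterGc) / capacity;
    }

    /** Applies the thresholds from the text: below 10% is critical, below 30% is low. */
    static String classify(double freePercent) {
        if (freePercent < 10.0) {
            return "CRITICAL";
        }
        if (freePercent < 30.0) {
            return "LOW";
        }
        return "OK";
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName().toLowerCase();
            // Assumption: tenure/old-generation pools have "old" or "tenured" in their name
            boolean looksTenured = name.contains("old") || name.contains("tenured");
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if the pool does not support it
            if (pool.getType() != MemoryType.HEAP || !looksTenured || afterGc == null) {
                continue;
            }
            // getMax() is -1 when undefined; fall back to the committed size
            long capacity = afterGc.getMax() > 0 ? afterGc.getMax() : afterGc.getCommitted();
            if (capacity <= 0) {
                continue; // no collection has been recorded for this pool yet
            }
            double free = freePercent(afterGc.getUsed(), capacity);
            System.out.printf("%s: %.1f%% free after last GC -> %s%n",
                    pool.getName(), free, classify(free));
        }
    }
}
```

For example, a 1000 MB tenure area with 700 MB still in use after GC has exactly 30% free and is classified as OK, while 950 MB in use leaves 5% free and is classified as CRITICAL.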
