
High CPU Utilization in Java application - why?

I have a web-based Java application that at times shows very high CPU utilization (almost 90%) for several hours. The Linux top command shows this. On application restart, the problem goes away.

So, to investigate:

I take a thread dump to find out what the threads are doing. Several threads are in the 'RUNNABLE' state, and some are in a few other states. On taking repeated thread dumps, I do see some threads that are always in the 'RUNNABLE' state. So, they appear to be the culprits.

But I am unable to tell for sure which thread is hogging the CPU or has gone into an infinite loop (thereby causing the high CPU utilization).

Logs don't necessarily help, as the offending code may not be logging anything.

How do I investigate which part of the application, or which thread, is causing the high CPU utilization? Any other ideas?

If a profiler is not applicable in your setup, you may try to identify the thread by following the steps in this post.

Basically, there are three steps:

  1. run top -H and get the PID of the thread with the highest CPU usage.
  2. convert the PID to hex.
  3. look for the thread with the matching hex PID (the nid field) in your thread dump.
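Steps 2 and 3 can also be scripted. The following is only a minimal sketch: the class and method names (NidLookup, toNid, findThread) are made up for illustration, and it assumes the dump uses the HotSpot header format with an nid=0x... field, as in jstack output.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper; the names here are illustrative and not part of any library.
final class NidLookup {

    // top -H prints the thread id in decimal; the thread dump prints it as nid=0x<hex>.
    static String toNid(long decimalTid) {
        return "0x" + Long.toHexString(decimalTid);
    }

    // Returns the first dump line whose nid field matches the given decimal thread id.
    // The trailing space in the search string avoids matching a longer nid with the same prefix.
    static Optional<String> findThread(List<String> dumpLines, long decimalTid) {
        String needle = "nid=" + toNid(decimalTid) + " ";
        return dumpLines.stream().filter(line -> line.contains(needle)).findFirst();
    }

    public static void main(String[] args) {
        List<String> dump = List.of(
            "\"main\" #1 prio=5 os_prio=0 tid=0x0000000002120800 nid=0x13f4 runnable [0x0000000001d9f000]",
            "   java.lang.Thread.State: RUNNABLE");
        // 5108 is the decimal form of 0x13f4, i.e. what top -H would show for this thread.
        System.out.println(toNid(5108));
        System.out.println(findThread(dump, 5108).orElse("not found"));
    }
}
```

In practice you would read the lines from a saved jstack output file instead of the inline sample.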

You may be the victim of a garbage collection problem.

When your application requires memory and is running low on the heap it is configured to use, the garbage collector runs often, which consumes a lot of CPU cycles. If it can't reclaim anything, memory stays low, so it runs again and again. When you redeploy your application, the memory is cleared and garbage collection happens no more often than required, so CPU utilization stays low until the heap fills up again.

You should check that there is no memory leak in your application and that it is well configured for memory (check the -Xmx parameter; see "What does Java option -Xmx stand for?").
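One rough way to check whether GC is involved is to ask the JVM itself how much time its collectors have spent. The sketch below uses the standard java.lang.management beans; the class name GcPressureCheck is made up, and the assumption is that you can run this inside the affected application (for example from a diagnostic page). The reported collection time is approximate and depends on the collector, so treat the result as an indicator only.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical diagnostic class; only the java.lang.management calls are real APIs.
final class GcPressureCheck {

    // Sums the approximate accumulated collection time (ms) over all collectors.
    // A collector that does not support the metric reports -1, which is skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime());
        }
        return total;
    }

    // Accumulated GC time divided by JVM uptime; a persistently high value hints at GC pressure.
    static double gcTimeFraction() {
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis <= 0 ? 0.0 : (double) totalGcMillis() / uptimeMillis;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() is -1 when no maximum is defined.
        System.out.printf("heap used=%d MB, max=%d MB%n", heap.getUsed() >> 20, heap.getMax() >> 20);
        System.out.printf("total GC time=%d ms, fraction of uptime=%.3f%n",
                totalGcMillis(), gcTimeFraction());
    }
}
```

If the heap stays close to its maximum while the GC fraction keeps climbing, that matches the pattern described above.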

Also, what are you using as a web framework? JSF relies heavily on sessions and consumes a lot of memory; consider being as stateless as possible!

In the thread dump you can find the line number, as shown below for the main thread, which is currently running:

"main" #1 prio=5 os_prio=0 tid=0x0000000002120800 nid=0x13f4 runnable [0x0000000001d9f000]
   java.lang.Thread.State: RUNNABLE
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:313)
    at com.rana.samples.HighCPUUtilization.main(HighCPUUtilization.java:17)

During these peak CPU times, what is the user load like? You say this is a web-based application, so the culprit that comes to mind is memory utilization. If you store a lot of data in the session, for instance, and the session count gets high enough, the app server will start thrashing. This is also a case where the GC might make matters worse, depending on the collection scheme you are using. More information about the app and the server configuration would help in suggesting further debugging ideas.

Your first approach should be to find all references to Thread.sleep and check that:

  1. Sleeping is the right thing to do - you should use some sort of wait mechanism if possible - perhaps careful use of a BlockingQueue would help.

  2. If sleeping is the right thing to do, are you sleeping for the right amount of time - this is often a very difficult question to answer.

The most common mistake in multi-threaded design is to believe that all you need to do when waiting for something to happen is to check for it and sleep for a while in a tight loop. This is rarely an effective solution - you should always try to wait for the occurrence.

The second most common issue is to loop without sleeping. This is even worse and is a little harder to track down.
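To illustrate the difference, here is a minimal sketch of waiting on a BlockingQueue instead of polling a flag in a loop. The class name and the "done" message are invented for the example; the point is only that take() parks the consumer thread until something arrives, so it uses no CPU while idle.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical example class; BlockingQueue and LinkedBlockingQueue are standard JDK types.
final class WaitInsteadOfPoll {

    // Blocks until a producer hands over an item, rather than checking in a loop.
    static String awaitMessage(BlockingQueue<String> queue) {
        try {
            return queue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> {
            try {
                Thread.sleep(100); // simulate some work before the result is ready
                queue.put("done");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        System.out.println(awaitMessage(queue)); // prints "done"
        producer.join();
    }
}
```

A loop such as while (!ready) { } in place of take() would keep one core busy for the whole wait, which is exactly the kind of thread that shows up as permanently RUNNABLE.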

Flame graphs can be helpful in identifying the execution paths that consume the most CPU time.

In short, these are the steps to generate a flame graph:

yum -y install perf

wget https://github.com/jvm-profiling-tools/async-profiler/releases/download/v1.8.3/async-profiler-1.8.3-linux-x64.tar.gz

tar -xvf async-profiler-1.8.3-linux-x64.tar.gz
chmod -R 777 async-profiler-1.8.3-linux-x64
cd async-profiler-1.8.3-linux-x64

echo 1 > /proc/sys/kernel/perf_event_paranoid
echo 0 > /proc/sys/kernel/kptr_restrict

JAVA_PID=`pgrep java`

./profiler.sh -d 30 $JAVA_PID -f flame-graph.svg

flame-graph.svg can be opened in a browser. In short, the width of each frame in a stack is proportional to the number of collected samples whose stack contains that execution path.
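To make the meaning of "width" concrete, the sketch below counts, for a handful of made-up stack samples, how many samples pass through each call path. This is not how async-profiler is implemented; it is only an illustration of the counting idea, and the class name, method name and sample frames are all invented.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of how frame widths relate to sample counts.
final class FrameWidths {

    // Each sample is a stack listed root-first. For every call path (prefix of a stack),
    // count the samples that contain it; a flame graph draws the frame that wide.
    static Map<String, Integer> widths(List<List<String>> samples) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> stack : samples) {
            StringBuilder path = new StringBuilder();
            for (String frame : stack) {
                if (path.length() > 0) {
                    path.append(';');
                }
                path.append(frame);
                counts.merge(path.toString(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> samples = List.of(
            List.of("main", "handle", "parse"),
            List.of("main", "handle", "parse"),
            List.of("main", "handle", "render"),
            List.of("main", "idle"));
        // "main" appears in all 4 samples, "main;handle" in 3, "main;handle;parse" in 2.
        widths(samples).forEach((path, count) -> System.out.println(path + " " + count));
    }
}
```

The widest frames near the top of the graph are therefore the code that was on the CPU in the most samples.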

There are a few other approaches to generating flame graphs:

  • By adding -XX:+PreserveFramePointer to the JVM options, as described here
  • By using async-profiler with -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints, as described here

But using async-profiler without any of these options, though not very accurate, requires no changes to the running Java process and adds little CPU overhead to it.

Their wiki provides details on how to leverage it. More about flame graphs can be found here.

You did not tag the question with "linux", but you mentioned "Linux top", so this might be helpful:

Use the small Linux tool threadcpu to identify the threads using the most CPU. It calls jstack to get the thread names, and with "sort -n" in the pipe you get the list of threads ordered by CPU usage.

More details can be found here: http://www.tuxad.com/blog/archives/2018/10/01/threadcpu_-_show_cpu_usage_of_threads/index.html

And if you still need more details, create a thread dump or run strace on the thread.
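If running an external tool is not an option, a similar ranking can be approximated from inside the JVM with the standard ThreadMXBean. This is only a sketch under assumptions: ThreadCpuRanking and Entry are invented names, the code has to run inside the affected application, and the numbers are CPU time accumulated since each thread started, not the current rate, so comparing two snapshots taken a few seconds apart is more telling.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper; only the java.lang.management calls are real APIs.
final class ThreadCpuRanking {

    // One row of the report: thread name and accumulated CPU time in nanoseconds.
    static final class Entry {
        final String name;
        final long cpuNanos;

        Entry(String name, long cpuNanos) {
            this.name = name;
            this.cpuNanos = cpuNanos;
        }
    }

    // Lists live threads of this JVM, highest accumulated CPU time first.
    // Returns an empty list if the JVM does not support or has disabled CPU time measurement.
    static List<Entry> rank() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Entry> entries = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return entries;
        }
        for (long id : mx.getAllThreadIds()) {
            long cpu = mx.getThreadCpuTime(id);      // -1 if the thread is no longer alive
            ThreadInfo info = mx.getThreadInfo(id);  // null if the thread is no longer alive
            if (cpu >= 0 && info != null) {
                entries.add(new Entry(info.getThreadName(), cpu));
            }
        }
        entries.sort(Comparator.comparingLong((Entry e) -> e.cpuNanos).reversed());
        return entries;
    }

    public static void main(String[] args) {
        for (Entry e : rank()) {
            System.out.printf("%10d ms  %s%n", e.cpuNanos / 1_000_000, e.name);
        }
    }
}
```

The thread names printed here are the same ones that appear in the header lines of a thread dump, so the top entries can be looked up there directly.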
