
How to analyze thread dumps in Java to minimize high CPU usage

I'm trying to read a text file and insert the records into a database with the Disruptor.

But I find that the CPU usage is too high (200%, according to the top command).

I'm new to performance tuning and thread dump analysis, and I don't know what's going wrong.

So I ran top -H, found the two threads with the highest usage (both at 99%), and located them in the thread dump:

"main" prio=10 tid=0x00007f54a4006800 nid=0x79ab runnable [0x00007f54a8340000]
   java.lang.Thread.State: RUNNABLE
    at java.lang.Thread.yield(Native Method)
    at com.lmax.disruptor.SingleProducerSequencer.next(SingleProducerSequencer.java:104)
    at com.lmax.disruptor.SingleProducerSequencer.next(SingleProducerSequencer.java:79)
    at com.lmax.disruptor.RingBuffer.next(RingBuffer.java:207)
    at com.xxx.xxx.connectivity.quickfixj.FixMessageReceiver.onMessage(FixMessageReceiver.java:105)
    at com.xxx.xxx.database.DatabaseService.start(DatabaseService.java:110)
    at com.xxx.xxx.database.DatabaseService.main(DatabaseService.java:168)


"pool-2-thread-1" prio=10 tid=0x00007f54a426d800 nid=0x79bc runnable [0x00007f5492a37000]
   java.lang.Thread.State: RUNNABLE
    at java.lang.Thread.yield(Native Method)
    at com.lmax.disruptor.SingleProducerSequencer.next(SingleProducerSequencer.java:104)
    at com.lmax.disruptor.SingleProducerSequencer.next(SingleProducerSequencer.java:79)
    at com.lmax.disruptor.RingBuffer.next(RingBuffer.java:207)
    at com.cimb.reporting.connectivity.jms.DatabaseEventHandler.publish2DbRingBuffer(DatabaseEventHandler.java:49)
    at com.xxx.xxx.connectivity.jms.DatabaseEventHandler.onEvent(DatabaseEventHandler.java:39)
    at com.xxx.xxx.connectivity.jms.DatabaseEventHandler.onEvent(DatabaseEventHandler.java:15)
    at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)

Basically these two threads publish data to the Disruptor, which I create this way:

Disruptor<TradeEvent> disruptor = new Disruptor<TradeEvent>(TradeEvent.TRADE_EVENT_FACTORY,
                properties.dbRingbufferSize(), Executors.newCachedThreadPool(),
                ProducerType.SINGLE, new BlockingWaitStrategy());

Please help me analyze the thread dump to find the root cause of the high CPU usage.

I faced exactly the same problem: 100% CPU usage on my machine (Intel Core i3-3220 with 16 GB memory) with one broker (Xmx 2G), one client (Xmx 2G), and no messages in the ring buffer.

Quick profiling showed that Thread.yield() consumed about 70-80% of the CPU.

It turned out that YieldingWaitStrategy was not a proper strategy in my case, so the quick fix was to switch the wait strategy to BlockingWaitStrategy:

Disruptor<MessageEvent> disruptor = new Disruptor<MessageEvent>(eventFactory, RING_BUFFER_SIZE, executor, ProducerType.SINGLE, new BlockingWaitStrategy());
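
For context, the difference between the wait strategies is how a consumer thread waits for new sequences. Below is a minimal sketch of the same construction with the trade-offs noted as comments; it assumes the Disruptor 3.x API and reuses the eventFactory, RING_BUFFER_SIZE and executor names from the snippet above, and SleepingWaitStrategy is just one possible middle ground, not something prescribed by this answer:

import com.lmax.disruptor.SleepingWaitStrategy;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.dsl.ProducerType;

// YieldingWaitStrategy - busy-spins, calling Thread.yield() after a few tries: lowest latency, but keeps a core near 100%.
// SleepingWaitStrategy - spins, then yields, then parks briefly: low CPU at the cost of some latency.
// BlockingWaitStrategy - lock/condition based: lowest CPU, highest latency.
Disruptor<MessageEvent> disruptor = new Disruptor<MessageEvent>(
        eventFactory, RING_BUFFER_SIZE, executor,
        ProducerType.SINGLE,
        new SleepingWaitStrategy()); // a middle ground if BlockingWaitStrategy is too slow

Note that the question's own snippet already passes BlockingWaitStrategy, and the Thread.yield() frames in its dump sit under RingBuffer.next() on the producer side, i.e. the producers are spinning for free slots in a full ring buffer, so a slow consumer can show the same symptom regardless of the wait strategy.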

UPDATE

JProfiler screenshot for YieldingWaitStrategy (image not reproduced here).

This thread is a bit old; however, note that in recent JDKs (e.g. Java 11) per-thread CPU usage is exposed directly in the thread dump. Example:

jstack -l 5213 | grep cpu
"Reference Handler" #2 daemon prio=10 os_prio=0 cpu=4.88ms elapsed=7027.22s tid=0x0000dedewfewwewe nid=0x3456 waiting on condition  [0x00007f386cc0c000]

References: Find which Java Thread is hogging your CPU

You should have a look at a Java profiler, for example VisualVM, which will analyze your CPU and RAM use in real time.

High CPU utilization is OK if there is actually some work in progress; if many live threads are busy, the net CPU usage of a Java application will be at its peak. It is usually transient, i.e. usage should return to normal when there are no tasks.

I would suggest the following:

  1. Take multiple thread dumps (3-4) at a fixed interval (1-2 seconds). You can use kill -3 on Linux, or jstack, jvisualvm, jconsole on any system with a JDK.

  2. Execute ps -mo pid,lwp,stime,time,%cpu,%mem -C java | less. This lists the lightweight processes (threads) under the Java application's process ID.

  3. Note the LWP IDs of the threads with the highest CPU/memory percentage (whichever you are targeting).

  4. Convert the LWP IDs to hexadecimal values; you can use echo "obase=16; 255" | bc (a Java alternative is sketched after this list).

  5. Look up these hex IDs as nid='some_hex_value' in the thread dump to find the details of the threads responsible for the high CPU usage.
     e.g.: "main" prio=10 tid=0x00007f54a4006800 nid=0x79ab runnable [0x00007f54a8340000]
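
As referenced in step 4, here is a tiny Java equivalent of the bc conversion (the class name LwpToNid is just for illustration; 31147 is the decimal form of the nid=0x79ab visible in the question's dump):

public class LwpToNid {
    public static void main(String[] args) {
        int lwp = Integer.parseInt(args[0]); // decimal LWP id taken from ps, e.g. 31147
        // Thread dumps print the native thread id in lower-case hex, e.g. nid=0x79ab
        System.out.println("nid=0x" + Integer.toHexString(lwp));
    }
}

Running java LwpToNid 31147 prints nid=0x79ab, which you can then search for in the dump.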

Now we know which thread(s) within the Java process have the highest resource usage (the approach works for both memory and CPU%).

I would also recommend attaching your JVM process to jvisualvm or jconsole and reproducing the problem; this way you can monitor your application's state the whole time (from normal operation through issue reproduction) and take snapshots for reference. Both are good enough for any Java thread or memory related profiling.
http://docs.oracle.com/javase/7/docs/technotes/guides/visualvm/
http://docs.oracle.com/javase/6/docs/technotes/guides/management/jconsole.html

Analyzing thread dumps can only point to the problem areas. At times it is tricky: the impact is usually high, but the cause and the fix are small. The actual cause is usually either the way the application is coded, i.e. how concurrency is managed (a process listening in an infinite loop instead of waiting for notification, deadlocks due to synchronization issues, etc.), or the system environment and external interfaces (file reads/writes on disk or remote locations, transfers over FTP APIs, DB operations, etc.).

Here is one useful post on DZone: http://architects.dzone.com/articles/how-analyze-java-thread-dumps

Hope it helps.
