
How to profile a multi-threaded C++ application on Linux?

I used to do all my Linux profiling with gprof.

However, with my multi-threaded application, its output appears to be inconsistent.

Now, I dug this up:

http://sam.zoy.org/writings/programming/gprof.html

However, it's from a long time ago, and in my gprof output, gprof appears to be listing functions used by non-main threads.

So, my questions are:

  1. In 2010, can I easily use gprof to profile multi-threaded Linux C++ applications? (Ubuntu 9.10)
  2. What other tools should I look into for profiling?

Edit: added another answer on poor man's profiler, which IMHO is better for multithreaded apps.

Have a look at oprofile. The profiling overhead of this tool is negligible and it supports multithreaded applications, as long as you don't want to profile mutex contention (which is a very important part of profiling multithreaded applications).

Have a look at poor man's profiler (PMP). Surprisingly, few other tools do both CPU profiling and mutex contention profiling for multithreaded applications. PMP does both, and it doesn't even require installing anything (as long as you have gdb).

Have a look at Valgrind.

Have a look at Zoom.

As Paul R said, have a look at Zoom. You can also use lsstack, which is a low-tech approach but surprisingly effective compared to gprof.

Added: Since you clarified that you are running OpenGL at 33 ms per frame, my prior recommendation stands. In addition, what I personally have done in situations like that is both effective and non-intuitive. Just get it running with a typical or problematic workload, stop it manually in its tracks, and see what it's doing and why. Do this several times.

Now, if it only occasionally misbehaves, you would like to stop it only while it's misbehaving. That's not easy, but I've used an alarm-clock interrupt set for just the right delay. For example, if one frame out of 100 takes more than 33 ms, set the timer for 35 ms at the start of each frame and turn it off at the end of the frame. That way, it will interrupt only when the code is taking too long, and it will show you why. Of course, one sample might miss the guilty code, but 20 samples won't miss it.
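The alarm-clock interrupt could be sketched roughly like this. It is only an illustration, assuming a POSIX system with `setitimer` and `SIGALRM`; the names `FrameWatchdog`, `on_frame_overrun` and `g_overruns` are made up for the example and are not from any library.

```cpp
// Sketch of the "alarm-clock interrupt" idea: SIGALRM fires only when a
// frame overruns its budget. All names here are hypothetical.
#include <cassert>
#include <chrono>
#include <csignal>
#include <sys/time.h>  // setitimer (POSIX)

// Incremented from the signal handler; sig_atomic_t is safe to write there.
static volatile std::sig_atomic_t g_overruns = 0;

// Keep the handler trivial: only async-signal-safe work belongs here.
// Stopping in this function from a debugger shows what the program was doing.
extern "C" void on_frame_overrun(int) { g_overruns = g_overruns + 1; }

class FrameWatchdog {
public:
    FrameWatchdog() {
        struct sigaction sa {};
        sa.sa_handler = on_frame_overrun;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = SA_RESTART;  // restart interrupted blocking syscalls
        sigaction(SIGALRM, &sa, nullptr);
    }

    // Call at the start of a frame: one-shot timer, e.g. 35 ms for a 33 ms frame.
    void arm(std::chrono::microseconds budget) const {
        assert(budget.count() > 0);
        itimerval tv{};
        tv.it_value.tv_sec = budget.count() / 1000000;
        tv.it_value.tv_usec = budget.count() % 1000000;
        setitimer(ITIMER_REAL, &tv, nullptr);
    }

    // Call at the end of a frame: a zeroed it_value cancels the pending timer.
    void disarm() const {
        itimerval tv{};
        setitimer(ITIMER_REAL, &tv, nullptr);
    }
};
```

In a render loop this would be `wd.arm(std::chrono::milliseconds(35));` before drawing the frame and `wd.disarm();` after it. Two caveats: gdb passes SIGALRM through silently by default, so something like `handle SIGALRM stop print` (or a breakpoint on the handler) is needed for it to halt there. And in a multithreaded process the signal may land on any thread that does not block it, so look at all threads (`thread apply all bt`) rather than only the one that took the signal.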

Try the modern Linux profiling tool perf (perf_events): https://perf.wiki.kernel.org/index.php/Tutorial and http://www.brendangregg.com/perf.html:

perf record ./application
# generates profile file perf.data
perf report

You can run pstack at random moments to find out the stack at a given point, e.g. 10 or 20 times. The most typical stack is where the application spends most of its time (from experience, we can assume a Pareto distribution).
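A rough way to see why 10 or 20 samples are enough, assuming the samples are taken at independent random moments and that one stack accounts for a fraction $f$ of the run time ($f$ and $n$ are symbols introduced here for illustration):

```latex
P(\text{stack seen at least once in } n \text{ samples}) = 1 - (1 - f)^{n},
\qquad
\mathbb{E}[\text{number of samples showing the stack}] = n f .
```

For example, with $f = 0.3$ and $n = 20$, the chance of never seeing that stack is $0.7^{20} \approx 0.0008$, and about 6 of the 20 samples are expected to show it.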

You can combine that knowledge with strace or truss (Solaris) to trace system calls, and pmap for the memory map.

If the application runs on a dedicated system, you also have sar to measure CPU, memory, I/O, etc. and profile the overall system.

Since you didn't mention non-commercial, may I suggest Intel's VTune. It's not free, but the level of detail is very impressive (and the overhead is negligible).

Microprofile is another possible answer to this. It requires hand-instrumentation of the code, but it seems like it handles multi-threaded code pretty well. And it also has special hooks for profiling graphics pipelines, including what's going on inside the card itself.
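To give an idea of what hand-instrumentation means in general, here is a minimal sketch of a scoped timer that accumulates time per thread and label. It is not Microprofile's API; `ScopeStats` and `ScopedTimer` are hypothetical names invented for this illustration, and a real tool does far more (and with much less overhead than a mutex-protected map).

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical helper: accumulates call counts and time per (thread, label).
class ScopeStats {
public:
    void add(const std::string& label, std::chrono::nanoseconds elapsed) {
        std::lock_guard<std::mutex> lock(mu_);
        Key key{std::this_thread::get_id(), label};
        Entry& e = totals_[key];
        e.calls += 1;
        e.total += elapsed;
    }

    // Total number of recorded scopes for a label, summed over all threads.
    long calls(const std::string& label) const {
        std::lock_guard<std::mutex> lock(mu_);
        long n = 0;
        for (const auto& kv : totals_)
            if (kv.first.second == label) n += kv.second.calls;
        return n;
    }

    // Number of distinct threads that recorded the label.
    std::size_t threads(const std::string& label) const {
        std::lock_guard<std::mutex> lock(mu_);
        std::size_t n = 0;
        for (const auto& kv : totals_)
            if (kv.first.second == label) ++n;
        return n;
    }

private:
    using Key = std::pair<std::thread::id, std::string>;
    struct Entry {
        long calls = 0;
        std::chrono::nanoseconds total{0};
    };
    mutable std::mutex mu_;
    std::map<Key, Entry> totals_;
};

// RAII timer: measures from construction to destruction of the object.
class ScopedTimer {
public:
    ScopedTimer(ScopeStats& stats, std::string label)
        : stats_(stats), label_(std::move(label)),
          start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        auto elapsed = std::chrono::steady_clock::now() - start_;
        stats_.add(label_,
                   std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed));
    }
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    ScopeStats& stats_;
    std::string label_;
    std::chrono::steady_clock::time_point start_;
};
```

Usage would be to put `ScopedTimer t(stats, "render");` at the top of each block of interest, in whichever thread runs it, and to read the totals out at the end of the run.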

Putting a slightly different twist on matters, you can actually get a pretty good idea of what's going on in a multithreaded application using ftrace and kernelshark. Collect the right trace and press the right buttons, and you can see the scheduling of individual threads.

Depending on your distro's kernel, you may have to build a kernel with the right configuration (but I think a lot of them have it built in these days).

I tried valgrind and gprof. It is a crying shame that neither of them works well with multi-threaded applications. Later, I found Intel VTune Amplifier. The good thing is that it handles multi-threading well, works with most of the major languages, runs on Windows and Linux, and has many great profiling features. Moreover, the application itself is free. However, it only works with Intel processors.
