How do I get call parents for libc6 symbols (e.g. _int_malloc) with linux perf?

Original post 2012-04-18 16:00:56, tags: c++/ linux/ g++/ profiling/ perf

I'm profiling a C++ application using linux perf, and I'm getting a nice call graph using GProf2dot. However, some symbols from the C library (libc6-2.13.so) take a substantial portion of the total time, and yet have no in-edges.

For example:

_int_malloc takes 8% of the time but has no call parents.
__strcmp_sse42 and __cxxabiv1::__si_class_type_info::__do_dyncast together take about 10% of the time, and share a caller named 0, which in turn has callers 2d6935c, 2cc748c, and 6, none of which have callers.

As a result, I can't find out which routines are responsible for all this mallocing and dynamic casting using just perf. However, it seems that other symbols (e.g. malloc, but not _int_malloc) do have call parents.

Why doesn't perf show call parents for _int_malloc? Why can't I find the ultimate callers of __do_dyncast? And is there some way for me to modify my setup so that I can get this information? I'm on x86-64, so I'm wondering if I need a (non-standard) libc6 with frame pointers.

2 answers

Update: As of the 3.7.0 kernel, one can determine call parents of symbols in system libraries using perf record -gdwarf <command>.

Using -gdwarf, there is no need to compile with -fno-omit-frame-pointer.
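A minimal sketch for trying this out, assuming a hypothetical file named demo.cpp. The type names and the function allocate_and_cast are made up for illustration and do not come from the question. The comments show the -gdwarf spelling quoted above; more recent perf releases spell the same option --call-graph dwarf, so check perf record --help on your system.

```cpp
// Hypothetical reproducer; every name in this file is illustrative only.
//
// Build without forcing frame pointers (assumed file name: demo.cpp):
//   g++ -O2 -g demo.cpp -o demo
// Record with DWARF-based call graphs, as spelled in the answer above:
//   perf record -gdwarf ./demo
// Spelling used by more recent perf releases:
//   perf record --call-graph dwarf ./demo
// Then inspect the callers:
//   perf report
#include <cstddef>
#include <memory>
#include <vector>

struct Base { virtual ~Base() = default; };
struct Middle : Base {};
struct Leaf : Middle { int payload = 1; };

// Allocates `count` objects and downcasts each one, so that both the
// allocator and the dynamic_cast machinery are exercised under this function.
// Returns how many of the objects were actually Leaf instances.
std::size_t allocate_and_cast(std::size_t count) {
    std::vector<std::unique_ptr<Base>> objects;
    objects.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        if (i % 2 == 0) {
            objects.push_back(std::make_unique<Leaf>());
        } else {
            objects.push_back(std::make_unique<Middle>());
        }
    }
    std::size_t leaves = 0;
    for (const auto& object : objects) {
        if (dynamic_cast<Leaf*>(object.get()) != nullptr) {
            ++leaves;
        }
    }
    return leaves;
}
```

Calling allocate_and_cast with a few million objects from main should give perf enough samples. If the unwinding works, the libc allocation symbols and the dynamic_cast internals should appear with allocate_and_cast among their callers.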

Original answer: Yes, at the moment (May 24, 2012) one would probably need a libc6 compiled with frame pointers (-fno-omit-frame-pointer) on x86_64.

However, developers are currently working on allowing the perf tools to use DWARF unwind info. This means that frame pointers would no longer be needed to get backtrace information on x86_64. Linus, however, does not want a DWARF unwinder in the kernel. Thus, the perf tools will save registers as the system is running, and perform the DWARF unwinding in the userspace perf tool using the libunwind library.

This technique has been tested and successfully determines the callers of (for example) malloc and dynamic_cast. However, the patch set is not yet integrated into the Linux kernel, and needs to undergo further revision before it is ready.

_int_malloc and __do_dyncast are being called from routines that the profiler can't identify because it doesn't have symbol table information for them.

What's more, it looks like you are showing self (exclusive) time. That is only useful for finding hotspots in routines that a) have much self time, and b) you can fix.

There's a reason profilers subsequent to the original unix profil were created. Real software consists of functions that spend nearly all their time calling other functions, and you need to be able to find code that is on the stack much of the time, not just code that holds the program counter much of the time.

So you need to configure perf to take stack samples and tell you the percent of time each of your routines is on the stack. It is even better if it reports not just routines, but lines of code, as in Zoom. It is best to take the samples on wall-clock time, so you're not blind to IO.
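To make the difference between self time and on-stack (inclusive) time concrete, here is a small sketch that computes both from a set of stack samples. It assumes the samples have already been collected and symbolized; StackSample, TimeShare, and summarize are hypothetical names, not part of perf or any other tool.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// One stack sample: outermost caller first, innermost (running) frame last.
using StackSample = std::vector<std::string>;

struct TimeShare {
    double self_percent = 0.0;       // function was the innermost frame
    double inclusive_percent = 0.0;  // function was anywhere on the stack
};

// Returns, per function, the percentage of samples in which it was running
// (self) and the percentage in which it was on the stack at all (inclusive).
std::map<std::string, TimeShare> summarize(const std::vector<StackSample>& samples) {
    std::map<std::string, TimeShare> result;
    if (samples.empty()) {
        return result;
    }
    const double weight = 100.0 / static_cast<double>(samples.size());
    for (const StackSample& stack : samples) {
        if (stack.empty()) {
            continue;
        }
        result[stack.back()].self_percent += weight;
        // Count each function once per sample, even if it recurses.
        const std::set<std::string> unique_frames(stack.begin(), stack.end());
        for (const std::string& frame : unique_frames) {
            result[frame].inclusive_percent += weight;
        }
    }
    return result;
}
```

With four samples, two ending in main > parse > malloc > _int_malloc, one in main > render, and one in main > parse, _int_malloc gets 50% self time, but parse is on the stack in 75% of the samples while having only 25% self time. The inclusive figure is the one that points at the routine you can actually change.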