Is it possible to use Linux Perf profiler inside C++ code?

I would like to measure the L1, L2 and L3 cache hit/miss ratios of some parts of my C++ code. I am not interested in using Perf on my entire application. Can Perf be used as a library inside C++?

int main() {
    ...
    ...
    start_profiling();
    // The part I'm interested in
    ...
    end_profiling();
    ...
    ...
}

I gave Intel PCM a shot, but I had two issues with it. First, it gave me some strange numbers. Second, it doesn't support L1 cache profiling.

If it's not possible with Perf, what is the easiest way to get that information?

It sounds like all you're trying to do is read a few performance counters, which is exactly what the PAPI library is designed for.

Example.

The full list of supported counters is quite long, but it sounds like you're most interested in PAPI_L1_TCM, PAPI_L1_TCA, and their L2 and L3 counterparts. Note that you can also break the accesses down into reads and writes, and you can distinguish between instruction and data caches.
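Whichever library reads the counters, turning a miss total and an access total (for PAPI, the values read for PAPI_L1_TCM and PAPI_L1_TCA) into a hit/miss ratio is simple arithmetic. A minimal sketch; the CacheCounts struct and both function names are made up for illustration and are not part of PAPI:

```cpp
#include <cstdint>

// Hypothetical container for one cache level's totals, e.g. the values
// read for PAPI_L1_TCM (misses) and PAPI_L1_TCA (accesses).
struct CacheCounts {
    std::int64_t misses;
    std::int64_t accesses;
};

// Fraction of accesses that missed; 0.0 when nothing was accessed.
inline double miss_ratio(const CacheCounts& c) {
    if (c.accesses <= 0) return 0.0;
    return static_cast<double>(c.misses) / static_cast<double>(c.accesses);
}

// Fraction of accesses that hit; 0.0 when nothing was accessed.
inline double hit_ratio(const CacheCounts& c) {
    if (c.accesses <= 0) return 0.0;
    return 1.0 - miss_ratio(c);
}
```

The same two functions would apply unchanged to the L2 and L3 counter pairs.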

Yes, there is special per-thread monitoring that allows perf counters to be read from within userspace. See the manual page for perf_event_open(2).
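As a rough illustration of the per-thread counting described in perf_event_open(2), the sketch below counts one event around a region of code. The helper names cache_config and count_events are my own, not part of any library. Whether the counter can be opened at all depends on the hardware and on the kernel's perf_event_paranoid setting, so the function returns -1 when it cannot.

```cpp
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>

// Encode a generalized cache event as documented in perf_event_open(2):
// cache id | (operation << 8) | (result << 16).
constexpr std::uint64_t cache_config(std::uint64_t cache, std::uint64_t op,
                                     std::uint64_t result) {
    return cache | (op << 8) | (result << 16);
}

// Count `config` events of `type` in the calling thread while region() runs.
// region() is always executed; the result is -1 if the counter could not
// be opened or read.
template <class Region>
long long count_events(std::uint32_t type, std::uint64_t config, Region&& region) {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = type;
    attr.size = sizeof(attr);
    attr.config = config;
    attr.disabled = 1;        // start stopped, enable explicitly below
    attr.exclude_kernel = 1;  // user-space events only
    attr.exclude_hv = 1;

    // glibc provides no wrapper for this syscall.
    // pid = 0 and cpu = -1 mean "the calling thread, on any CPU".
    int fd = static_cast<int>(syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0));
    if (fd < 0) {
        region();
        return -1;
    }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    region();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    std::uint64_t value = 0;
    const bool ok = read(fd, &value, sizeof(value)) == static_cast<ssize_t>(sizeof(value));
    close(fd);
    return ok ? static_cast<long long>(value) : -1;
}
```

For L1 data-cache read misses, a call might look like count_events(PERF_TYPE_HW_CACHE, cache_config(PERF_COUNT_HW_CACHE_L1D, PERF_COUNT_HW_CACHE_OP_READ, PERF_COUNT_HW_CACHE_RESULT_MISS), [] { /* the part I'm interested in */ }). Measuring accesses as well takes a second counter with PERF_COUNT_HW_CACHE_RESULT_ACCESS.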

Among the instruction and data caches, perf's predefined events cover only L1i, L1d, and the last-level cache, so for anything else (such as L2) you'll need to use PERF_TYPE_RAW mode and take the event codes from the manual for your CPU.
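To make the raw mode concrete: on x86 the basic raw encoding puts the event select in bits 0-7 of config and the unit mask in bits 8-15 (further modifier bits exist and are ignored here). The sketch below assumes that layout; x86_raw_config and make_raw_attr are hypothetical helper names, and the actual event/umask pair has to come from your CPU's manual.

```cpp
#include <linux/perf_event.h>
#include <cstdint>
#include <cstring>

// Basic x86 raw encoding: event select in bits 0-7, unit mask in bits 8-15.
// Other fields (edge, inv, cmask, ...) are left at zero in this sketch.
constexpr std::uint64_t x86_raw_config(std::uint8_t event_select, std::uint8_t umask) {
    return static_cast<std::uint64_t>(event_select) |
           (static_cast<std::uint64_t>(umask) << 8);
}

// Fill a perf_event_attr for a raw, user-space-only counter that starts
// disabled. The result is meant to be passed to perf_event_open(2).
inline perf_event_attr make_raw_attr(std::uint64_t raw_config) {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_RAW;
    attr.size = sizeof(attr);
    attr.config = raw_config;
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    return attr;
}
```

For instance, x86_raw_config(0x24, 0x3F) produces 0x3F24; the two bytes here are placeholders that only show the bit layout, not a recommended event.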

To implement profiling, you'll need to set sample_period, poll/select the fd or wait for the SIGIO signal, and when it fires, read the sample and its instruction pointer from the event's mmap'ed ring buffer. You may later try to resolve the returned instruction pointers to function names using a debugger such as GDB.
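A simplified sketch of the sampling setup follows. It sets the sampling period (the sample_period field of perf_event_attr) and requests only the instruction pointer in each sample. To stay short it does not poll or handle SIGIO; it drains the mmap'ed ring buffer once after the region finishes, so samples beyond the buffer capacity are lost. The function name sample_ips is made up, the 8-page buffer size is an arbitrary choice, and the __atomic builtins assume GCC or Clang.

```cpp
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Run region() with sampling enabled and return the sampled instruction
// pointers. Returns an empty vector if the event cannot be opened or mapped.
template <class Region>
std::vector<std::uint64_t> sample_ips(std::uint32_t type, std::uint64_t config,
                                      std::uint64_t period, Region&& region) {
    std::vector<std::uint64_t> ips;

    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = type;
    attr.size = sizeof(attr);
    attr.config = config;
    attr.sample_period = period;        // one sample every `period` events
    attr.sample_type = PERF_SAMPLE_IP;  // each sample record holds only the IP
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;

    int fd = static_cast<int>(syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0));
    if (fd < 0) { region(); return ips; }

    // One metadata page followed by a power-of-two number of data pages.
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t data_size = 8 * page;
    void* base = mmap(nullptr, page + data_size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) { close(fd); region(); return ips; }

    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    region();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    auto* meta = static_cast<perf_event_mmap_page*>(base);
    const char* data = static_cast<const char*>(base) + page;
    const std::uint64_t head = __atomic_load_n(&meta->data_head, __ATOMIC_ACQUIRE);
    std::uint64_t tail = meta->data_tail;

    // Records are 8-byte aligned, so the 8-byte header and the 8-byte IP
    // never straddle the wrap-around point of the buffer.
    while (tail < head) {
        perf_event_header hdr;
        std::memcpy(&hdr, data + tail % data_size, sizeof(hdr));
        if (hdr.size == 0) break;  // defensive: avoid looping forever
        if (hdr.type == PERF_RECORD_SAMPLE) {
            std::uint64_t ip = 0;
            std::memcpy(&ip, data + (tail + sizeof(hdr)) % data_size, sizeof(ip));
            ips.push_back(ip);
        }
        tail += hdr.size;
    }
    __atomic_store_n(&meta->data_tail, tail, __ATOMIC_RELEASE);

    munmap(base, page + data_size);
    close(fd);
    return ips;
}
```

The returned addresses can then be aggregated per function, for example by resolving them against the binary's symbols offline.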


Another option is to use SystemTap. You'll need empty implementations of start_profiling() and end_profiling(), just to enable SystemTap profiling with something like this:

global traceme, prof;

probe process("/path/to/your/executable").function("start_profiling") {
    traceme = 1;
}

probe process("/path/to/your/executable").function("end_profiling") {
    traceme = 0;
}

probe perf.type(4).config(/* RAW value of perf event */).sample(10000) {
    # Only record samples taken between start_profiling() and end_profiling()
    if (traceme)
        prof[usymname(uaddr())] <<< 1;
}

probe end {
    foreach([sym+] in prof) {
        printf("%16s %d\n", sym, @count(prof[sym]));
    }
}
