
Measuring FLOPs of an application with the linux perf tool

I want to measure the number of floating-point and arithmetic operations executed by an application with 'perf', the new command-line interface to the Linux performance counter subsystem. (For testing purposes I use a simple dummy app which I created, see below.)

Because I could not find any 'perf' events defined for measuring FP and integer operations, I started digging into the raw hardware event codes (to be used with -rNNN, where NNN is the hexadecimal value of the event code). My real problem is that the codes I found for retired instructions (INST_RETIRED) do not distinguish between FP and other instructions (X87 and MMX/SSE). When I tried to apply the appropriate umasks to the particular code, I found that 'perf' somehow does not understand or support including the umask. I tried:

% perf stat -e rC0 ./a.out

which gives me the instructions retired, but

% perf stat -e rC002 ./a.out 

which should give me the X87 instructions executed, says I supplied wrong parameters. Maybe so, but what is the correct way to use umasks of raw hardware events with 'perf'? And in general, how do I get the exact number of floating-point and integer operations a program executed using the perf tool?

Many thanks, Konstantin Boyanov


Here is my test app:

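/* Test app: a fixed loop with a few integer and floating-point operations.
 * Build without optimization (e.g. -O0); otherwise the compiler may remove
 * the unused computations. */
int main(void){
  float  numbers[1000];
  float res1;
  double doubles[1000];
  double res2;

  int i,j=3,k=42;

  for(i=0;i<1000;i++){
    numbers[i] = (i+k)*j;        /* integer add and multiply, then int -> float */
    doubles[i] = (i+j)*k;        /* integer add and multiply, then int -> double */
    res1 = numbers[i]/(float)k;  /* single-precision division */
    res2 = doubles[i]/(float)j;  /* double-precision division (divisor promoted) */
  }
  (void)res1; (void)res2;        /* results are intentionally unused */
  return 0;
}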

The event to use depends on the processor. You can use libpfm4 (http://perfmon2.git.sourceforge.net/git/gitweb-index.cgi) to determine which events are available (using the showevinfo program), and then use check_events from the same distribution to figure out the raw codes for the event. My Sandy Bridge CPU supports the FP_COMP_OPS_EXE event, which I have empirically found to correspond closely to the FLOP count.

I'm not sure about perf, but oprofile has floating-point events for many processors. There may be some overlap, as INST_RETIRED is a valid oprofile event too.

