Perf instruction/cycle counts for userspace or kernelspace alone in Linux

Question

I'm trying to profile an application that has both userspace and kernelspace code using perf. I tried enabling various kernel configuration options, but I'm unable to get instruction/cycle counts for userspace or kernelspace alone. I tried appending the ":u" and ":k" modifiers to the instructions and cycles events, but all I get in reply is:

$ perf stat -e cycles:u,instructions:u ls

 Performance counter stats for 'ls':

   <not supported>      cycles:u

   <not supported>      instructions:u

       0.006047045 seconds time elapsed

       0.000000000 seconds user
       0.008098000 seconds sys

However, running with plain cycles/instructions gives a proper result, something like the one below.

$ perf stat -e cycles,instructions ls

 Performance counter stats for 'ls':

          5362086      cycles
            528783      instructions              #    0.10  insn per cycle

       0.005487940 seconds time elapsed

       0.007800000 seconds user
       0.000000000 seconds sys

Note: ls is used here only as an example to highlight the issue.

I'm running Linux 5.4 and perf version 5.4.77.g1206eede9156, and I'm running the above commands on an ARM board. Below are the configuration options that I've enabled in the Linux kernel:

CONFIG_PERF_EVENTS=y
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
CONFIG_KPROBES=y
CONFIG_OPTPROBES=y
CONFIG_KPROBES_ON_FTRACE=y
CONFIG_UPROBES=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_OPTPROBES=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_DWARF4=y
CONFIG_FRAME_POINTER=y
CONFIG_FTRACE=y
CONFIG_KPROBE_EVENTS=y
CONFIG_UPROBE_EVENTS=y
CONFIG_PROBE_EVENTS=y

Further, perf list on the command line lists hardware and software events, among many others:

$ perf list
  branch-instructions OR branches                    [Hardware event]
  branch-misses                                      [Hardware event]
  cache-misses                                       [Hardware event]
  cache-references                                   [Hardware event]
  cpu-cycles OR cycles                               [Hardware event]
  instructions                                       [Hardware event]
  alignment-faults                                   [Software event]
  bpf-output                                         [Software event]
  context-switches OR cs                             [Software event]
  cpu-clock                                          [Software event]
  cpu-migrations OR migrations                       [Software event]
  dummy                                              [Software event]
  emulation-faults                                   [Software event]
  major-faults                                       [Software event]
  minor-faults                                       [Software event]
  page-faults OR faults                              [Software event]
  task-clock                                         [Software event]
  duration_time                                      [Tool event]
  L1-dcache-load-misses                              [Hardware cache event]
  L1-dcache-loads                                    [Hardware cache event]
  L1-dcache-prefetch-misses                          [Hardware cache event]
  L1-dcache-prefetches                               [Hardware cache event]
  L1-dcache-store-misses                             [Hardware cache event]
  L1-dcache-stores                                   [Hardware cache event]
  L1-icache-load-misses                              [Hardware cache event]
  L1-icache-loads                                    [Hardware cache event]
  L1-icache-prefetch-misses                          [Hardware cache event]
  L1-icache-prefetches                               [Hardware cache event]
  branch-load-misses                                 [Hardware cache event]
  branch-loads                                       [Hardware cache event]
  dTLB-load-misses                                   [Hardware cache event]
  dTLB-store-misses                                  [Hardware cache event]
  iTLB-load-misses                                   [Hardware cache event]

Kindly suggest how to fix this issue. Am I doing anything wrong?

Answer 1

Works for me: I get 444,022 cycles:u for perf stat -e cycles:u ls, using perf version 5.13.g62fb9874f5da on Linux 5.12.15-arch1-1, on bare metal (x86-64 Skylake), with perf_event_paranoid=0.
(With modern perf you can also use perf stat --all-user to imply :u for all events.)
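To compare a setup against the perf_event_paranoid=0 mentioned above, the current level can be read from the sysctl file. This is a minimal sketch: describe_paranoid is a made-up helper name, and the file argument exists only so the function can be pointed at a test file. A restrictive level normally shows up as a permission error rather than as <not supported>, so this only rules out one possible difference between the two machines.

```shell
#!/bin/sh
# Hypothetical helper: report whether unprivileged kernel profiling is allowed.
# $1: path to the sysctl file (defaults to the real one; override for testing).
describe_paranoid() {
    file=${1:-/proc/sys/kernel/perf_event_paranoid}
    level=$(cat "$file") || return 1
    if [ "$level" -ge 2 ]; then
        # Per the kernel sysctl documentation, >= 2 disallows kernel
        # profiling for users without the required capability.
        echo "level $level: kernel profiling disallowed for unprivileged users"
    else
        echo "level $level: kernel profiling allowed for unprivileged users"
    fi
}
```

Calling describe_paranoid with no argument reads the live value. Running perf as root bypasses the restriction either way.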

I'm guessing your ARM CPU's hardware perf counters don't support being programmed with a privilege-level mask, so perf reports that there is no hardware counter capable of counting only user-space instructions.

AFAIK, there aren't hooks at every interrupt entry point to enable/disable HW counters; counting only kernel, only user, or both is purely a hardware feature.
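As a rough way to see which events a given board rejects, the text that perf stat prints can be scanned for the <not supported> marker. The helper below is only a sketch: unsupported_events is a made-up name, and it assumes the human-readable layout shown in the question, where the event name is the third whitespace-separated field on such a line.

```shell
#!/bin/sh
# Hypothetical helper: list events that perf stat reported as "<not supported>".
# Reads perf stat's human-readable output on stdin and prints one event per line.
unsupported_events() {
    # On a "<not supported>" line the fields are "<not", "supported>", EVENT,
    # so the event name is field 3.
    awk '/<not supported>/ { print $3 }'
}
```

One possible use is perf stat -e cycles:u,cycles:k,instructions:u,instructions:k ls 2>&1 | unsupported_events, where 2>&1 is needed because perf stat normally writes its counter summary to stderr. On the asker's board this would be expected to list at least the :u events.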

HW support is obviously essential for accurate counts, because in a software implementation the counters would keep counting until the kernel code that saves the current counts had run (and would count kernel code again after the state is restored, before returning to user-space). Also, it would make every interrupt and system call even more expensive, instead of only virtualizing perf counters by saving/restoring them on every context switch between tasks/threads. So there are good reasons for the kernel not to support a loose attempt to do it in software, even on CPUs that don't have HW support for a privilege mask.
