
PMU x86-64 performance counters not showing in perf under AWS

I am running a C++ benchmark test for a specific application. In this test, I open a performance counter file descriptor with the perf_event_open syscall (__NR_perf_event_open) before the critical section, run the section, and then read the specified metric (instructions, cycles, branches, cache misses, etc.).
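A minimal sketch of that measurement pattern, assuming Linux, C++17 and the kernel UAPI header linux/perf_event.h (glibc has no perf_event_open wrapper, so the raw syscall is used). The helper names make_attr and count_event are hypothetical and not taken from the original benchmark. When the guest exposes no PMU, opening a hardware event fails and the sketch returns std::nullopt instead of a count.

```cpp
// Sketch: count one hardware event around a critical section using the raw
// perf_event_open syscall (glibc provides no wrapper for it).
#include <linux/perf_event.h>  // perf_event_attr, PERF_TYPE_*, PERF_COUNT_*
#include <sys/ioctl.h>         // ioctl()
#include <sys/syscall.h>       // __NR_perf_event_open
#include <unistd.h>            // syscall(), read(), close(), pid_t

#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>

// Builds an attribute block for a counter that starts disabled and counts
// only user-space execution of the calling thread.
perf_event_attr make_attr(uint32_t type, uint64_t config) {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = type;
    attr.config = config;
    attr.disabled = 1;        // enabled explicitly right before the section
    attr.exclude_kernel = 1;  // user space only
    attr.exclude_hv = 1;      // ignore hypervisor time
    return attr;
}

// Runs fn() with the counter enabled and returns the raw count, or
// std::nullopt if the event cannot be opened or read (for example when the
// guest has no PMU or the process lacks permission).
template <typename Fn>
std::optional<uint64_t> count_event(uint32_t type, uint64_t config, Fn fn) {
    perf_event_attr attr = make_attr(type, config);
    // pid = 0, cpu = -1: measure the calling thread on whichever CPU it runs.
    int fd = static_cast<int>(syscall(__NR_perf_event_open, &attr,
                                      static_cast<pid_t>(0), -1, -1, 0UL));
    if (fd == -1) return std::nullopt;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    fn();  // the critical section being measured
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t value = 0;
    ssize_t n = read(fd, &value, sizeof(value));
    close(fd);
    if (n != static_cast<ssize_t>(sizeof(value))) return std::nullopt;
    return value;
}
```

For example, count_event(PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS, [] { /* critical section */ }) yields the user-space instruction count, or std::nullopt on an instance without PMU access. Creating the counter disabled and enabling it only around fn() keeps most of the setup work out of the measurement.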

I verified that this needs to run under sudo, because the process needs the CAP_PERFMON capability (CAP_SYS_ADMIN on kernels older than 5.8). I also have to make sure that /proc/sys/kernel/perf_event_paranoid is set to a value higher than 2, which seems to always be the case on Ubuntu 20.04.3 with kernel 5.11.0, the OS I standardized on across tests.
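A small pre-flight check can read the paranoid level before the benchmark starts. This is a sketch assuming C++17; read_perf_event_paranoid is a hypothetical helper, and its path parameter exists only so it can be pointed at a test file.

```cpp
// Sketch: read /proc/sys/kernel/perf_event_paranoid before benchmarking.
#include <cassert>
#include <fstream>
#include <optional>
#include <string>

// Returns the paranoid level, or std::nullopt if the file cannot be opened
// or parsed (for example, on a kernel built without perf events).
std::optional<int> read_perf_event_paranoid(
    const std::string& path = "/proc/sys/kernel/perf_event_paranoid") {
    std::ifstream in(path);
    int level = 0;
    if (!(in >> level)) return std::nullopt;  // missing file or bad content
    return level;  // may be negative, e.g. -1
}
```

Logging this value together with geteuid() at startup would make it clear from the benchmark output why an unprivileged run could not open its counters.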

This setup works on all my local machines. In the cloud, however, it works only on some instances, such as m5zn.6xlarge (Intel Xeon Platinum 8252C). It does not work on others, such as t3.medium, c3.4xlarge and c5a.8xlarge.

All of them use the same AMI, ami-09e67e426f25ce0d7.

One easy way to verify this behavior is to run the following command:

sudo perf stat /bin/sleep 1

On the m5zn box I see:

 Performance counter stats for '/bin/sleep 1':

          0.54 msec task-clock                #    0.001 CPUs utilized
             1      context-switches          #    0.002 M/sec
             1      cpu-migrations            #    0.002 M/sec
            75      page-faults               #    0.139 M/sec
       2191485      cycles                    #    4.070 GHz
       1292564      instructions              #    0.59  insn per cycle
        258373      branches                  #  479.860 M/sec
         11090      branch-misses             #    4.29% of all branches

   1.000902741 seconds time elapsed

   0.000889000 seconds user
   0.000000000 seconds sys

Perf with valid output

On the other boxes, I see:

 Performance counter stats for '/bin/sleep 1':

          0.62 msec task-clock                #    0.001 CPUs utilized
             2      context-switches          #    0.003 M/sec
             0      cpu-migrations            #    0.000 K/sec
            76      page-faults               #    0.124 M/sec
<not supported>      cycles
<not supported>      instructions
<not supported>      branches
<not supported>      branch-misses

   1.002488031 seconds time elapsed

   0.000930000 seconds user
   0.000000000 seconds sys

Perf with unsupported values
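To see which of these generic hardware events the kernel refuses without going through perf, a program can try to open each one directly. This is a sketch assuming Linux and C++17; probe_hw_events and ProbeResult are hypothetical names. It only roughly mirrors the per-event result perf stat reports: it does not distinguish between error codes, but it keeps the errno so the reason can be logged with strerror().

```cpp
// Sketch: probe which generic hardware events perf_event_open accepts.
#include <linux/perf_event.h>  // perf_event_attr, PERF_TYPE_*, PERF_COUNT_HW_*
#include <sys/syscall.h>       // __NR_perf_event_open
#include <unistd.h>            // syscall(), close(), pid_t

#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

struct ProbeResult {
    std::string name;
    bool supported;
    int error;  // errno from perf_event_open, 0 on success
};

std::vector<ProbeResult> probe_hw_events() {
    const struct { const char* name; uint64_t config; } events[] = {
        {"cycles", PERF_COUNT_HW_CPU_CYCLES},
        {"instructions", PERF_COUNT_HW_INSTRUCTIONS},
        {"branches", PERF_COUNT_HW_BRANCH_INSTRUCTIONS},
        {"branch-misses", PERF_COUNT_HW_BRANCH_MISSES},
    };
    std::vector<ProbeResult> results;
    for (const auto& ev : events) {
        perf_event_attr attr;
        std::memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HARDWARE;
        attr.config = ev.config;
        attr.disabled = 1;        // never enabled: we only test that it opens
        attr.exclude_kernel = 1;
        attr.exclude_hv = 1;
        int fd = static_cast<int>(syscall(__NR_perf_event_open, &attr,
                                          static_cast<pid_t>(0), -1, -1, 0UL));
        if (fd == -1) {
            int err = errno;  // capture before anything else can change it
            results.push_back({ev.name, false, err});
        } else {
            close(fd);
            results.push_back({ev.name, true, 0});
        }
    }
    return results;
}
```

On the boxes that print <not supported>, one would expect every entry to come back with supported == false.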

My suspicion is that the m5zn.6xlarge is backed by a real (dedicated) machine, while the others are shared instances. Is my suspicion correct?

Which instances can I launch that will give me performance counter (PMU) support?

Thank you!

After some research, I found out that because Amazon AWS instances are virtual machines (bare-metal .metal instances aside), the guest operating system cannot directly access the hardware performance counters (PMCs, provided by the CPU's PMU).

The guest OS can only read the performance counters through a virtual PMU (vPMU) that the hypervisor exposes, and this is available only for certain Intel Xeon CPUs.

Therefore, among the instances I tried, only the m5zn, with its Intel Xeon Platinum 8252C, has a supported CPU.

An easy way to check whether the guest OS supports vPMU is to run:

grep arch_perfmon /proc/cpuinfo
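The same check can run inside the benchmark itself. This is a sketch assuming C++17; cpuinfo_has_flag and host_has_arch_perfmon are hypothetical names. As in the answer, it treats the presence of the arch_perfmon flag as the signal; matching whole tokens on the x86 "flags" line instead of substrings is just a small robustness choice.

```cpp
// Sketch: in-process equivalent of grepping /proc/cpuinfo for arch_perfmon.
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// True if a "flags" line in the cpuinfo text lists `flag` as a whole token.
bool cpuinfo_has_flag(std::istream& cpuinfo, const std::string& flag) {
    std::string line;
    while (std::getline(cpuinfo, line)) {
        // x86 kernels list CPU features on lines starting with "flags".
        if (line.rfind("flags", 0) != 0) continue;
        auto colon = line.find(':');
        if (colon == std::string::npos) continue;
        std::istringstream tokens(line.substr(colon + 1));
        std::string tok;
        while (tokens >> tok) {
            if (tok == flag) return true;
        }
    }
    return false;
}

// Checks the running system's /proc/cpuinfo for arch_perfmon.
bool host_has_arch_perfmon() {
    std::ifstream cpuinfo("/proc/cpuinfo");
    return cpuinfo_has_flag(cpuinfo, "arch_perfmon");
}
```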

You can also check the dmesg output right after the smpboot line:

[    0.916264] smpboot: CPU0: Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz (family: 0x6, model: 0x55, stepping: 0x4)
[    0.916410] Performance Events: unsupported p6 CPU model 85 no PMU driver, software events only.
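The kernel log check can be automated as well. This sketch (C++17; performance_events_line and software_events_only are hypothetical helper names) extracts the text after "Performance Events: " and looks for the software-only fallback message shown above. Note that "CPU model 85" in that message is simply the decimal form of the "model: 0x55" reported on the smpboot line.

```cpp
// Sketch: scan kernel log text for the "Performance Events:" line.
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Returns the text after "Performance Events: " on the first matching log
// line, or an empty string if no such line exists.
std::string performance_events_line(std::istream& log) {
    const std::string marker = "Performance Events: ";
    std::string line;
    while (std::getline(log, line)) {
        auto pos = line.find(marker);
        if (pos != std::string::npos) return line.substr(pos + marker.size());
    }
    return "";
}

// True if the kernel fell back to software-only events (no usable PMU).
bool software_events_only(const std::string& perf_line) {
    return perf_line.find("software events only") != std::string::npos;
}
```

The input can be a saved copy of the dmesg output; reading the kernel log may itself require sudo when kernel.dmesg_restrict is enabled.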

On AWS, the rule of thumb is that you get a vPMU only on the largest instance sizes, or on instances that take up an entire CPU socket.

https://oavdeev.github.io/posts/vpmu_support_z1d/

Currently, these instances support vPMU:

i3.metal
c5.9xlarge
c5.18xlarge
m4.16xlarge
m5.12xlarge
m5.24xlarge
r5.12xlarge
r5.24xlarge
f1.16xlarge
h1.16xlarge
i3.16xlarge
p2.16xlarge
p3.16xlarge
r4.16xlarge
x1.32xlarge
c5d.9xlarge
c5d.18xlarge
m5d.12xlarge
m5d.24xlarge
r5d.12xlarge
r5d.24xlarge
x1e.32xlarge
