Why won't perf report “dcache-store-misses”?

I am using perf to collect some metrics about my code, and I am running the following command:

sudo perf stat -e L1-dcache-load-misses,L1-dcache-store-misses ./progB

L1-dcache-load-misses works well, but L1-dcache-store-misses always returns this:

<not supported>      L1-dcache-store-misses   

What could I be doing wrong?

Perf prints <not supported> for generic events that were requested by the user, or that belong to the default event set of perf stat, but that are not mapped to real hardware PMU events on the current hardware. Your hardware has no exact match for the L1-dcache-store-misses generic event, so perf informs you that your request sudo perf stat -e L1-dcache-load-misses,L1-dcache-store-misses ./progB can't be fully implemented on the current machine.
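To make the "generic event" idea concrete, here is a minimal sketch of how a generic cache event can be requested directly through perf_event_open(2). The bit layout of config follows the man page; the function names hw_cache_config and try_open_cache_event are made up for illustration, and the exact error the kernel returns for an unmapped event is deliberately not assumed here.

```c
#include <assert.h>
#include <errno.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Pack a generic cache event as documented in perf_event_open(2):
 * cache id in bits 0-7, operation in bits 8-15, result in bits 16-23. */
uint64_t hw_cache_config(unsigned cache, unsigned op, unsigned result)
{
    assert(cache <= 0xff && op <= 0xff && result <= 0xff);
    return (uint64_t)cache | ((uint64_t)op << 8) | ((uint64_t)result << 16);
}

/* Hypothetical helper: try to open a generic cache counter for the calling
 * thread on any CPU. Returns a file descriptor, or -errno on failure. */
int try_open_cache_event(uint64_t config)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HW_CACHE;   /* generic cache event, not a raw code */
    attr.size = sizeof(attr);
    attr.config = config;
    attr.disabled = 1;                /* do not start counting yet */
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;

    long fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
    return fd < 0 ? -errno : (int)fd;
}
```

Calling try_open_cache_event(hw_cache_config(PERF_COUNT_HW_CACHE_L1D, PERF_COUNT_HW_CACHE_OP_WRITE, PERF_COUNT_HW_CACHE_RESULT_MISS)) asks for the same generic event as L1-dcache-store-misses. A negative return on a given machine means the kernel refused the request, and unmapped events are one possible cause.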

Your CPU is a "Product formerly Kaby Lake", which uses the Skylake PMU according to the Linux kernel file arch/x86/events/intel/core.c:

#L4986
case INTEL_FAM6_KABYLAKE:
    memcpy(hw_cache_event_ids, skl_hw_cache_event_ids, sizeof(hw_cache_event_ids));

Line 420 of this file is the cache event mapping (generic perf event name to real hardware PMU event code) for the Skylake PMU, skl_hw_cache_event_ids. Your L1d load and store misses are the [ C(L1D) ] - [ C(OP_READ) ] / [ C(OP_WRITE) ] - [ C(RESULT_MISS) ] fields of this strange data structure (= 0 means not mapped, and skl_hw_cache_extra_regs at line 525 holds additional register settings for some events):

static ... const... skl_hw_cache_event_ids ... =
{
 [ C(L1D ) ] = {
    [ C(OP_READ) ] = {
        [ C(RESULT_ACCESS) ] = 0x81d0,  /* MEM_INST_RETIRED.ALL_LOADS */
        [ C(RESULT_MISS)   ] = 0x151,   /* L1D.REPLACEMENT */
    },
    [ C(OP_WRITE) ] = {
        [ C(RESULT_ACCESS) ] = 0x82d0,  /* MEM_INST_RETIRED.ALL_STORES */
        [ C(RESULT_MISS)   ] = 0x0,
    }, ...
 },

So, for Skylake, L1d misses are defined for loads (op_read) but not for stores (op_write), while L1d accesses are defined for both operations.
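The lookup can be sketched in a few lines of C. This is a simplified stand-in for the kernel code, not a copy of it: the enum and function names are invented for illustration, and the table holds only the four L1D values quoted in the excerpt above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative indices; the kernel uses PERF_COUNT_HW_CACHE_* via C(). */
enum sketch_cache  { SK_L1D, SK_CACHE_MAX };
enum sketch_op     { SK_OP_READ, SK_OP_WRITE, SK_OP_MAX };
enum sketch_result { SK_RESULT_ACCESS, SK_RESULT_MISS, SK_RESULT_MAX };

/* Only the four L1D entries quoted above; values copied from the excerpt. */
static const uint64_t sketch_skl_ids[SK_CACHE_MAX][SK_OP_MAX][SK_RESULT_MAX] = {
    [SK_L1D] = {
        [SK_OP_READ] = {
            [SK_RESULT_ACCESS] = 0x81d0, /* MEM_INST_RETIRED.ALL_LOADS */
            [SK_RESULT_MISS]   = 0x151,  /* L1D.REPLACEMENT */
        },
        [SK_OP_WRITE] = {
            [SK_RESULT_ACCESS] = 0x82d0, /* MEM_INST_RETIRED.ALL_STORES */
            [SK_RESULT_MISS]   = 0x0,    /* no hardware event mapped */
        },
    },
};

/* Returns true and stores the raw code if the generic event is mapped;
 * returns false for a 0 entry, the case perf shows as <not supported>. */
bool sketch_lookup(enum sketch_cache c, enum sketch_op op,
                   enum sketch_result r, uint64_t *raw)
{
    assert(c < SK_CACHE_MAX && op < SK_OP_MAX && r < SK_RESULT_MAX);
    uint64_t val = sketch_skl_ids[c][op][r];
    if (val == 0)
        return false;
    *raw = val;
    return true;
}
```

With this table, sketch_lookup(SK_L1D, SK_OP_READ, SK_RESULT_MISS, &raw) succeeds, while the same call with SK_OP_WRITE fails, which mirrors the asymmetry seen in the perf stat output.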

These generic events were probably created a long time ago, when hardware had PMU events to implement them. For example, the Core 2 PMU has mappings for these events (arch/x86/events/intel/core.c, line 1254, core2_hw_cache_event_ids): the L1d read miss is L1D_CACHE_LD.I_STATE and the L1d write miss is L1D_CACHE_ST.I_STATE. The perf subsystem in the kernel has had to keep many generic event names that were added in old versions, for compatibility.

You should check the output of the sudo perf list cache command to select supported events for your CPU and its PMU. This command (in recent perf tool versions) will output only the mapped generic names and will also print hardware-specific event names. You should also check the Intel SDM, the optimization manual, and the performance counter manuals to understand how loads and stores are implemented and which PMU events you should use to count hardware events.

Although L1d store misses are not available on your CPU, you should think about what a store miss is and how it is implemented. Probably the request is passed to the next level of the cache/memory hierarchy; for example, it becomes an L2 store access. The perf generic event set is ugly (it was introduced in the era of two-level caches in Core 2) and has only L1 and LLC (last-level cache) cache events. I'm not sure how LLC is mapped in the current era of shared L3, whether it is L2 or L3 (Skylake's LLC = L3). But Intel-specific events should work.
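As a rough illustration of passing a hardware-specific event to perf, the sketch below turns a 16-bit raw code such as 0x82d0 (umask in the high byte, event select in the low byte) into the cpu/event=...,umask=.../ form that perf accepts with -e. The helper name format_raw_event is hypothetical, and which Intel event best approximates store misses on a given CPU is left open.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: split a 16-bit raw code (umask in the high byte,
 * event select in the low byte) and format it as a perf event string.
 * Returns the number of characters snprintf would have written. */
int format_raw_event(uint16_t code, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    unsigned event = code & 0xff;        /* bits 0-7: event select */
    unsigned umask = (code >> 8) & 0xff; /* bits 8-15: unit mask */
    return snprintf(buf, len, "cpu/event=0x%02x,umask=0x%02x/", event, umask);
}
```

For 0x82d0, the code listed for MEM_INST_RETIRED.ALL_STORES in the Skylake table, the resulting string can be passed as sudo perf stat -e cpu/event=0xd0,umask=0x82/ ./progB. Note that this counts retired stores, not store misses.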
