Hardware cache events and perf
When I run perf list, I see a bunch of Hardware Cache Events, as follows:
$ perf list | grep 'cache event'
L1-dcache-load-misses [Hardware cache event]
L1-dcache-loads [Hardware cache event]
L1-dcache-stores [Hardware cache event]
L1-icache-load-misses [Hardware cache event]
LLC-load-misses [Hardware cache event]
LLC-loads [Hardware cache event]
LLC-store-misses [Hardware cache event]
LLC-stores [Hardware cache event]
branch-load-misses [Hardware cache event]
branch-loads [Hardware cache event]
dTLB-load-misses [Hardware cache event]
dTLB-loads [Hardware cache event]
dTLB-store-misses [Hardware cache event]
dTLB-stores [Hardware cache event]
iTLB-load-misses [Hardware cache event]
iTLB-loads [Hardware cache event]
node-load-misses [Hardware cache event]
node-loads [Hardware cache event]
node-store-misses [Hardware cache event]
node-stores [Hardware cache event]
These events mostly seem to return reasonable values in my tests, but how do I determine which hardware performance counter events they map to on my system?
That is, these events are certainly implemented using one or more underlying x86 PMU counters on my Skylake CPU, but how do I know which ones?
You can look in /sys/devices/cpu/events for other hardware events, but not for "Hardware cache events".
User @Margaret points towards a reasonable answer in the comments: read the kernel source to see the mapping for the PMU events.
We can check arch/x86/events/intel/core.c for the event definitions. I don't actually know if "core" here refers to the Core architecture, or just that this is the core file with most definitions, but in any case it's the file you want to look at.
The key part is this section, which defines skl_hw_cache_event_ids:
static __initconst const u64 skl_hw_cache_event_ids
                [PERF_COUNT_HW_CACHE_MAX]
                [PERF_COUNT_HW_CACHE_OP_MAX]
                [PERF_COUNT_HW_CACHE_RESULT_MAX] =
{
 [ C(L1D ) ] = {
    [ C(OP_READ) ] = {
        [ C(RESULT_ACCESS) ] = 0x81d0,  /* MEM_INST_RETIRED.ALL_LOADS */
        [ C(RESULT_MISS)   ] = 0x151,   /* L1D.REPLACEMENT */
    },
    [ C(OP_WRITE) ] = {
        [ C(RESULT_ACCESS) ] = 0x82d0,  /* MEM_INST_RETIRED.ALL_STORES */
        [ C(RESULT_MISS)   ] = 0x0,
    },
    [ C(OP_PREFETCH) ] = {
        [ C(RESULT_ACCESS) ] = 0x0,
        [ C(RESULT_MISS)   ] = 0x0,
    },
 },
...
Decoding the nested initializers, you get that L1-dcache-loads corresponds to MEM_INST_RETIRED.ALL_LOADS and L1-dcache-load-misses to L1D.REPLACEMENT.
We can double check this with perf:
$ ocperf stat -e mem_inst_retired.all_loads,L1-dcache-loads,l1d.replacement,L1-dcache-load-misses,L1-dcache-loads,mem_load_retired.l1_hit head -c100M /dev/zero > /dev/null
Performance counter stats for 'head -c100M /dev/zero':
11,587,793 mem_inst_retired_all_loads
11,587,793 L1-dcache-loads
20,233 l1d_replacement
20,233 L1-dcache-load-misses # 0.17% of all L1-dcache hits
11,587,793 L1-dcache-loads
11,495,053 mem_load_retired_l1_hit
0.024322360 seconds time elapsed
The "Hardware Cache" events show exactly the same values as the underlying PMU events we identified by checking the source.