简体   繁体   中英

Why won't perf report “dcache-store-misses”?

I am using perf to collect some metrics about my code, and I am running the following command:

sudo perf stat -e L1-dcache-load-misses,L1-dcache-store-misses ./progB

L1-dcache-load misses works well, but L1-dcache-store-misses always returns this:

<not supported>      L1-dcache-store-misses   

What could I be doing wrong?

Perf prints <not supported> for generic events which were requested by user or by default event set (in perf stat ) which are not mapped to real hardware PMU events on current hardware. Your hardware have no exact match to L1-dcache-store-misses generic event so perf informs you that your request sudo perf stat -e L1-dcache-load-misses,L1-dcache-store-misses./progB can't be fully implemented on current machine.

Your cpu is "Product formerly Kaby Lake" which has skylake PMU according to linux kernel file arch/x86/events/intel/core.c :

#L4986
case INTEL_FAM6_KABYLAKE:
    memcpy(hw_cache_event_ids, skl_hw_cache_event_ids, sizeof(hw_cache_event_ids));

Line 420 of this file is the cache event mapping (generic perf event name to real hw pmu event code) for skylake pmu - skl_hw_cache_event_ids , and your l1d load/store miss are [ C(L1D ) ] - [ C(OP_READ) ] / [ C(OP_WRITE) ] - [ C(RESULT_MISS) ] fields of this strange data structure ( = 0 means not mapped, and skl_hw_cache_extra_regs L525 has additional umask settings for events):

static ... const... skl_hw_cache_event_ids ... =
{
 [ C(L1D ) ] = {
    [ C(OP_READ) ] = {
        [ C(RESULT_ACCESS) ] = 0x81d0,  /* MEM_INST_RETIRED.ALL_LOADS */
        [ C(RESULT_MISS)   ] = 0x151,   /* L1D.REPLACEMENT */
    },
    [ C(OP_WRITE) ] = {
        [ C(RESULT_ACCESS) ] = 0x82d0,  /* MEM_INST_RETIRED.ALL_STORES */
        [ C(RESULT_MISS)   ] = 0x0,
    }, ...
 },

So, for SkyLake L1d misses are defined for loads (op_read) as and not defined for stores (op_write). And L1d accesses are defined for both operations.

These generic events were probably created long time ago, when hardware had some PMU event to implement them. For example, Core 2 PMU has mapping for these events, arch/x86/events/intel/core.c line 1254 core2_hw_cache_event_ids const - l1d read miss is L1D_CACHE_LD.I_STATE, l1d write miss is L1D_CACHE_ST.I_STATE. perf subsystem in kernel just had to keep many generic event names, added in old versions, to have compatibility.

You should check output of sudo perf list cache command to select supported events for your CPU and its PMU. This command (in recent perf tool versions) will output only mapped generic names and will also print hardware-specific event names. You also should check Intel SDM , optimization and perfcounters manuals to get understanding about how the load and stores are implemented and which PMU events you should use to count hardware events.

While L1d store miss are not available on your cpu, you should think about what is the store miss and how it is implemented. Probably, this request will be passed to some next level of cache/memory hierarchy, for example it will become L2 store access. perf generic event set is ugly (was introduced in the era of 2 level cache in Core2) and has only L1 and LLC (last level cache) cache events. Not sure how LLC is mapped in the current era of shared L3, is it L2 or L3 ( skylake's llc = L3 ). But intel-specific events should work.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM