简体   繁体   中英

perf stats on AMD 15h

According tot he BKDG of AMD 15h (page 588), it is possbile to disable the hardware prefetcher by setting some bits of MSRC001_1022

MSRC001_1022 Data Cache Configuration (DC_CFG)
Bits    -->  Description
63:16   -->  Reserved.
15      -->  DisPfHwForSw. Read-write. Reset: 0. 1=Disable hardware prefetches for software prefetches.
14      -->  Reserved.
13      -->  DisHwPf. Read-write. Reset: 0. 1=Disable the DC hardware prefetcher. 
12:10   -->  Reserved.
9:5     -->  Reserved.
4       -->  DisSpecTlbRld. Read-write. Reset: 0. 1=Disable speculative TLB reloads. 
3:0     -->  Reserved.

In order to disable all prefetch configs, I have to write 0xA008 to that MSR. I did that for all 32 cores using

[root <at> tiger exe]# wrmsr -a 0xc0011022 0xA008
[root <at> tiger exe]# rdmsr -a -x -0 0xc0011022
000000000000a008
...

However, when I run perf along with the command, the prefetch stats are non-zero!

[root <at> tiger exe]# perf stat -e
L1-dcache-loads:uk,L1-dcache-prefetches:uk,L1-dcache-prefetch-misses:uk ./bzip2_base.amd64-m64-gcc44-nn
spec_init
Tested 64MB buffer: OK!
 Performance counter stats for './bzip2_base.amd64-m64-gcc44-nn':
    55,341,597,193 L1-dcache-loads:uk
     1,047,662,614 L1-dcache-prefetches:uk
                 0 L1-dcache-prefetch-misses:uk
      35.921618464 seconds time elapsed

I expect to see 0 in front of L1-dcache-prefetches. Don't you?

How can I debug the counters in order to find out how they are mapped to the MSRs?

Mapping of synthetic perf names for hw counters (listed by perf list ) are defined inside kernel sources of perf_events subsystem for many CPUs. For amd they are in arch/x86/events/amd/core.c file. In 4.8 version of kernel and amd cpu cache events are mapped to cpu-specific constants to be written into PMC MSRs as:

http://elixir.free-electrons.com/linux/v4.8/source/arch/x86/events/amd/core.c

static __initconst const u64 amd_hw_cache_event_ids
 ... =  {
 [ C(L1D) ] = {
    [ C(OP_READ) ] = {
        [ C(RESULT_ACCESS) ] = 0x0040, /* Data Cache Accesses        */
        [ C(RESULT_MISS)   ] = 0x0141, /* Data Cache Misses          */
    },
    [ C(OP_WRITE) ] = {
        [ C(RESULT_ACCESS) ] = 0,
        [ C(RESULT_MISS)   ] = 0,
    },
    [ C(OP_PREFETCH) ] = {
        [ C(RESULT_ACCESS) ] = 0x0267, /* Data Prefetcher :attempts  */
        [ C(RESULT_MISS)   ] = 0x0167, /* Data Prefetcher :cancelled */
    },
 },
 [ C(L1I ) ] = {
    [ C(OP_READ) ] = {
        [ C(RESULT_ACCESS) ] = 0x0080, /* Instruction cache fetches  */
        [ C(RESULT_MISS)   ] = 0x0081, /* Instruction cache misses   */
    },
    [ C(OP_WRITE) ] = {
        [ C(RESULT_ACCESS) ] = -1,
        [ C(RESULT_MISS)   ] = -1,
    },
    [ C(OP_PREFETCH) ] = {
        [ C(RESULT_ACCESS) ] = 0x014B, /* Prefetch Instructions :Load */
        [ C(RESULT_MISS)   ] = 0,
    },
 },
 [ C(LL  ) ] = {
    [ C(OP_READ) ] = {
        [ C(RESULT_ACCESS) ] = 0x037D, /* Requests to L2 Cache :IC+DC */
        [ C(RESULT_MISS)   ] = 0x037E, /* L2 Cache Misses : IC+DC     */
    },
    [ C(OP_WRITE) ] = {
        [ C(RESULT_ACCESS) ] = 0x017F, /* L2 Fill/Writeback           */
        [ C(RESULT_MISS)   ] = 0,
    },
    [ C(OP_PREFETCH) ] = {
        [ C(RESULT_ACCESS) ] = 0,
        [ C(RESULT_MISS)   ] = 0,
    },
 },

...
__init int amd_pmu_init(void)
{ ...
    /* Performance-monitoring supported from K7 and later: */
    if (boot_cpu_data.x86 < 6)
        return -ENODEV;

    x86_pmu = amd_pmu;

    ret = amd_core_pmu_init();
    ...

    /* Events are common for all AMDs */
    memcpy(hw_cache_event_ids, amd_hw_cache_event_ids,
           sizeof(hw_cache_event_ids));
    return 0;
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM