
perf stats on AMD 15h

According to the BKDG for AMD 15h (page 588), the hardware prefetchers can be disabled by setting certain bits of MSRC001_1022:

MSRC001_1022 Data Cache Configuration (DC_CFG)
Bits    -->  Description
63:16   -->  Reserved.
15      -->  DisPfHwForSw. Read-write. Reset: 0. 1=Disable hardware prefetches for software prefetches.
14      -->  Reserved.
13      -->  DisHwPf. Read-write. Reset: 0. 1=Disable the DC hardware prefetcher. 
12:10   -->  Reserved.
9:5     -->  Reserved.
4       -->  DisSpecTlbRld. Read-write. Reset: 0. 1=Disable speculative TLB reloads. 
3:0     -->  Reserved.

To disable all of the prefetch configurations, I have to write 0xA008 to that MSR. I did this for all 32 cores:

[root@tiger exe]# wrmsr -a 0xc0011022 0xA008
[root@tiger exe]# rdmsr -a -x -0 0xc0011022
000000000000a008
...
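
As a sanity check on the value being written, the DC_CFG table above can be turned into bit masks. This is only a minimal sketch: the macro and function names are hypothetical, and the bit positions are taken solely from the table quoted in this post. Under that reading, 0xA008 sets DisPfHwForSw (bit 15) and DisHwPf (bit 13), leaves DisSpecTlbRld (bit 4) clear, and also sets bit 3, which the table lists as reserved.

```c
#include <stdint.h>
#include <stdio.h>

/* Bit positions copied from the DC_CFG table quoted above. */
#define DC_CFG_DIS_SPEC_TLB_RLD (1ULL << 4)   /* DisSpecTlbRld */
#define DC_CFG_DIS_HW_PF        (1ULL << 13)  /* DisHwPf       */
#define DC_CFG_DIS_PF_HW_FOR_SW (1ULL << 15)  /* DisPfHwForSw  */

/* All bits that the table gives a name to (everything else is Reserved). */
#define DC_CFG_NAMED_BITS \
    (DC_CFG_DIS_SPEC_TLB_RLD | DC_CFG_DIS_HW_PF | DC_CFG_DIS_PF_HW_FOR_SW)

/* Hypothetical helper: the named control bits that are set in `value`. */
uint64_t dc_cfg_named_bits(uint64_t value)
{
    return value & DC_CFG_NAMED_BITS;
}

/* Hypothetical helper: set bits that fall in ranges the table marks Reserved. */
uint64_t dc_cfg_reserved_bits(uint64_t value)
{
    return value & ~DC_CFG_NAMED_BITS;
}

/* Hypothetical helper: print a per-field breakdown of a DC_CFG value. */
void dc_cfg_describe(uint64_t value)
{
    printf("DisPfHwForSw  (bit 15): %d\n", (value & DC_CFG_DIS_PF_HW_FOR_SW) != 0);
    printf("DisHwPf       (bit 13): %d\n", (value & DC_CFG_DIS_HW_PF) != 0);
    printf("DisSpecTlbRld (bit 4):  %d\n", (value & DC_CFG_DIS_SPEC_TLB_RLD) != 0);
    printf("reserved bits set:      0x%llx\n",
           (unsigned long long)dc_cfg_reserved_bits(value));
}
```

Calling `dc_cfg_describe(0xA008)` from a small `main()` gives the breakdown for the value used above. Setting all three named bits would instead give 0xA010, assuming the table is complete for this processor model.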

However, when I run perf with my command, the prefetch statistics are not zero!

[root@tiger exe]# perf stat -e L1-dcache-loads:uk,L1-dcache-prefetches:uk,L1-dcache-prefetch-misses:uk ./bzip2_base.amd64-m64-gcc44-nn
spec_init
Tested 64MB buffer: OK!
 Performance counter stats for './bzip2_base.amd64-m64-gcc44-nn':
    55,341,597,193 L1-dcache-loads:uk
     1,047,662,614 L1-dcache-prefetches:uk
                 0 L1-dcache-prefetch-misses:uk
      35.921618464 seconds time elapsed

I expected to see 0 for L1-dcache-prefetches. Shouldn't it be?

How can I debug the counters to find out how they map to the MSRs?

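The mapping from perf's synthetic event names (as shown by perf list) to hardware counters is defined, for many CPUs, in the kernel sources of the perf_events subsystem. For AMD, the definitions are in arch/x86/events/amd/core.c. In kernel version 4.8, the AMD cache events are mapped to CPU-specific constants that are written to the PMC MSRs, as follows: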

http://elixir.free-electrons.com/linux/v4.8/source/arch/x86/events/amd/core.c

static __initconst const u64 amd_hw_cache_event_ids
 ... =  {
 [ C(L1D) ] = {
    [ C(OP_READ) ] = {
        [ C(RESULT_ACCESS) ] = 0x0040, /* Data Cache Accesses        */
        [ C(RESULT_MISS)   ] = 0x0141, /* Data Cache Misses          */
    },
    [ C(OP_WRITE) ] = {
        [ C(RESULT_ACCESS) ] = 0,
        [ C(RESULT_MISS)   ] = 0,
    },
    [ C(OP_PREFETCH) ] = {
        [ C(RESULT_ACCESS) ] = 0x0267, /* Data Prefetcher :attempts  */
        [ C(RESULT_MISS)   ] = 0x0167, /* Data Prefetcher :cancelled */
    },
 },
 [ C(L1I ) ] = {
    [ C(OP_READ) ] = {
        [ C(RESULT_ACCESS) ] = 0x0080, /* Instruction cache fetches  */
        [ C(RESULT_MISS)   ] = 0x0081, /* Instruction cache misses   */
    },
    [ C(OP_WRITE) ] = {
        [ C(RESULT_ACCESS) ] = -1,
        [ C(RESULT_MISS)   ] = -1,
    },
    [ C(OP_PREFETCH) ] = {
        [ C(RESULT_ACCESS) ] = 0x014B, /* Prefetch Instructions :Load */
        [ C(RESULT_MISS)   ] = 0,
    },
 },
 [ C(LL  ) ] = {
    [ C(OP_READ) ] = {
        [ C(RESULT_ACCESS) ] = 0x037D, /* Requests to L2 Cache :IC+DC */
        [ C(RESULT_MISS)   ] = 0x037E, /* L2 Cache Misses : IC+DC     */
    },
    [ C(OP_WRITE) ] = {
        [ C(RESULT_ACCESS) ] = 0x017F, /* L2 Fill/Writeback           */
        [ C(RESULT_MISS)   ] = 0,
    },
    [ C(OP_PREFETCH) ] = {
        [ C(RESULT_ACCESS) ] = 0,
        [ C(RESULT_MISS)   ] = 0,
    },
 },

...
__init int amd_pmu_init(void)
{ ...
    /* Performance-monitoring supported from K7 and later: */
    if (boot_cpu_data.x86 < 6)
        return -ENODEV;

    x86_pmu = amd_pmu;

    ret = amd_core_pmu_init();
    ...

    /* Events are common for all AMDs */
    memcpy(hw_cache_event_ids, amd_hw_cache_event_ids,
           sizeof(hw_cache_event_ids));
    return 0;
}
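
To follow one event through this table by hand, the two encodings involved can be sketched in a few lines of C. This is an illustrative sketch, not kernel code: the enum and function names are hypothetical, the enum values mirror the perf UAPI enums in <linux/perf_event.h>, and the field split assumes the usual AMD PERF_CTL layout (EventSelect[7:0] in bits 7:0, UnitMask in bits 15:8, EventSelect[11:8] in bits 35:32).

```c
#include <stdint.h>

/* Values mirrored from the perf UAPI enums (perf_hw_cache_id and friends). */
enum { CACHE_L1D = 0, CACHE_L1I = 1, CACHE_LL = 2 };
enum { OP_READ = 0, OP_WRITE = 1, OP_PREFETCH = 2 };
enum { RESULT_ACCESS = 0, RESULT_MISS = 1 };

/*
 * PERF_TYPE_HW_CACHE config, as documented for perf_event_open():
 * cache id | (op id << 8) | (result id << 16).
 * The kernel uses these three ids as indices into hw_cache_event_ids.
 */
uint64_t hw_cache_config(unsigned cache, unsigned op, unsigned result)
{
    return (uint64_t)cache | ((uint64_t)op << 8) | ((uint64_t)result << 16);
}

/* Event select: low 8 bits of the raw code plus bits 35:32 as bits 11:8. */
unsigned amd_event_select(uint64_t raw)
{
    return (unsigned)((raw & 0xFF) | (((raw >> 32) & 0xF) << 8));
}

/* Unit mask: bits 15:8 of the raw code. */
unsigned amd_unit_mask(uint64_t raw)
{
    return (unsigned)((raw >> 8) & 0xFF);
}
```

Under these assumptions, L1-dcache-prefetches is the [L1D][OP_PREFETCH][RESULT_ACCESS] entry, 0x0267, which splits into event select 0x67 with unit mask 0x02, and L1-dcache-prefetch-misses is 0x0167, the same event select with unit mask 0x01. Those are the numbers to look up in the BKDG event list for the processor. The same code can be requested directly with perf's raw event syntax, for example `-e r0267:uk`, to check that it yields the same count as the named event.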
