perf stats on AMD 15h
According to the BKDG for AMD Family 15h (page 588), the hardware prefetcher can be disabled by setting certain bits of MSRC001_1022:
MSRC001_1022 Data Cache Configuration (DC_CFG)
Bits --> Description
63:16 --> Reserved.
15 --> DisPfHwForSw. Read-write. Reset: 0. 1=Disable hardware prefetches for software prefetches.
14 --> Reserved.
13 --> DisHwPf. Read-write. Reset: 0. 1=Disable the DC hardware prefetcher.
12:10 --> Reserved.
9:5 --> Reserved.
4 --> DisSpecTlbRld. Read-write. Reset: 0. 1=Disable speculative TLB reloads.
3:0 --> Reserved.
To disable all of these prefetch settings, I have to write 0xA008 to that MSR. I did that on all 32 cores with:
[root@tiger exe]# wrmsr -a 0xc0011022 0xA008
[root@tiger exe]# rdmsr -a -x -0 0xc0011022
000000000000a008
...
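As a sanity check on the value being written, the mask can be rebuilt from the bit positions in the table quoted above. This is only a minimal Python sketch: the names `DC_CFG_BITS`, `mask_for` and `decode` are made up for illustration, and the bit positions are copied from the quoted table without being re-checked against the BKDG itself.

```python
# Bit positions copied from the DC_CFG table quoted above (assumed accurate).
DC_CFG_BITS = {
    "DisPfHwForSw": 15,
    "DisHwPf": 13,
    "DisSpecTlbRld": 4,
}

def mask_for(names):
    """OR together the bits for the given field names."""
    mask = 0
    for name in names:
        mask |= 1 << DC_CFG_BITS[name]
    return mask

def decode(value):
    """Split a register value into (named fields set, unnamed bit positions set)."""
    named = [n for n, b in DC_CFG_BITS.items() if (value >> b) & 1]
    known = mask_for(DC_CFG_BITS)
    unnamed = [b for b in range(64) if ((value & ~known) >> b) & 1]
    return named, unnamed

if __name__ == "__main__":
    print(hex(mask_for(DC_CFG_BITS)))  # mask with all three named fields set
    print(decode(0xA008))              # fields actually set by the value written above
```

Going by the table as quoted, the three named fields together give 0xA010, whereas 0xA008 sets bits 15 and 13 plus bit 3, which the table lists as reserved. DisHwPf and DisPfHwForSw are set either way, so this difference only concerns DisSpecTlbRld.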
However, when I run perf with the following command, the prefetch statistics are not zero!
[root@tiger exe]# perf stat -e L1-dcache-loads:uk,L1-dcache-prefetches:uk,L1-dcache-prefetch-misses:uk ./bzip2_base.amd64-m64-gcc44-nn
spec_init
Tested 64MB buffer: OK!
Performance counter stats for './bzip2_base.amd64-m64-gcc44-nn':
55,341,597,193 L1-dcache-loads:uk
1,047,662,614 L1-dcache-prefetches:uk
0 L1-dcache-prefetch-misses:uk
35.921618464 seconds time elapsed
I expected to see 0 for L1-dcache-prefetches. Shouldn't it be zero?
How can I debug the counters to find out how they map to the MSRs?
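The mapping from perf's synthetic event names (as printed by perf list) to hardware counters is defined, for many CPUs, in the kernel sources of the perf_events subsystem. For AMD it lives in arch/x86/events/amd/core.c. In the 4.8 kernel, the cache events for AMD CPUs are mapped to CPU-specific constants that are written to the PMC MSRs, as follows: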
http://elixir.free-electrons.com/linux/v4.8/source/arch/x86/events/amd/core.c
static __initconst const u64 amd_hw_cache_event_ids
                ... = {
 [ C(L1D) ] = {
    [ C(OP_READ) ] = {
        [ C(RESULT_ACCESS) ] = 0x0040, /* Data Cache Accesses */
        [ C(RESULT_MISS)   ] = 0x0141, /* Data Cache Misses */
    },
    [ C(OP_WRITE) ] = {
        [ C(RESULT_ACCESS) ] = 0,
        [ C(RESULT_MISS)   ] = 0,
    },
    [ C(OP_PREFETCH) ] = {
        [ C(RESULT_ACCESS) ] = 0x0267, /* Data Prefetcher :attempts */
        [ C(RESULT_MISS)   ] = 0x0167, /* Data Prefetcher :cancelled */
    },
 },
 [ C(L1I ) ] = {
    [ C(OP_READ) ] = {
        [ C(RESULT_ACCESS) ] = 0x0080, /* Instruction cache fetches */
        [ C(RESULT_MISS)   ] = 0x0081, /* Instruction cache misses */
    },
    [ C(OP_WRITE) ] = {
        [ C(RESULT_ACCESS) ] = -1,
        [ C(RESULT_MISS)   ] = -1,
    },
    [ C(OP_PREFETCH) ] = {
        [ C(RESULT_ACCESS) ] = 0x014B, /* Prefetch Instructions :Load */
        [ C(RESULT_MISS)   ] = 0,
    },
 },
 [ C(LL ) ] = {
    [ C(OP_READ) ] = {
        [ C(RESULT_ACCESS) ] = 0x037D, /* Requests to L2 Cache :IC+DC */
        [ C(RESULT_MISS)   ] = 0x037E, /* L2 Cache Misses : IC+DC */
    },
    [ C(OP_WRITE) ] = {
        [ C(RESULT_ACCESS) ] = 0x017F, /* L2 Fill/Writeback */
        [ C(RESULT_MISS)   ] = 0,
    },
    [ C(OP_PREFETCH) ] = {
        [ C(RESULT_ACCESS) ] = 0,
        [ C(RESULT_MISS)   ] = 0,
    },
 },
...
__init int amd_pmu_init(void)
{ ...
    /* Performance-monitoring supported from K7 and later: */
    if (boot_cpu_data.x86 < 6)
        return -ENODEV;

    x86_pmu = amd_pmu;

    ret = amd_core_pmu_init();
    ...
    /* Events are common for all AMDs */
    memcpy(hw_cache_event_ids, amd_hw_cache_event_ids,
           sizeof(hw_cache_event_ids));

    return 0;
}
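To connect the table above to what actually gets programmed into a counter, here is a small Python sketch that mirrors the L1D rows and splits each code into its event select and unit mask. It assumes the usual AMD raw-event layout (event select in bits 7:0, unit mask in bits 15:8); the names `L1D_EVENTS` and `split_code` are invented for this illustration and are not kernel or perf identifiers.

```python
# L1D rows copied from amd_hw_cache_event_ids in the listing above.
L1D_EVENTS = {
    ("read", "access"): 0x0040,      # Data Cache Accesses
    ("read", "miss"): 0x0141,        # Data Cache Misses
    ("prefetch", "access"): 0x0267,  # Data Prefetcher: attempts
    ("prefetch", "miss"): 0x0167,    # Data Prefetcher: cancelled
}

def split_code(code):
    """Return (event_select, unit_mask), assuming select in bits 7:0 and mask in bits 15:8."""
    return code & 0xFF, (code >> 8) & 0xFF

if __name__ == "__main__":
    for (op, result), code in L1D_EVENTS.items():
        event, umask = split_code(code)
        print(f"L1D {op}/{result}: event=0x{event:02x} umask=0x{umask:02x} raw=r{code:04x}")
```

Going by the comments in the table, L1-dcache-prefetches is event 0x67 with unit mask 0x02 (prefetch attempts), and L1-dcache-prefetch-misses is the same event with unit mask 0x01 (cancelled prefetches). The same counter can also be requested with perf's raw-event syntax, for example perf stat -e r0267, which makes it easier to look up the exact event definition in the BKDG.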