Formulas in perf stat

Question

I would like to know which formulas perf stat uses to compute its derived figures from the raw counts.

perf stat -e task-clock,cycles,instructions,cache-references,cache-misses ./myapp

    1080267.226401      task-clock (msec)         #   19.062 CPUs utilized          
 1,592,123,216,789      cycles                    #    1.474 GHz                      (50.00%)
   871,190,006,655      instructions              #    0.55  insn per cycle           (75.00%)
     3,697,548,810      cache-references          #    3.423 M/sec                    (75.00%)
       459,457,321      cache-misses              #   12.426 % of all cache refs      (75.00%)

In this case, how is the M/sec figure computed from cache-references?

Answer 1

The formulas do not seem to be implemented in builtin-stat.c (where the default event sets for perf stat are defined). They are probably calculated (and averaged, with stddev) in perf_stat__print_shadow_stats(), with some statistics collected into arrays in perf_stat__update_shadow_stats():

http://elixir.free-electrons.com/linux/v4.13.4/source/tools/perf/util/stat-shadow.c#L626

When HW_INSTRUCTIONS is counted: "insn per cycle" = HW_INSTRUCTIONS / HW_CPU_CYCLES; "stalled cycles per insn" = HW_STALLED_CYCLES_FRONTEND / HW_INSTRUCTIONS

if (perf_evsel__match(evsel, HARDWARE, HW_INSTRUCTIONS)) {
    total = avg_stats(&runtime_cycles_stats[ctx][cpu]);
    if (total) {
        ratio = avg / total;
        print_metric(ctxp, NULL, "%7.2f ",
                "insn per cycle", ratio);
    } else {
        print_metric(ctxp, NULL, NULL, "insn per cycle", 0);
    }

Branch misses come from print_branch_misses, as HW_BRANCH_MISSES / HW_BRANCH_INSTRUCTIONS

perf_stat__print_shadow_stats() contains several cache miss ratio calculations, such as HW_CACHE_MISSES / HW_CACHE_REFERENCES, plus some more detailed ones (perf stat -d mode).

Stalled percentages are computed as HW_STALLED_CYCLES_FRONTEND / HW_CPU_CYCLES and HW_STALLED_CYCLES_BACKEND / HW_CPU_CYCLES

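GHz is computed as HW_CPU_CYCLES / runtime_nsecs_stats, where runtime_nsecs_stats is updated from either of the software events task-clock or cpu-clock (SW_TASK_CLOCK and SW_CPU_CLOCK; the exact difference between the two has remained unclear since it was asked on LKML in 2010 and on SO in 2014):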

if (perf_evsel__match(counter, SOFTWARE, SW_TASK_CLOCK) ||
    perf_evsel__match(counter, SOFTWARE, SW_CPU_CLOCK))
    update_stats(&runtime_nsecs_stats[cpu], count[0]);

There are also several formulas for transactions (perf stat -T mode).

"CPUs utilized" is task-clock or cpu-clock divided by walltime_nsecs_stats, where the wall time is measured by perf stat itself, in user space, using a wall (astronomical) clock:

static inline unsigned long long rdclock(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

...

static int __run_perf_stat(int argc, const char **argv)
{    
...
    /*
     * Enable counters and exec the command:
     */
    t0 = rdclock();
    clock_gettime(CLOCK_MONOTONIC, &ref_time);
    if (forks) {
        ....
    }
    t1 = rdclock();

    update_stats(&walltime_nsecs_stats, t1 - t0);

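There are also some estimates from the Top-Down methodology (Tuning Applications Using a Top-down Microarchitecture Analysis Method; Software Optimizations Become Simple with Top-Down Analysis ... Name Skylake, IDF2015; #22 in Gregg's Methodology List). It was described in 2016 by Andi Kleen in "Add top down metrics to perf stat", https://lwn.net/Articles/688335/ (perf stat --topdown -I 1000 cmd mode).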

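Finally, if there is no specific formula for the event being printed, a generic "%c/sec" (K/sec or M/sec) metric is used: http://elixir.free-electrons.com/linux/v4.13.4/source/tools/perf/util/stat-shadow.c#L845 The count is divided by the runtime in nsec (from the task-clock or cpu-clock event, if one is present in the perf stat event set):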

} else if (runtime_nsecs_stats[cpu].n != 0) {
    char unit = 'M';
    char unit_buf[10];

    total = avg_stats(&runtime_nsecs_stats[cpu]);

    if (total)
        ratio = 1000.0 * avg / total;
    if (ratio < 0.001) {
        ratio *= 1000;
        unit = 'K';
    }
    snprintf(unit_buf, sizeof(unit_buf), "%c/sec", unit);
    print_metric(ctxp, NULL, "%8.3f", unit_buf, ratio);
}

Answer 1 (3, accepted, 2017-10-05 00:00:11)
