
Unix Command For Benchmarking Code Running K times

Suppose I have a program that I execute on Unix this way:

$ ./mycode

My question is: is there a way I can time the running time of my code executed K times, for example K = 1000?

I am aware of the Unix "time" command, but that only times a single execution.

An improvement/clarification of Charlie's answer:

time (for i in $(seq 10000); do ./mycode; done)

Try

$ time ( your commands )

Write a loop inside the parentheses to repeat your command as needed.

Update

Okay, we can solve the "command line too long" issue too. This is bash syntax; if you're using another shell you may have to use expr(1).

$ time (
> while ((n++ < 100)); do echo "n = $n"; done
> )

real    0m0.001s
user    0m0.000s
sys     0m0.000s
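The time output above is the total for all iterations; to get a per-run average you can capture timestamps yourself and divide by K. A minimal sketch, assuming GNU date's nanosecond %N format (not available in BSD/macOS date), with /bin/true standing in for ./mycode:

```shell
# Time K runs of a command and report the average per run.
# Assumes GNU date's %N (nanoseconds since the second).
K=100
start=$(date +%s%N)
i=0
while [ "$i" -lt "$K" ]; do
    /bin/true          # substitute ./mycode here
    i=$((i + 1))
done
end=$(date +%s%N)
echo "average: $(( (end - start) / K )) ns per run"
```

Note that this still includes process-creation overhead in every iteration, just like the `time ( for ... )` approach.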

Just a word of advice: make sure this "benchmark" comes close to your real usage of the program. If it is a short-lived process, the process creation alone can cause significant overhead. Don't assume that it's the same as implementing the loop inside your program.
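To get a feel for how large that per-process cost is on your system, you can time a loop over /bin/true, which does no work, so nearly all of the measured time is fork+exec overhead (numbers will vary by system; bash syntax):

```shell
# /bin/true does nothing, so almost all of the measured time below is
# process-creation (fork+exec) overhead rather than useful work.
n=0
time while [ "$n" -lt 1000 ]; do
    /bin/true
    n=$((n + 1))
done
```

If this overhead is a significant fraction of one run of your real program, a shell-loop benchmark will overstate its cost.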

To improve a little on some of the other answers: those based on seq may cause a "command line too long" error if you decide to test, say, one million times. The following does not have this limitation:

time ( a=0 ; while test $a -lt 10000 ; do echo $a ; a=`expr $a + 1` ; done)

Another solution to the "command line too long" problem is to use a C-style for loop within bash:

 $ for ((i=0;i<10;i++)); do echo $i; done

This works in zsh as well (though I bet zsh has some niftier way of doing it; I'm just still new to zsh). I can't test other shells, as I've never used any.

Forget time; hyperfine will do exactly what you are looking for: https://github.com/sharkdp/hyperfine

% hyperfine 'sleep 0.3'
Benchmark 1: sleep 0.3
  Time (mean ± σ):     310.2 ms ±   3.4 ms    [User: 1.7 ms, System: 2.5 ms]
  Range (min … max):   305.6 ms … 315.2 ms    10 runs
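hyperfine also lets you pin the number of timed runs, which maps directly onto the K from the question (flag names per current hyperfine documentation; check hyperfine --help on your version):

```shell
# 3 untimed warm-up runs, then exactly 10 timed runs of the command;
# raise --runs to 1000 for the K = 1000 case from the question.
hyperfine --warmup 3 --runs 10 'sleep 0.01'
```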

Linux perf stat has a -r repeat_count option. Its output only gives you the mean and standard deviation for each HW/software event, not the min/max as well.

It doesn't discard the first run as a warm-up or anything either, but it's quite useful in a lot of cases.

Scroll to the right for the stddev results like ( +- 0.13% ) for cycles. There is less variance in that than in task-clock, probably because CPU frequency was not fixed. (I intentionally picked a quite short run time, although with Skylake hardware P-state and EPP=performance, it should ramp up to max turbo quite quickly even compared to a 34 ms run time. But for a CPU-bound task that's not memory-bound at all, its interpreter loop runs at a constant number of clock cycles per iteration, modulo only branch misprediction and interrupts. --all-user counts CPU events like instructions and cycles only for user-space, not inside interrupt handlers and system calls / page faults.)

$ perf stat --all-user -r5   awk 'BEGIN{for(i=0;i<1000000;i++){}}'

 Performance counter stats for 'awk BEGIN{for(i=0;i<1000000;i++){}}' (5 runs):

             34.10 msec task-clock                #    0.984 CPUs utilized            ( +-  0.40% )
                 0      context-switches          #    0.000 /sec                   
                 0      cpu-migrations            #    0.000 /sec                   
               178      page-faults               #    5.180 K/sec                    ( +-  0.42% )
       139,277,791      cycles                    #    4.053 GHz                      ( +-  0.13% )
       360,590,762      instructions              #    2.58  insn per cycle           ( +-  0.00% )
        97,439,689      branches                  #    2.835 G/sec                    ( +-  0.00% )
            16,416      branch-misses             #    0.02% of all branches          ( +-  8.14% )

          0.034664 +- 0.000143 seconds time elapsed  ( +-  0.41% )

The awk here is just a busy-loop to give us something to measure. If you're using this to microbenchmark a loop or function, construct it to have minimal startup overhead as a fraction of total run time, so the perf stat event counts for the whole run mostly reflect the code you wanted to time. Often this means building a repeat loop into your own program, to loop over the initialized data multiple times.

See also Idiomatic way of performance evaluation? - timing very short things is hard due to measurement overhead. Carefully constructing a repeat loop that tells you something interesting about the throughput or latency of the code under test is important.
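For instance, instead of launching the process K times from the shell, fold the repeat count into the program itself so the startup cost is paid only once (reps here is just an illustrative parameter name):

```shell
# One awk process does all the iterations, so the measurement is
# dominated by the loop itself rather than by process startup.
time awk -v reps=1000000 'BEGIN { for (i = 0; i < reps; i++) {} }'
```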


Run-to-run variation is often a thing, but back-to-back runs like this will often have less variation within the group than between runs separated by the half second it takes to press up-arrow/return. Perhaps it's something to do with transparent hugepage availability, or choice of alignment? This is usually for small microbenchmarks, so it's not sensitive to the file getting evicted from the pagecache.

(The +- range printed by perf is, I think, just one standard deviation based on the small sample size, not the full range it saw.)

If you're worried about the overhead of repeatedly loading and unloading the executable into process space, I suggest you set up a RAM disk and time your app from there.
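On Linux, /dev/shm is commonly a user-writable tmpfs, so one way to sketch this without root (with /bin/true standing in for your real binary) is:

```shell
# Copy the binary onto a RAM-backed filesystem and run it from there.
# /dev/shm is usually a tmpfs on Linux; adjust the path for your system.
cp /bin/true /dev/shm/mycode
time (
    i=0
    while [ "$i" -lt 100 ]; do
        /dev/shm/mycode
        i=$((i + 1))
    done
)
rm /dev/shm/mycode
```

In practice the page cache usually keeps a recently executed binary in RAM anyway, so this mostly matters for cold-start measurements.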

Back in the '70s we used to be able to set a "sticky" bit on the executable and have it remain in memory. I don't know of a single Unix which still supports this behaviour, as it made updating applications a nightmare... :o)
