
What is the best way to measure the CPU time of some functions when doing experiments in C++?

I have some code in C++ and would like to measure the running time (CPU time) of various functions.

I know this has been asked many times; however, as with all such questions (one can be found here, another here), you get all sorts of answers. Some use clock, some use gettimeofday, some use exotic functions, and others use external libraries.

Which method offers the best precision and reliability? Ideally I would like to get down to nanosecond resolution.

I work under Ubuntu 14.04.

Thank you in advance.

TLDR: You can get a pretty good idea about hotspots with millisecond resolution, but nanosecond resolution doesn't work for various reasons.

You can probably find or write some function that gives you the best resolution your computer can provide; however, this still won't give you any meaningful results:

auto start = getBestPrecisionTime();
foo();
auto end = getBestPrecisionTime();
std::cout << "foo took " << to_nanoseconds(end - start) << "ns";

The first issue is that foo() can be interrupted by another process, so you are not actually measuring foo() but foo() + some_random_service. One way around that is to take 1000 measurements, hope that at least one of them was not interrupted, and use the minimum of those measurements. Depending on how long foo() actually takes, your chances of that range from always to never. A sketch of the minimum-of-N approach follows.
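
A minimal sketch of that idea, assuming C++11 and using std::chrono::steady_clock (a monotonic clock); foo() here is just a hypothetical stand-in workload:

#include <chrono>
#include <iostream>

volatile int sink;  // sink so the stand-in workload isn't optimized away
void foo() { for (int i = 0; i < 100000; ++i) sink += i; }  // hypothetical workload

int main() {
    using clock = std::chrono::steady_clock;
    auto best = std::chrono::nanoseconds::max();
    for (int i = 0; i < 1000; ++i) {  // repeat and keep the minimum
        auto start = clock::now();
        foo();
        auto end = clock::now();
        auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);
        if (elapsed < best) best = elapsed;
    }
    std::cout << "foo took at least " << best.count() << "ns\n";
}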

Similarly, foo() probably accesses memory that sits somewhere in the level 1/2/3/4 caches, in RAM, or on the hard drive, so again you are measuring the wrong thing. You would need real-world data on how likely it is that the memory foo() needs sits in each of those places, and what the corresponding access times are.

Another major issue is optimization. It doesn't make much sense to measure the performance of a debug build, so you will want to measure with maximum optimization enabled. With a high optimization level, however, the compiler will reorder and inline code. The getBestPrecisionTime function then has two options: allow the compiler to move code past it, or not. If it allows reordering, the compiler will do this:

foo();
auto start = getBestPrecisionTime();
auto end = getBestPrecisionTime();
std::cout << "foo took " << to_nanoseconds(end - start) << "ns";

and then optimize it further to

std::cout << "foo took 0ns";

Obviously this produces wrong results, so all timing functions I have come across insert barriers to disallow such reordering; a sketch of one follows.
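
For illustration, a minimal sketch of such a barrier, assuming GCC or Clang (the inline-asm syntax is compiler-specific); the escape() helper is modeled after Google Benchmark's DoNotOptimize, and foo() is hypothetical:

#include <chrono>
#include <iostream>

// Empty asm with a "memory" clobber: the compiler must not move memory
// accesses across this point, yet no actual instruction is emitted.
inline void compiler_barrier() { asm volatile("" ::: "memory"); }

// Forces the compiler to treat `value` as used, so the computation that
// produced it cannot be deleted.
template <typename T>
inline void escape(T const& value) { asm volatile("" : : "g"(value) : "memory"); }

int foo() { return 42; }  // hypothetical function under test

int main() {
    compiler_barrier();
    auto start = std::chrono::steady_clock::now();
    auto result = foo();
    escape(result);  // foo()'s result must actually be computed
    auto end = std::chrono::steady_clock::now();
    compiler_barrier();
    std::cout << "foo took "
              << std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count()
              << "ns\n";
}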

But the alternative is not much better. Without the measurement in between, the compiler may optimize this

foo();
bar();

into

code_that_does_foo_bar;

which is more efficient due to better utilization of registers/SIMD instructions/caching/.... But once you insert the measurement, you disable this optimization and measure the wrong version. With a lot of work you may be able to figure out which assembler instructions inside code_that_does_foo_bar originated from foo(), but since you can't even tell exactly how long a single assembler instruction takes, and since that time also depends on the surrounding instructions, you have no chance of getting an accurate number for optimized code.

The best you can do is simply use std::chrono::high_resolution_clock, because it just doesn't get much more precise than that.
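
For completeness, a concrete version of the getBestPrecisionTime / to_nanoseconds pseudocode used above could look like this (a sketch, assuming C++11):

#include <chrono>

// Concrete stand-ins for the pseudocode used in the snippets above.
inline std::chrono::high_resolution_clock::time_point getBestPrecisionTime() {
    return std::chrono::high_resolution_clock::now();
}

inline long long to_nanoseconds(std::chrono::high_resolution_clock::duration d) {
    return std::chrono::duration_cast<std::chrono::nanoseconds>(d).count();
}

Note that on many standard-library implementations high_resolution_clock is merely an alias for system_clock or steady_clock, so it does not buy extra precision by itself.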

You can try the benchmark library from Google: https://github.com/google/benchmark
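
A minimal sketch of what that looks like, assuming the library is installed and the binary is linked with -lbenchmark -lpthread (BM_Foo and its loop body are hypothetical):

#include <benchmark/benchmark.h>

static void BM_Foo(benchmark::State& state) {
    for (auto _ : state) {  // the library runs this loop enough times to get stable numbers
        int x = 0;          // hypothetical workload
        for (int i = 0; i < 1000; ++i) x += i;
        benchmark::DoNotOptimize(x);  // keep the result alive across optimization
    }
}
BENCHMARK(BM_Foo);
BENCHMARK_MAIN();

The library takes care of repetition, statistical aggregation, and the optimization barriers discussed above.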

Your question is way too broad to give one answer to rule them all. Depending on your requirements: if you want a cross-platform solution, std::chrono::high_resolution_clock might fit the bill. If you don't have access to a compiler that supports C++11 or better, the various good ol' C library time functions might suffice; a Linux-specific sketch follows below. If cross-platform is not an issue and you're only interested in, say, Windows, then depending on your resolution needs, QueryPerformanceCounter or GetTickCount can be used.
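
Since the question mentions Ubuntu and CPU time specifically, here is a minimal POSIX sketch using clock_gettime with CLOCK_PROCESS_CPUTIME_ID, which measures CPU time consumed by the process rather than wall-clock time (foo() is a hypothetical stand-in; very old glibc versions may need linking with -lrt):

#include <time.h>
#include <cstdio>

volatile int sink;
void foo() { for (int i = 0; i < 1000000; ++i) sink += i; }  // hypothetical workload

int main() {
    timespec start, end;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);  // CPU time, not wall time
    foo();
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
    long long ns = (end.tv_sec - start.tv_sec) * 1000000000LL
                 + (end.tv_nsec - start.tv_nsec);
    std::printf("foo used %lld ns of CPU time\n", ns);
}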

If you have specific needs, please mention that in the question.
