
Time difference between execution of two statements is not consistent

Could you please tell me why the value of timediff printed by the following program is usually 4 microseconds (for roughly 90 to 1000 of the iterations, depending on the run), but occasionally 70 or more microseconds for a few cases (roughly 2 to 10 of the iterations, depending on the run):

#include <iostream>
#include <sys/time.h>
using namespace std;

#define MAXQ 1000000
#define THRDS 3

// Returns the current wall-clock time in microseconds.
double GetMicroSecond()
{
    timeval tv;
    gettimeofday(&tv, NULL);
    return (double) (((double)tv.tv_sec * 1000000) + (double)tv.tv_usec);
}

int main()
{
    double timew, timer, timediff;
    bool flagarray[MAXQ];
    int x = 0, y = 0;
    for (int i = 0; i < MAXQ; ++i)
        flagarray[i] = false;
    while (y < MAXQ)
    {
        x++;
        if (x % 1000 == 0)
        {
            // Time a single store into the array and print the
            // elapsed time if it exceeds THRDS microseconds.
            timew = GetMicroSecond();
            flagarray[y++] = true;
            timer = GetMicroSecond();
            timediff = timer - timew;
            if (timediff > THRDS) cout << timer - timew << endl;
        }
    }
}

Compiled using: g++ testlatency.cpp -o testlatency

Note: In my system there are 12 cores. The performance is checked with only this program running in the system.

Generally, there are many threads sharing a small number of cores. Unless you take steps to ensure that your thread has uninterrupted use of a core, you can't guarantee that the OS won't decide to preempt your thread between the two GetMicroSecond() calls and let some other thread use the core for a bit.

Even if your code runs uninterrupted, the line you're trying to time:

flagarray[y++]=true;

likely takes much less time to execute than the measurement code itself.
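
One quick way to convince yourself of this is to time an empty interval, i.e. two back-to-back GetMicroSecond() calls with nothing in between. This is only a minimal sketch, reusing the GetMicroSecond() from the question; on many machines the value it returns is already in the same few-microsecond range as the "normal" readings:

double MeasurementOverhead()
{
    // Nothing is timed here except the timing calls themselves, so the
    // result approximates the overhead of the measurement code.
    double t0 = GetMicroSecond();
    double t1 = GetMicroSecond();
    return t1 - t0;
}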

There are many things happening inside a modern OS at the same time as your program executes. Some of them may "steal" CPU from your program, as stated in NPE's answer. A few more examples of what can influence timing:

  • interrupts from devices (the timer, HDD, and network interfaces, to mention a few);
  • access to RAM (caching).

None of these are easily predictable.

You can expect consistency if you run your code on a microcontroller, or perhaps under a real-time OS.

timew = GetMicroSecond();
flagarray[y++] = true;
timer = GetMicroSecond();

The statement flagarray[y++]=true; will take much less than a microsecond to execute on a modern computer if flagarray[y] happens to be in the level 1 cache. The statement will take longer to execute if that location is in the level 2 cache but not the level 1 cache, much longer if it is in the level 3 cache but not in level 1 or level 2, and much, much longer still if it isn't in any of the caches.
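
If you want to see the cache (and first-touch page-fault) effect directly, a rough sketch is to time one full pass over the array and then a second pass, reusing flagarray, MAXQ and GetMicroSecond() from the question. This is only an illustration under the assumption that nothing else evicts the data in between; the second pass is usually noticeably faster because the pages are already mapped and much of the array sits in some level of cache:

double t0 = GetMicroSecond();
for (int i = 0; i < MAXQ; ++i) flagarray[i] = true;   // cold pass: page faults + cache misses
double t1 = GetMicroSecond();
for (int i = 0; i < MAXQ; ++i) flagarray[i] = false;  // warmer pass: data largely cached
double t2 = GetMicroSecond();
cout << "cold pass: " << (t1 - t0) << " us, warm pass: " << (t2 - t1) << " us" << endl;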

Another thing that can make timer-timew exceed three microseconds is when your program yields to the OS. Cache misses can result in a yield. So can system calls. The function gettimeofday is a system call. As a general rule, you should expect any system call to yield.


Note: In my system there are 12 cores. The performance is checked with only this program running in the system.

This is not true. There are always many other programs, and many, many other threads running on your 12-core computer. These include the operating system itself (which comprises many threads in and of itself), plus lots and lots of little daemons. Whenever your program yields, the OS can decide to temporarily suspend it so that one of the myriad other threads that are temporarily suspended but are asking for the CPU can run.

One of those daemons is the Network Time Protocol daemon (ntpd). This does all kinds of funky little things to your system clock to keep it close to in sync with atomic clocks. With a tiny little instruction such as flagarray[y++]=true being the only thing between successive calls to gettimeofday, you might even see time occasionally go backwards.


When testing for timing, it's a good idea to do the timing at a coarse level. Don't time an individual statement that doesn't involve any function calls. It's much better to time a loop than it is to time individual executions of the loop body. Even then, you should expect some variability in timing because of cache misses and because the OS may temporarily suspend execution of your program.
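
As a concrete example, instead of timing each flagarray[y++]=true individually, you could time the whole fill loop and report an average. This is only a sketch using the names from the question, and the average still includes cache misses and any preemption that happens during the loop, but the cost of GetMicroSecond() is paid only twice instead of once per iteration:

double start = GetMicroSecond();
for (int y = 0; y < MAXQ; ++y)
    flagarray[y] = true;                  // the work being measured
double stop = GetMicroSecond();
cout << "average per store: " << (stop - start) / MAXQ << " us" << endl;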

Modern Unix-based systems have better timers (e.g., clock_gettime) than gettimeofday, ones that are not subject to the changes made by the Network Time Protocol daemon. You should use one of these rather than gettimeofday.
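
A possible drop-in replacement for GetMicroSecond() based on clock_gettime with CLOCK_MONOTONIC is sketched below (POSIX; on very old glibc you may need to link with -lrt). The monotonic clock is not stepped by ntpd, so it cannot jump backwards the way the wall clock can:

#include <time.h>

double GetMonotonicMicroSecond()
{
    // CLOCK_MONOTONIC gives nanosecond-resolution timestamps that are not
    // affected by wall-clock adjustments; convert to microseconds here to
    // keep the same units as the original GetMicroSecond().
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1000000.0 + (double)ts.tv_nsec / 1000.0;
}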

There are a lot of variables that might explain the different time values seen. I would focus more on:

  • Cache miss/fill
  • Scheduler events
  • Interrupts

    bool flagarray[MAXQ];

Since you defined MAXQ as 1000000, let's assume that flagarray takes up 1 MB of space.

You can compute how many cache misses can occur based on your L1/L2 D-cache sizes, and then correlate that with how many iterations it takes to fill all of L1 and start missing, and likewise for L2. The OS may deschedule your process and reschedule it later, but I hope that is less likely given the number of cores you have. The same goes for interrupts: an idle system is never completely idle. You may choose to pin your process to a core number, say N, by doing

taskset 0x<MASK> ./exe and control its execution.
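
If you would rather set the affinity from inside the program instead of via taskset, a Linux-specific sketch using sched_setaffinity looks like the following; the core number passed in is arbitrary and just an example:

#ifndef _GNU_SOURCE
#define _GNU_SOURCE        // needed for cpu_set_t / CPU_SET; g++ usually defines it already
#endif
#include <sched.h>

// Pin the calling thread to one core; returns true on success.
bool PinToCore(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;   // 0 = current thread
}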

If you are really curious, I would suggest that you use the "perf" tool available on most Linux distros.

You may do

perf stat -e L1-dcache-load-misses

or

perf stat -e LLC-load-misses

Once you have these numbers and the number of iterations, you can start building a picture of the activity that causes the lag you noticed. You can also monitor OS scheduler events using "perf stat".
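
For example, a single combined run over the test program might look like the following (exact event names differ between CPUs and kernel versions, so check perf list first; testlatency is the binary name from the question):

perf stat -e L1-dcache-load-misses,LLC-load-misses,context-switches,cpu-migrations ./testlatency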
