Why does Sleep() slow down subsequent code for 40ms?

Question

I originally asked about this at coderanch.com, so if you've tried to assist me there, thanks, and don't feel obliged to repeat the effort. coderanch.com is mostly a Java community, though, and this appears (after some research) to really be a Windows question, so my colleagues there and I thought this might be a more appropriate place to look for help.

I have written a short program that either spins on the Windows performance counter until 33ms have passed, or else calls Sleep(33). The former exhibits no unexpected effects, but the latter appears to (inconsistently) slow subsequent processing for about 40ms (either that, or it has some effect on the values returned from the performance counter for that long). After the spin or Sleep(), the program calls a routine, runInPlace(), that spins for 2ms, counts the number of times it queries the performance counter, and returns that number.

When the initial 33ms delay is done by spinning, the number of iterations of runInPlace() tends to be about 250,000 (on my Windows 10 machine, an XPS-8700). It varies, probably due to other system overhead, but it varies smoothly around 250,000.

Now, when the initial delay is done by calling Sleep(), something strange happens. A lot of the calls to runInPlace() return a number near 250,000, but quite a few of them return a number near 50,000. Again, the values vary fairly smoothly around 50,000. But the results clearly cluster around one value or the other, with almost none between 80,000 and 150,000. If I call runInPlace() 100 times after each delay, instead of just once, it never returns a number of iterations in the smaller range after the 20th call. As runInPlace() runs for 2ms, this means the behavior I'm observing disappears after 40ms. If I have runInPlace() run for 4ms instead of 2ms, it never returns a number of iterations in the smaller range after the 10th call, so, again, the behavior disappears after 40ms (likewise if I have runInPlace() run for only 1ms; the behavior disappears after the 40th call).
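The 40ms figure comes from multiplying the index of the last slow call by the length of each call. As a small sketch of that arithmetic (slowPeriodMs is a name made up for illustration, not part of the program below):

```cpp
#include <cassert>

// Estimated length of the slow period: the index of the last call that
// returned a low count, times the duration of each runInPlace() call.
// (Hypothetical helper, used only to spell out the arithmetic.)
int slowPeriodMs(int lastSlowCall, int msPerCall)
{
    assert(lastSlowCall > 0 && msPerCall > 0);
    return lastSlowCall * msPerCall;
}

// The three configurations described above:
//   slowPeriodMs(20, 2)   2ms calls, slow through the 20th call
//   slowPeriodMs(10, 4)   4ms calls, slow through the 10th call
//   slowPeriodMs(40, 1)   1ms calls, slow through the 40th call
```

All three configurations give the same 40ms window, which suggests the effect is tied to elapsed time rather than to the number of calls.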

Here's my code:

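#include "stdafx.h"   // Visual C++ precompiled header; in the default console project template it pulls in <stdio.h> and <tchar.h>
#include "Windows.h"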

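// Spins until the given delay has elapsed and returns the number of times
// the performance counter was queried.
// Note: despite its name, msDelay is a count of performance-counter ticks,
// not milliseconds (the caller converts from ms using the counter frequency).
int runInPlace(int msDelay)
{
    LARGE_INTEGER t0, t1;
    int n = 0;

    QueryPerformanceCounter(&t0);

    do
    {
        QueryPerformanceCounter(&t1);
        n++;
    } while (t1.QuadPart - t0.QuadPart < msDelay);

    return n;
}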

int _tmain(int argc, _TCHAR* argv[])
{
    LARGE_INTEGER t0, t1;
    LARGE_INTEGER frequency;
    int n;

    QueryPerformanceFrequency(&frequency);

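    // Both delays are in performance-counter ticks (frequency is ticks per second).
    int msDelay = (int)(2 * frequency.QuadPart / 1000);     // 2ms measurement window

    int spinDelay = (int)(33 * frequency.QuadPart / 1000);  // 33ms spin delay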

    for (int i = 0; i < 100; i++)
    {
        if (argc > 1)
            Sleep(33);
        else
        {
            QueryPerformanceCounter(&t0);

            do
            {
                QueryPerformanceCounter(&t1);
            } while (t1.QuadPart - t0.QuadPart < spinDelay);
        }

        n = runInPlace(msDelay);
        printf("%d \n", n);
    }

    getchar();

    return 0;
}

Here's some output typical of what I get when using Sleep() for the delay:

56116 248936 53659 34311 233488 54921 47904 45765 31454 55633 55870 55607 32363 219810 211400 216358 274039 244635 152282 151779 43057 37442 251658 53813 56237 259858 252275 251099

And here's some output typical of what I get when I spin to create the delay:

276461 280869 276215 280850 188066 280666 281139 280904 277886 279250 244671 240599 279697 280844 159246 271938 263632 260892 238902 255570 265652 274005 273604 150640 279153 281146 280845 248277

Can anyone help me understand this behavior? (Note: I have tried this program, compiled with Visual C++ 2010 Express, on five computers. It only shows this behavior on the two fastest machines I have.)

Answer 1

This sounds like it is due to the reduced clock speed that the CPU runs at when the computer is not busy (SpeedStep). When the computer is idle (as during a sleep), the clock speed drops to reduce power consumption. On newer CPUs this can be 35% or less of the listed clock speed. Once the computer gets busy again, there is a small delay before the CPU speeds up again.

You can turn off this feature either in the BIOS or by changing the "Minimum processor state" setting under "Processor power management" in the advanced settings of your power plan to 100%.
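One way to check this explanation is to sleep once and then time several consecutive short windows, watching whether the iteration counts climb back up. The following is only a rough, portable approximation of the question's experiment, not the original program: it assumes std::chrono::steady_clock in place of QueryPerformanceCounter and std::this_thread::sleep_for in place of Sleep(), and the names countIterations and profileAfterSleep are made up for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

using Clock = std::chrono::steady_clock;

// Spin for `window`, counting how many times the clock is queried.
// Portable stand-in for runInPlace() (assumption: steady_clock instead of
// the Windows performance counter).
long long countIterations(std::chrono::microseconds window)
{
    const Clock::time_point start = Clock::now();
    long long n = 0;
    do
    {
        ++n;
    } while (Clock::now() - start < window);
    return n;
}

// Sleep once, then measure `windows` back-to-back spin windows.
// The returned counts are in chronological order after the wake-up.
std::vector<long long> profileAfterSleep(std::chrono::milliseconds sleepFor,
                                         std::chrono::microseconds window,
                                         int windows)
{
    assert(windows >= 0);
    std::this_thread::sleep_for(sleepFor);

    std::vector<long long> counts;
    counts.reserve(static_cast<std::size_t>(windows));
    for (int i = 0; i < windows; ++i)
        counts.push_back(countIterations(window));
    return counts;
}
```

Calling profileAfterSleep(std::chrono::milliseconds(33), std::chrono::microseconds(2000), 100) mirrors the 100-call variant from the question. If clock scaling is the cause, low counts should show up only in the early entries, and they should go away once "Minimum processor state" is set to 100%.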

Answer 2

Besides what @1201ProgramAlarm said (which may very well be the case; modern processors are extremely fond of downclocking whenever they can), it may also be a cache warm-up problem.

When you ask to sleep for a while, the scheduler typically schedules another thread or process for the next CPU time quantum, which means that the caches relevant to your process (instruction cache, data cache, TLB, branch predictor data, ...) are going to be "cold" again when your code regains the CPU.

Answer 1: score 9, accepted, 2016-02-28 02:02:04

Answer 2: score 5, 2016-02-28 02:11:13