What is the "spool time" of Intel Turbo Boost?

Just like a turbo engine has "turbo lag" due to the time it takes for the turbo to spool up, I'm curious what the "turbo lag" is in Intel processors.

For instance, the i9-8950HK in my MacBook Pro 15" 2018 (running macOS Catalina 10.15.7) usually sits around 1.3 GHz when idle, but when I run a CPU-intensive program, the CPU frequency initially shoots up to, say, 4.3 GHz or so. The question is: how long does it take to go from 1.3 to 4.3 GHz? 1 microsecond? 1 millisecond? 100 milliseconds?

I'm not even sure whether this is up to the hardware or the operating system.

This is in the context of benchmarking some CPU-intensive code which takes a few tens of milliseconds to run. The thing is, right before this piece of CPU-intensive code is run, the CPU is essentially idle (and thus the clock speed will drop down to, say, 1.3 GHz). I'm wondering what slice of my benchmark is running at 1.3 GHz and what is running at 4.3 GHz: 1%/99%? 10%/90%? 50%/50%? Or even worse?

Depending on the answer, I'm thinking it would make sense to run some CPU-intensive code prior to starting the benchmark as a way to "spool up" Turbo Boost. And this leads to another question: for how long should I run this "spooling-up" code? Probably one second is enough, but what if I'm trying to minimize this: what's a safe amount of time for the "spooling-up" code to run, to make sure the CPU will run the main code at the maximum frequency from the very first instruction executed?

I wrote some code to check this, with the aid of the Intel Power Gadget API. It sleeps for one second (so the CPU goes back to its slowest speed), measures the clock speed, runs some code for a given amount of time, then measures the clock speed again.
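
The original measurement code is not shown, and the sketch below does not use the Intel Power Gadget API. It is a rough stand-in that uses throughput as a proxy for clock speed: after an idle sleep, it times consecutive chunks of identical integer work, so shrinking chunk durations suggest a rising clock. The names `spin_chunk` and `ramp_profile` and all parameter values are hypothetical, and interpreter warm-up effects in Python can blur the first few samples.

```python
import time

def spin_chunk(iterations: int) -> int:
    """Fixed amount of integer work; it should finish faster as the clock rises."""
    acc = 0
    for i in range(iterations):
        acc += i * i
    return acc

def ramp_profile(idle_s: float = 1.0, chunks: int = 200, iterations: int = 20_000):
    """Sleep so the CPU can drop to idle, then time consecutive fixed-work chunks.

    Returns a list of (elapsed_ms_since_start, chunk_duration_ms) pairs.
    This measures throughput only; it never reads the actual clock frequency.
    """
    time.sleep(idle_s)
    samples = []
    start = time.perf_counter_ns()
    for _ in range(chunks):
        t0 = time.perf_counter_ns()
        spin_chunk(iterations)
        t1 = time.perf_counter_ns()
        samples.append(((t1 - start) / 1e6, (t1 - t0) / 1e6))
    return samples

if __name__ == "__main__":
    # Print every tenth sample to keep the output short.
    for elapsed_ms, chunk_ms in ramp_profile()[::10]:
        print(f"{elapsed_ms:8.2f} ms  chunk took {chunk_ms:.3f} ms")
```

Reading the real frequency, as the measurements in this answer do, still requires a hardware counter or a tool such as Intel Power Gadget.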

I only tried this on my 2018 15" MacBook Pro (i9-8950HK CPU) running macOS Catalina 10.15.7. The specific CPU-intensive code being run between clock speed measurements may also influence the result (is it integer only? FP? SSE? AVX? AVX-512?), so don't take these as exact numbers, but only as order-of-magnitude/ballpark figures. I have no idea how the results translate to different hardware/OS/code combinations.

The minimum clock speed when idle in my configuration is 1.3 GHz. Here are the results I obtained, in tabular form.

+--------+-------------+
| T (ms) | Final clock |
|        | speed (GHz) |
+--------+-------------+
| <1     | 1.3         |
| 1..3   | 2.0         |
| 4..7   | 2.5         |
| 8..10  | 2.9         |
| 10..20 | 3.0         |
| 25     | 3.0-3.1     |
| 35     | 3.3-3.5     |
| 45     | 3.5-3.7     |
| 55     | 4.0-4.2     |
| 66     | 4.6-4.7     |
+--------+-------------+

So 1 ms appears to be the minimum amount of time to get any kind of change. 10 ms gets the CPU to its nominal frequency, and from then on the ramp is a bit slower, apparently taking over 50 ms to reach the maximum turbo frequencies.
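
To get a feel for what this ramp means for a short benchmark, the table can be turned into a rough time-weighted average clock. This is only a sketch under strong assumptions: it treats each "final clock speed after T ms" as the instantaneous clock from T onward, uses midpoints where the table gives a range, and assumes the clock stays at the last value after 66 ms. `RAMP` and `average_clock_ghz` are names made up for illustration.

```python
# (start_ms, clock_GHz) steps read off the table above; midpoints are used
# where the table gives a range. Assumption: the clock is piecewise constant
# and holds the last value indefinitely (no thermal or power throttling).
RAMP = [(0, 1.3), (1, 2.0), (4, 2.5), (8, 2.9), (10, 3.0),
        (25, 3.05), (35, 3.4), (45, 3.6), (55, 4.1), (66, 4.65)]

def average_clock_ghz(run_ms: float) -> float:
    """Time-weighted mean clock over the first run_ms of a run started from idle."""
    if run_ms <= 0:
        raise ValueError("run_ms must be positive")
    total = 0.0
    for i, (start, ghz) in enumerate(RAMP):
        end = RAMP[i + 1][0] if i + 1 < len(RAMP) else float("inf")
        overlap = min(end, run_ms) - start
        if overlap <= 0:
            break  # the run ended before this step began
        total += overlap * ghz
    return total / run_ms
```

Under these assumptions, `average_clock_ghz(50)` comes out at roughly 3.0 GHz, which suggests that a benchmark of a few tens of milliseconds started from idle would spend most of its time well below the maximum turbo frequency.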

The paper "Evaluation of CPU frequency transition latency" presents transition latencies of various Intel processors. In brief, the latency depends on the state the core is currently in and on the target state. For the evaluated Ivy Bridge processor (i7-3770 @ 3.4 GHz), the latencies varied from 23 microseconds (1.6 GHz -> 1.7 GHz) to 52 microseconds (2.0 GHz -> 3.4 GHz).

At the Hot Chips 2020 conference, a major transition latency improvement was presented for the upcoming Ice Lake processors, which should mostly benefit partially vectorised code that uses AVX-512 instructions. Because these instructions do not support frequencies as high as SSE or AVX2 instructions do, an island of AVX-512 instructions causes the processor frequency to scale down and then back up again.

Pre-heating the processor obviously makes sense, as does "pre-heating" memory. One second of prior workload is enough to reach the highest available turbo frequency. However, you should also take into account the temperature of the processor, which may scale the frequency down (both the CPU core and uncore frequencies, on the latest Intel processors). You will not reach the temperature limit within one second. But it depends on what you want your benchmark to measure, and on whether you want the temperature limit to be part of it. Speaking of the temperature limit, be aware that your processor also has a power limit, which is another possible reason for the frequency being scaled down while the application runs.
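
A minimal sketch of such a pre-heating step, assuming a plain busy loop on the same thread is enough to trigger the frequency ramp. The helper names `warm_up` and `timed_after_warm_up` and the 0.1 s default are illustrative choices, not values taken from any library or from the paper; one second is the more conservative figure mentioned above.

```python
import time

def warm_up(duration_s: float = 0.1) -> int:
    """Busy-loop for duration_s so the core has time to clock up.

    Returns the number of loop iterations, mainly so the work is not trivially dead.
    """
    deadline = time.perf_counter() + duration_s
    spins = 0
    while time.perf_counter() < deadline:
        spins += 1
    return spins

def timed_after_warm_up(func, warm_up_s: float = 0.1):
    """Run the warm-up, then time a single call of func immediately afterwards."""
    warm_up(warm_up_s)
    t0 = time.perf_counter()
    result = func()
    elapsed_s = time.perf_counter() - t0
    return result, elapsed_s
```

The benchmark should start right after the warm-up returns, with no sleep or blocking I/O in between, otherwise the core may begin to clock down again. The operating system may also move the thread to another core, which this sketch does nothing to prevent.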

Another thing to take into account when benchmarking your code is that its runtime is very short. Be aware of how reliable the runtime and resource-consumption measurements are at that scale. I would suggest artificially extending the runtime (run the code 10 times and measure the overall consumption) for better results.
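
The repeat-and-measure suggestion could look roughly like this. It is a sketch only: `measure` is a hypothetical helper, and it records wall-clock time rather than the energy or other resource consumption a power-measurement tool would report.

```python
import time

def measure(func, repeats: int = 10):
    """Time `repeats` back-to-back calls of func.

    Returns (total_s, per_call_s), where per_call_s lists each call's duration.
    """
    per_call_s = []
    t_start = time.perf_counter()
    for _ in range(repeats):
        t0 = time.perf_counter()
        func()
        per_call_s.append(time.perf_counter() - t0)
    total_s = time.perf_counter() - t_start
    return total_s, per_call_s
```

Dividing `total_s` by `repeats` gives the averaged runtime. Comparing the first entry of `per_call_s` with the later ones also hints at how much of the first run was spent while the clock was still ramping up.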
