
Recording time of execution: How do computers calculate arithmetic so fast?

My question seems elementary at first, but bear with me.

I wrote the following code in order to test how long it would take Python to count from 1 to 1,000,000.

import time

class StopWatch:
    def __init__(self, startTime = time.time()):
        self.__startTime = startTime
        self.__endTime = 0

    def getStartTime(self):
        return self.__startTime

    def getEndTime(self):
        return self.__endTime

    def stop(self):
        self.__endTime = time.time()

    def start(self):
        self.__startTime = time.time()

    def getElapsedTime(self):
        return self.__endTime - self.__startTime


count = 0
Timer = StopWatch()
for i in range(1, 1000001):
    count += i

Timer.stop()
total_Time = Timer.getElapsedTime()
print("Total time elapsed to count to 1,000,000: ",total_Time," milliseconds")

I calculated a surprisingly short time span: 0.20280098915100098 milliseconds. I first want to ask: Is this correct?

I expected execution to take at least 2 or 3 milliseconds, but I did not anticipate it being able to make that computation in less than half a millisecond!

If this is correct, that leads me to my secondary question: WHY is it so fast?

I know CPUs are essentially built for arithmetic, but I still wouldn't anticipate it being able to count to one million in two tenths of a millisecond!

Maybe you were tricked by the time measurement unit, as @jonrsharpe commented.

Nevertheless, a 3rd-generation Intel i7 is capable of 120+ GIPS (i.e. billions of elementary operations per second), so assuming all cache hits and no context switches (put simply, no unexpected waits), it could easily count from 0 to 1G in that time, and beyond. Probably not in Python, since it has some overhead, but still possible.
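One rough way to see how much of the measured time is interpreter overhead rather than arithmetic is to compare a hand-written Python loop against the built-in `sum()`, whose loop runs in C inside the runtime. This is just a sketch; the absolute timings will vary by machine:

```python
import time

N = 1_000_000

# Hand-written loop: every iteration goes through the bytecode interpreter.
start = time.perf_counter()
total = 0
for i in range(1, N + 1):
    total += i
loop_s = time.perf_counter() - start

# Built-in sum(): the same additions, executed inside the C runtime.
start = time.perf_counter()
total_builtin = sum(range(1, N + 1))
builtin_s = time.perf_counter() - start

print(f"loop: {loop_s:.4f} s, sum(): {builtin_s:.4f} s")
```

On a typical machine the built-in is several times faster, which hints that a large share of the loop's time is interpreter bookkeeping rather than addition.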

Explaining how a modern CPU can achieve such an... "insane" speed is quite a broad subject; it is actually the collaboration of more than one technology:

  1. a dynamic scheduler will rearrange elementary instructions to reduce conflicts (and thus waits) as much as possible
  2. a well-engineered cache will promptly provide code and (although less of a problem for this benchmark) data
  3. a dynamic branch predictor will profile code and speculate on branch conditions (e.g. "is the for loop over or not?") to anticipate jumps with a high chance of "winning"
  4. a good compiler will put in some additional effort, rearranging instructions in order to reduce conflicts or making loops faster (by unrolling, merging, etc.)
  5. multi-precision arithmetic can exploit the vector operations provided by the MMX instruction set and its successors

In short, there is more than one reason why those small wonders are so expensive :)

First, as has been pointed out, time() output is actually in seconds, not milliseconds.
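A minimal check of the units: sleep for a known interval and convert the `time.time()` difference to milliseconds explicitly.

```python
import time

start = time.time()
time.sleep(0.25)                 # sleep a known interval
elapsed_s = time.time() - start  # time.time() differences are in seconds
elapsed_ms = elapsed_s * 1000    # convert explicitly when you want ms

print(f"{elapsed_s:.3f} s = {elapsed_ms:.0f} ms")
```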

Also, you are actually performing 1m additions, summing to a total of about 1m**2 / 2 — not counting to 1m — and you are initializing a million-element list with range (unless you are on Python 3).
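The total the loop actually computes can be checked against the closed form n(n+1)/2:

```python
n = 1_000_000

# Summing 1..n gives n*(n+1)/2 ~ n**2 / 2, i.e. about 5e11 for a million --
# a very different job from merely counting to a million.
total = sum(range(1, n + 1))
assert total == n * (n + 1) // 2
print(total)  # 500000500000
```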

I ran a simpler test on my laptop:

start = time.time()
i = 0
while i < 1000000:
    i += 1
print time.time() - start

Result:

0.069179093451

So, 70 milliseconds. That translates to 14 million operations per second.

Let's look at the table that Stefano probably referred to ( http://en.wikipedia.org/wiki/Instructions_per_second ) and make a rough estimate. They don't list an i5 like mine, but the slowest i7 will be close enough: it clocks 80 GIPS with 4 cores, i.e. 20 GIPS per core.

(By the way, if your question is "how does it manage to get 20 GIPS per core?", I can't help you. It's maaaagic nanotechnology.)

So the core is capable of 20 billion operations per second, and we get only 14 million — off by a factor of 1400.
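The back-of-envelope arithmetic behind that factor, spelled out (the inputs are the answer's estimates, not fresh measurements):

```python
ops = 1_000_000        # additions performed by the while loop
elapsed = 0.069        # seconds measured above
peak = 20e9            # ~20 GIPS per core, from the Wikipedia table

measured_rate = ops / elapsed      # ~1.4e7 ops/s
slowdown = peak / measured_rate    # ~1400x below theoretical peak
print(f"{measured_rate:.2e} ops/s, ~{slowdown:.0f}x below peak")
```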

At this point the right question is not "why so fast?" but "why so slow?". Probably Python overhead. What if we try this in C?

#include <stdio.h>
#include <unistd.h>
#include <time.h>

int i = 0;
int million = 1000000;
int main() {

    clock_t cstart = clock();
    while (i < million) {
     i += 1;
    }

    clock_t cend = clock();
    printf ("%.3f cpu sec\n", ((double)cend - (double)cstart) / CLOCKS_PER_SEC);
    return 0;
}

Result:

0.003 cpu sec

This is 23 times faster than Python, and only a factor of 60 away from the theoretical number of 'elementary operations' per second. I see two operations here — comparison and addition — so make that a factor of 30. This is entirely reasonable, as elementary operations are probably much smaller than our addition and comparison (let the assembler experts tell us), and we also didn't factor in context switches, cache misses, the overhead of the time calculation, and who knows what else.

This also suggests that Python performs 23 times as many operations to do the same thing. This too is entirely reasonable, because Python is a high-level language. This is the kind of penalty you pay in high-level languages — and now you understand why speed-critical sections are usually written in C.

Also, Python's integers are immutable, so memory has to be allocated for each new integer (the Python runtime is smart about it, but nevertheless).
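A small illustration of the immutability point — "incrementing" a Python int rebinds the name to a new object rather than modifying the old one:

```python
x = 1_000_000
y = x          # y keeps a reference to the original object
x += 1         # rebinds x to a brand-new int object

assert x is not y      # different objects...
assert y == 1_000_000  # ...and the original value is unchanged
print(x, y)
```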

I hope that answers your question and teaches you a little bit about how to perform incredibly rough estimations =)

Short answer: As jonrsharpe mentioned in the comments, it's seconds, not milliseconds.

Also, as Stefano said — check his posted answer. It has a lot of detail beyond just the ALU.

I'm just writing to mention: when you give default values in your classes or functions, make sure to use a simple immutable value instead of a function call or something like that. Your class is actually setting the start time of the timer once, for all instances — you will get a nasty surprise if you create a new Timer, because it will reuse the previous value as the initial value. Try this, and the timer does not get reset for the second Timer:

#...
count = 0
Timer = StopWatch()
time.sleep(1)
Timer = StopWatch()   # second instance silently reuses the default start time
for i in range(1, 1000001):
    count += i
Timer.stop()
total_Time = Timer.getElapsedTime()
print("Total time elapsed to count to 1,000,000: ",total_Time," milliseconds")

You will get about 1 second instead of what you expect.
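One common fix — a sketch, not the only way — is the `None`-sentinel idiom, so that `time.time()` is evaluated at call time instead of once when the class is defined:

```python
import time

class StopWatch:
    def __init__(self, start_time=None):
        # None is a safe immutable default; the clock is read per instance.
        self.__startTime = time.time() if start_time is None else start_time
        self.__endTime = 0

    def getStartTime(self):
        return self.__startTime

    def stop(self):
        self.__endTime = time.time()

    def getElapsedTime(self):
        return self.__endTime - self.__startTime
```

With this version, every `StopWatch()` created after a `time.sleep()` gets its own fresh start time.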

