Recording time of execution: How do computers calculate arithmetic so fast?

Question

My question seems elementary at first, but bear with me.

I wrote the following code to test how long it would take Python to count from 1 to 1,000,000.

import time

class StopWatch:
    def __init__(self, startTime = time.time()):
        self.__startTime = startTime
        self.__endTime = 0

    def getStartTime(self):
        return self.__startTime

    def getEndTime(self):
        return self.__endTime

    def stop(self):
        self.__endTime = time.time()

    def start(self):
        self.__startTime = time.time()

    def getElapsedTime(self):
        return self.__endTime - self.__startTime


count = 0
Timer = StopWatch()
for i in range(1, 1000001):
    count += i

Timer.stop()
total_Time = Timer.getElapsedTime()
print("Total time elapsed to count to 1,000,000: ",total_Time," milliseconds")

I calculated a surprisingly short time span. It was 0.20280098915100098 milliseconds. I first want to ask: Is this correct?

I expected execution to take at least 2 or 3 milliseconds, but I did not anticipate it would be able to make that computation in less than half a millisecond!

If this is correct, that leads me to my secondary question: WHY is it so fast?

I know CPUs are essentially built for arithmetic, but I still wouldn't anticipate it being able to count to one million in two tenths of a millisecond!

Answer 1

Maybe you were tricked by the unit of measurement, as @jonrsharpe commented.

Nevertheless, a 3rd-generation Intel i7 is capable of 120+ GIPS (i.e. billions of elementary operations per second), so assuming all cache hits and no context switches (put simply, no unexpected waits), it could easily count from 0 to 1G in said time and even more. Probably not with Python, since it has some overhead, but it is still possible.

Explaining how a modern CPU achieves such "insane" speed is quite a broad subject; it is really the collaboration of several technologies:

A dynamic scheduler rearranges elementary instructions to reduce conflicts (and thus waits) as much as possible.
A well-engineered cache promptly provides code and (although less problematic for this benchmark) data.
A dynamic branch predictor profiles code and speculates on branch conditions (e.g. "is the for loop over or not?") to anticipate jumps with a high chance of "winning".
A good compiler contributes some additional effort by rearranging instructions to reduce conflicts or by making loops faster (by unrolling, merging, etc.).
Multi-precision arithmetic can exploit the vector operations provided by MMX and similar instruction sets.

In short, there is more than one reason why those small wonders are so expensive :)

Answer 2

First, as has been pointed out, time() output is actually in seconds, not milliseconds.

Also, you are actually performing 1m additions to reach a total of about 1m**2 / 2, not counting to 1m, and you are initializing a million-element list with range (unless you are on Python 3).
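To make both points concrete, here is a minimal sketch (assuming Python 3) that times the same loop, reports the result explicitly in seconds and in milliseconds, and checks the total against the closed form n(n+1)/2. It uses time.perf_counter() instead of time.time(); that is my substitution, chosen because it is intended for measuring short intervals. The function name time_sum is made up for illustration.

```python
import time


def time_sum(n):
    """Add the integers 1..n and return (total, elapsed_seconds)."""
    start = time.perf_counter()  # fractional seconds, like time.time()
    total = 0
    for i in range(1, n + 1):
        total += i
    elapsed = time.perf_counter() - start
    return total, elapsed


n = 1000000
total, elapsed = time_sum(n)

# The loop computes a sum, not a count: 1 + 2 + ... + n = n * (n + 1) / 2.
print("total:", total, "closed form:", n * (n + 1) // 2)

# The elapsed value is in seconds; multiply by 1000 to get milliseconds.
print("elapsed: %.6f s = %.3f ms" % (elapsed, elapsed * 1000))
```

The exact timing depends on the machine and the Python version, so treat any single number as a rough indication only.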

I ran a simpler test on my laptop:

import time

start = time.time()
i = 0
while i < 1000000:
    i += 1
print(time.time() - start)  # elapsed time in seconds

Result:

0.069179093451

So, 70 milliseconds. That translates to 14 million operations per second.

Let's look at the table that Stefano probably referred to (http://en.wikipedia.org/wiki/Instructions_per_second) and do a rough estimate. It doesn't list an i5 like mine, but the slowest i7 will be close enough. It is rated at 80 GIPS with 4 cores, so 20 GIPS per core.

(By the way, if your question is "how does it manage to get 20 GIPS per core?", can't help you. It's ~~maaaagic~~ nanotechnology)

So the core is capable of 20 billion operations per second, and we get only 14 million: a difference of a factor of about 1400.

At this point the right question is not "why so fast?" but "why so slow?". Probably Python overhead. What if we try this in C?

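#include <stdio.h>
#include <time.h>

/* Globals, as in the original test. Note that an optimizing compiler
   may collapse this loop, so build without optimization to time it. */
int i = 0;
int million = 1000000;

int main() {
    clock_t cstart = clock();  /* CPU time used so far, in clock ticks */
    while (i < million) {
        i += 1;
    }
    clock_t cend = clock();

    printf("%.3f cpu sec\n", ((double)cend - (double)cstart) / CLOCKS_PER_SEC);
    return 0;
}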

Result:

0.003 cpu sec

This is 23 times faster than Python, and only a factor of 60 away from the theoretical number of 'elementary operations' per second. I see two operations per iteration here, a comparison and an addition, so call it a factor of 30. This is entirely reasonable, as elementary operations are probably much smaller than our addition and comparison (let the assembler experts tell us), and we also didn't factor in context switches, cache misses, timing overhead and who knows what else.
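The chain of rough estimates in this answer can be reproduced with a few lines of arithmetic. This is only a sketch: the constants are the figures quoted above (one laptop, one run, and a per-core rating read off the Wikipedia table), and the variable names are mine.

```python
# Figures quoted in this answer; your own measurements will differ.
N = 1000000                      # loop iterations
PYTHON_SECONDS = 0.069179093451  # measured Python run
C_SECONDS = 0.003                # measured C run
CORE_OPS_PER_SECOND = 20e9       # 80 GIPS / 4 cores (assumed rating)

python_rate = N / PYTHON_SECONDS  # iterations per second in Python
c_rate = N / C_SECONDS            # iterations per second in C

print("Python: %.1f million iterations/s" % (python_rate / 1e6))
print("Python vs. core rating: factor %.0f" % (CORE_OPS_PER_SECOND / python_rate))
print("C vs. Python speedup: factor %.0f" % (PYTHON_SECONDS / C_SECONDS))
print("C vs. core rating: factor %.0f" % (CORE_OPS_PER_SECOND / c_rate))

# Counting two operations per iteration (compare and add) halves the gap.
print("C, two ops per iteration: factor %.0f" % (CORE_OPS_PER_SECOND / (2 * c_rate)))
```

The unrounded Python gap comes out a little under the "1400" quoted earlier, because that figure used the rounded 14 million; at this level of precision the difference does not matter.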

This also suggests that Python performs 23 times as many operations to do the same thing. This is also entirely reasonable, because Python is a high-level language. This is the kind of penalty you get in high-level languages, and now you understand why speed-critical sections are usually written in C.
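One way to glimpse that overhead, assuming CPython 3.4 or later, is to list the bytecode of the loop with the standard dis module. The function name count_to is made up for illustration, and the exact listing varies between Python versions.

```python
import dis


def count_to(limit):
    i = 0
    while i < limit:
        i += 1
    return i


# Each bytecode instruction is dispatched by the interpreter, which itself
# executes many machine instructions per bytecode instruction.
instructions = list(dis.get_instructions(count_to))
for ins in instructions:
    print(ins.opname, ins.argrepr)
print("bytecode instructions in count_to:", len(instructions))
```

The instruction count is not a direct measure of the slowdown; it only hints at how much more work the interpreter does per iteration than the compare-and-add of the C version.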

Also, Python's integers are immutable, so memory has to be allocated for each new integer (the Python runtime is smart about it, but nevertheless).

I hope that answers your question and teaches you a little bit about how to perform incredibly rough estimations =)

Answer 3

Short answer: As jonrsharpe mentioned in the comments, it's seconds, not milliseconds.

Also, as Stefano said, check his posted answer. It has a lot of detail beyond just the ALU.

I'm just writing to mention: when you give default values to parameters in your classes or functions, use simple immutable values rather than a function call or something similar. A default argument is evaluated once, when the function is defined, so your class sets the same start time for every instance. You will get a nasty surprise if you create a new Timer, because it will reuse that original value as its start time. Try this, and you will see that the timer does not get reset for the second Timer:

#...
count = 0
Timer = StopWatch()
time.sleep(1)
Timer = StopWatch()  # new instance, but the default start time was fixed when the class was defined
for i in range(1, 1000001):
    count += i
Timer.stop()
total_Time = Timer.getElapsedTime()
print("Total time elapsed to count to 1,000,000: ",total_Time," seconds")

You will get about 1 second instead of what you expect.
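A common way around this, sketched here rather than taken from the original code, is to default the parameter to None and read the clock inside __init__. The method names below are my own snake_case variants of the ones in the question.

```python
import time


class StopWatch:
    def __init__(self, start_time=None):
        # None is a safe immutable default; the clock is read on every call,
        # not once when the def statement is executed.
        self._start_time = time.time() if start_time is None else start_time
        self._end_time = 0

    def stop(self):
        self._end_time = time.time()

    def get_start_time(self):
        return self._start_time

    def get_elapsed_time(self):
        return self._end_time - self._start_time


first = StopWatch()
time.sleep(0.1)
second = StopWatch()  # gets its own, later start time
second.stop()
print("second timer elapsed:", second.get_elapsed_time(), "seconds")
print("start times differ:", second.get_start_time() > first.get_start_time())
```

With this version the second timer reports only the time since it was created, not the time since the class was defined.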

3 answers

solution1
3 2014-02-28 12:07:59

solution2
2 ACCEPTED 2014-02-28 13:03:51

solution3
1 2014-02-28 12:09:41
