Very large execution time differences for virtually same C++ and Python code

Question

I was trying to write a solution for Problem 12 (Project Euler) in Python. The solution was just too slow, so I tried checking up other people's solution on the internet. I found this code written in C++ which does virtually the same exact thing as my python code, with just a few insignificant differences.

Python:

def find_number_of_divisiors(n):
    if n == 1:
        return 1

    div = 2 # 1 and the number itself
    for i in range(2, n/2 + 1):
        if (n % i) == 0:
            div += 1
    return div

def tri_nums():
    n = 1
    t = 1
    while 1:
        yield t
        n += 1
        t += n

t = tri_nums()
m = 0
for n in t:
    d = find_number_of_divisiors(n)
    if m < d:
        print n, ' has ', d, ' divisors.'
        m = d

    if m == 320:
        exit(0)

C++:

#include <iostream>

int main(int argc, char *argv[])
{
    unsigned int iteration = 1;
    unsigned int triangle_number = 0;
    unsigned int divisor_count = 0;
    unsigned int current_max_divisor_count = 0;
    while (true) {
        triangle_number += iteration;
        divisor_count = 0;
        for (int x = 2; x <= triangle_number / 2; x ++) {
            if (triangle_number % x == 0) {
                divisor_count++;
            }
        }
        if (divisor_count > current_max_divisor_count) {
            current_max_divisor_count = divisor_count;
            std::cout << triangle_number << " has " << divisor_count
                      << " divisors." << std::endl;
        }
        if (divisor_count == 318) {
            exit(0);
        }

        iteration++;
    }
    return 0;
}

The python code takes 1 minute and 25.83 seconds on my machine to execute. While the C++ code takes around 4.628 seconds. Its like 18x faster. I had expected the C++ code to be faster but not by this great margin and that too just for a simple solution which consists of just 2 loops and a bunch of increments and mods.

Although I would appreciate answers on how to solve this problem, the main question I want to ask is Why is C++ code so much faster? Am I using/doing something wrongly in python?

Replacing range with xrange:

After replacing range with xrange the python code takes around 1 minute 11.48 seconds to execute. (Around 1.2x faster)

Answer 1

This is exactly the kind of code where C++ is going to shine compared to Python: a single fairly tight loop doing arithmetic ops. (I'm going to ignore algorithmic speedups here, because your C++ code uses the same algorithm, and it seems you're explicitly not asking for that...)

C++ compiles this kind of code down to a relatively few number of instructions for the processor (and everything it does probably all fits in the super-fast levels of CPU cache), while Python has a lot of levels of indirection it's going through for each operation. For example, every time you increase a number it's checking that the number didn't just overflow and need to be moved into a bigger data type.

That said, all is not necessarily lost! This is also the kind of code that a just-in-time compiler system like PyPy will do well at, since once it's gone through the loop a few times it compiles the code to something similar to what the C++ code starts at. On my laptop:

$ time python2.7 euler.py >/dev/null
python euler.py  72.23s user 0.10s system 97% cpu 1:13.86 total

$ time pypy euler.py >/dev/null                       
pypy euler.py > /dev/null  13.21s user 0.03s system 99% cpu 13.251 total

$ clang++ -o euler euler.cpp && time ./euler >/dev/null
./euler > /dev/null  2.71s user 0.00s system 99% cpu 2.717 total

using the version of the Python code with xrange instead of range . Optimization levels don't make a difference for me with the C++ code, and neither does using GCC instead of Clang.

While we're at it, this is also a case where Cython can do very well, which compiles almost-Python code to C code that uses the Python APIs, but uses raw C when possible. If we change your code just a little bit by adding some type declarations, and removing the iterator since I don't know how to handle those efficiently in Cython, getting

cdef int find_number_of_divisiors(int n):
    cdef int i, div
    if n == 1:
        return 1

    div = 2 # 1 and the number itself
    for i in xrange(2, n/2 + 1):
        if (n % i) == 0:
            div += 1
    return div

cdef int m, n, t, d
m = 0
n = 1
t = 1
while True:
    n += 1
    t += n
    d = find_number_of_divisiors(t)
    if m < d:
        print n, ' has ', d, ' divisors.'
        m = d

    if m == 320:
        exit(0)

then on my laptop I get

$ time python -c 'import euler_cy' >/dev/null
python -c 'import euler_cy' > /dev/null  4.82s user 0.02s system 98% cpu 4.941 total

(within a factor of 2 of the C++ code).

Answer 2

Rewriting the divisor counting algorithm to use divisor function makes the run time reduces to less than 1 second. It is still possible to make it faster, but not really necessary.

This is to show that: before you do any optimization trick with the language features and compiler, you should check whether your algorithm is the bottleneck or not. The trick with compiler/interpreter is indeed quite powerful, as shown in Dougal's answer where the gap between Python and C++ is closed for the equivalent code. However, as you can see, the change in algorithm immediately give a huge performance boost and lower the run time to around the level of algorithmically inefficient C++ code (I didn't test the C++ version, but on my 6-year-old computer, the code below finishes running in ~0.6s).

The code below is written and tested with Python 3.2.3.

import math

def find_number_of_divisiors(n):
    if n == 1:
        return 1

    num = 1

    count = 1
    div = 2
    while (n % div == 0):
        n //= div
        count += 1

    num *= count

    div = 3
    while (div <= pow(n, 0.5)):
        count = 1
        while n % div == 0:
            n //= div
            count += 1

        num *= count
        div += 2

    if n > 1:
        num *= 2

    return num

Answer 3

Here's my own variant built on nhahtdh's factor-counting optimization plus my own prime factorization code:

def prime_factors(x):
    def factor_this(x, factor):
        factors = []
        while x % factor == 0:
            x /= factor
            factors.append(factor)
        return x, factors
    x, factors = factor_this(x, 2)
    x, f = factor_this(x, 3)
    factors += f
    i = 5
    while i * i <= x:
        for j in (2, 4):
            x, f = factor_this(x, i)
            factors += f
            i += j
    if x > 1:
        factors.append(x)
    return factors

def product(series):
    from operator import mul
    return reduce(mul, series, 1)

def factor_count(n):
    from collections import Counter
    c = Counter(prime_factors(n))
    return product([cc + 1 for cc in c.values()])

def tri_nums():
    n, t = 1, 1
    while 1:
        yield t
        n += 1
        t += n

if __name__ == '__main__':
    m = 0
    for n in tri_nums():
        d = factor_count(n)
        if m < d:
            print n, ' has ', d, ' divisors.'
            m = d
            if m == 320:
                break

Very large execution time differences for virtually same C++ and Python code

Question

3 answers

solution1
8 ACCPTED 2013-03-16 04:44:23

solution2
4 2013-03-16 05:46:27

solution3
1

Very large execution time differences for virtually same C++ and Python code

Question

3 answers

solution1 8 ACCPTED 2013-03-16 04:44:23

solution2 4 2013-03-16 05:46:27

solution3 1

solution1
8 ACCPTED 2013-03-16 04:44:23

solution2
4 2013-03-16 05:46:27

solution3
1