
Why is all() slower than using for-else & break?

I've been fooling around with problem 7 from project Euler and I noticed that two of my prime finding methods are very similar but run at very different speeds.

#!/usr/bin/env python3

import timeit

def lazySieve (num_primes):
    if num_primes == 0: return []
    primes = [2]
    test = 3
    while len(primes) < num_primes:
        if all(test % p != 0 for p in primes[1:]):  # I figured this would be faster
            primes.append(test)
        test += 2
    return primes

def betterLazySieve (num_primes):
    if num_primes == 0: return []
    primes = [2]
    test = 3
    while len(primes) < num_primes:
        for p in primes[1:]: # and this would be slower
            if test % p == 0: break
        else:
            primes.append(test)
        test += 2
    return primes

if __name__ == "__main__":

    ls_time  = timeit.repeat("lazySieve(10001)",
                             setup="from __main__ import lazySieve",
                             repeat=10,
                             number=1)
    bls_time = timeit.repeat("betterLazySieve(10001)",
                             setup="from __main__ import betterLazySieve",
                             repeat=10,
                             number=1)

    print("lazySieve runtime:       {}".format(min(ls_time)))
    print("betterLazySieve runtime: {}".format(min(bls_time)))

This runs with the following output:

lazySieve runtime:       4.931611961917952
betterLazySieve runtime: 3.7906006319681183

And unlike this question, I don't simply want the returned value of any/all.

Is the return from all() so slow that it overrides its usage in all but the most niche of cases? Is the for-else-break somehow faster than the short-circuited all()?

What do you think?

Edit: Added in square root loop termination check suggested by Reblochon Masque

Update: ShadowRanger's answer was correct.

After changing

all(test % p != 0 for p in primes[1:])

to

all(map(test.__mod__, primes[1:]))

I recorded the following decrease in runtime:

lazySieve runtime:       3.5917471940629184
betterLazySieve runtime: 3.7998314710566774

Edit: Removed Reblochon's speed up to keep the question clear. Sorry man.

I may be wrong, but I think that every time it evaluates test % p != 0 in the generator expression, it's doing so in a new stack frame, so there's a similar overhead to calling a function. You can see evidence of the stack frame in tracebacks, for example:

>>> all(n/n for n in [0])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <genexpr>
ZeroDivisionError: integer division or modulo by zero
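You can also observe those per-iteration frames at runtime rather than via a traceback; a small sketch, assuming CPython's convention of naming a generator expression's code object "<genexpr>":

```python
import sys

# sys._getframe(0) returns the frame currently executing. Inside a generator
# expression that is the generator's own frame, whose code object is named
# "<genexpr>", so each evaluation of the body reports that name.
names = list(sys._getframe(0).f_code.co_name for _ in range(3))
print(names)  # ['<genexpr>', '<genexpr>', '<genexpr>']
```
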

It's a combination of a few issues:

  1. Calling builtin functions and loading and executing the generator's code object is semi-expensive to set up, so for small numbers of primes to test, the setup costs drown out the per-test costs
  2. Generator expressions establish an inner scope; variables not being iterated over go through the normal LEGB lookup, so every iteration inside all's generator expression must look up test to make sure it hasn't changed, and it does so via a dict lookup (whereas local variable lookup is a cheap index into a fixed-size array)
  3. Generators have a small amount of overhead, particularly when jumping in and out of Python bytecode (all is implemented at the C layer in CPython)
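Point #2 can be made concrete by inspecting the compiled generator expression. In CPython the genexpr gets its own code object, and a name like test that comes from the enclosing function shows up as a free variable of that code object; a sketch (the check function is a hypothetical stand-in for the loop body):

```python
import dis

def check(test, primes):
    return all(test % p != 0 for p in primes)

# The generator expression compiles into its own code object, stored among
# the constants of the enclosing function; `test` is a free variable there,
# resolved from the enclosing scope rather than as a plain local.
genexpr = next(c for c in check.__code__.co_consts
               if hasattr(c, "co_freevars") and c.co_name == "<genexpr>")
print(genexpr.co_freevars)   # ('test',)
dis.dis(genexpr)             # shows how `test` and `p` are loaded each iteration
```
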

Things you can do to minimize the difference or eliminate it:

  1. Run the test on larger iterables (to minimize the effect of setup costs)
  2. Explicitly pull test into the local scope of the generator, e.g. as a silly hack: all(test % p != 0 for test in (test,) for p in primes[1:])
  3. Remove all bytecode execution from the process by using map with C builtins, e.g. all(map(test.__mod__, primes[1:])) (which also happens to achieve #2 by looking up test.__mod__ once up front rather than once per loop)
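The three forms can be compared side by side with a standalone micro-benchmark; the divisor list and test value below are arbitrary stand-ins, not the question's data, and the absolute timings will vary by interpreter version and machine:

```python
import timeit

primes = list(range(3, 2000, 2))  # stand-in divisor list
test = 1999993

variants = {
    "genexpr":    lambda: all(test % p != 0 for p in primes),
    "local hack": lambda: all(test % p != 0 for test in (test,) for p in primes),
    "map/__mod__": lambda: all(map(test.__mod__, primes)),
}

# All three compute the same predicate; only the lookup/dispatch costs differ
for name, fn in variants.items():
    print(name, timeit.timeit(fn, number=1000))
```
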

With a large enough input, #3 can sometimes beat your original code, at least on Python 3.5 (where I micro-benchmarked in IPython), depending on a host of factors. It doesn't always win, because the bytecode interpreter has optimizations for BINARY_MODULO on values that fit in a CPU register, which skipping straight to the int.__mod__ code bypasses, but it usually performs quite similarly.

That is an interesting question about a puzzling result, for which I unfortunately don't have a definite answer... Maybe it is because of sample size, or particulars of this calculation? But like you, I found it surprising.

However, it is possible to make lazySieve faster than betterLazySieve:

def lazySieve(num_primes):
    if num_primes == 0:
        return []
    primes = [2]
    test = 3
    while len(primes) < num_primes:
        # compute the cutoff before the test, so it exists on the first pass
        sqr_test = test ** 0.5
        if all(test % p for p in primes[1:] if p <= sqr_test):
            primes.append(test)
        test += 2
    return primes

It runs in about 65% of the time of your version, and is about 15% faster than betterLazySieve on my system.
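The square-root cutoff is safe because any composite n has a factorization n = a * b with a <= b, which forces a <= sqrt(n), so trial division never needs to look past sqrt(n). A minimal self-contained check (the helper names are hypothetical, not from the answer) comparing full trial division against the bounded version:

```python
def is_prime_full(n):
    # Trial division by every candidate below n
    return n > 1 and all(n % d for d in range(2, n))

def is_prime_sqrt(n):
    # Trial division only up to floor(sqrt(n)): if n == a * b with a <= b,
    # then a <= sqrt(n), so any composite is caught within this range
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

# Both predicates agree across a range of small inputs
assert all(is_prime_full(n) == is_prime_sqrt(n) for n in range(2, 500))
```
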

Using %%timeit in a Jupyter notebook with Python 3.4.4 on an oldish MacBook Air:

%%timeit 
lazySieve(10001)
# 1 loop, best of 3: 8.19 s per loop

%%timeit
betterLazySieve(10001)
# 1 loop, best of 3: 10.2 s per loop
