I've been fooling around with problem 7 from Project Euler and I noticed that two of my prime-finding methods are very similar but run at very different speeds.
```python
#!/usr/bin/env python3
import timeit

def lazySieve(num_primes):
    if num_primes == 0:
        return []
    primes = [2]
    test = 3
    while len(primes) < num_primes:
        if all(test % p != 0 for p in primes[1:]):  # I figured this would be faster
            primes.append(test)
        test += 2
    return primes

def betterLazySieve(num_primes):
    if num_primes == 0:
        return []
    primes = [2]
    test = 3
    while len(primes) < num_primes:
        for p in primes[1:]:  # and this would be slower
            if test % p == 0:
                break
        else:  # no divisor found: test is prime
            primes.append(test)
        test += 2
    return primes

if __name__ == "__main__":
    ls_time = timeit.repeat("lazySieve(10001)",
                            setup="from __main__ import lazySieve",
                            repeat=10,
                            number=1)
    bls_time = timeit.repeat("betterLazySieve(10001)",
                             setup="from __main__ import betterLazySieve",
                             repeat=10,
                             number=1)
    print("lazySieve runtime: {}".format(min(ls_time)))
    print("betterLazySieve runtime: {}".format(min(bls_time)))
```
This runs with the following output:

```
lazySieve runtime: 4.931611961917952
betterLazySieve runtime: 3.7906006319681183
```
And unlike this question, I don't simply want the returned value of `any`/`all`.
Is the overhead of `all()` so large that it rules out its use in all but the most niche cases? Is the `for`-`else` `break` somehow faster than the short-circuited `all()`?
What do you think?
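Both versions do short-circuit: `all()` stops pulling from the generator at the first falsy value, just as the `for`/`else` loop stops at `break`. A minimal sketch to confirm this, using a hypothetical helper `divisors_tried` that records which divisors `all()` actually consumed:

```python
def divisors_tried(test, divisors):
    """Return all()'s verdict plus the divisors it actually pulled from the generator."""
    tried = []

    def not_divisible(p):
        tried.append(p)  # log every divisor all() asks for
        return test % p != 0

    verdict = all(not_divisible(p) for p in divisors)
    return verdict, tried

print(divisors_tried(15, [3, 5, 7]))  # (False, [3])
print(divisors_tried(11, [3, 5, 7]))  # (True, [3, 5, 7])
```

Since both forms stop at the same element, the timing gap points at the cost per element checked rather than at the number of elements checked.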
Edit: Added in square root loop termination check suggested by Reblochon Masque
Update: ShadowRanger's answer was correct. After changing

```python
all(test % p != 0 for p in primes[1:])
```

to

```python
all(map(test.__mod__, primes[1:]))
```

I recorded the following decrease in runtime:

```
lazySieve runtime: 3.5917471940629184
betterLazySieve runtime: 3.7998314710566774
```
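The `map` rewrite gives the same answers because `test.__mod__(p)` returns the remainder, and any nonzero remainder is truthy, so `all()` sees exactly the same truth values as with `test % p != 0`. A minimal sketch that checks the three predicate forms agree and times them roughly; the function names and the short prime list are hypothetical, chosen only for illustration:

```python
import timeit

def genexpr_check(test, odd_primes):
    # Original form: a Python-level generator feeds all().
    return all(test % p != 0 for p in odd_primes)

def for_else_check(test, odd_primes):
    # Plain loop with an early exit, as in betterLazySieve.
    for p in odd_primes:
        if test % p == 0:
            break
    else:
        return True
    return False

def map_check(test, odd_primes):
    # test.__mod__ is looked up once; map() calls it from C for each p,
    # and any nonzero remainder counts as True for all().
    return all(map(test.__mod__, odd_primes))

odd_primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]

# The three forms agree on every odd candidate checked here.
for test in range(3, 1000, 2):
    answers = {check(test, odd_primes)
               for check in (genexpr_check, for_else_check, map_check)}
    assert len(answers) == 1, test

# Rough timing with a candidate that no listed prime divides;
# results depend on the Python version and the machine.
for check in (genexpr_check, for_else_check, map_check):
    best = min(timeit.repeat(lambda: check(997, odd_primes), number=20000, repeat=5))
    print(check.__name__, best)
```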
Edit: Removed Reblochon's speed-up to keep the question clear. Sorry, man.
I may be wrong, but I think that every time the generator expression evaluates `test % p != 0`, it has to resume the generator's own stack frame, so there's an overhead similar to calling a function. You can see evidence of that separate stack frame in tracebacks, for example:

```
>>> all(n/n for n in [0])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <genexpr>
ZeroDivisionError: integer division or modulo by zero
```
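The `<genexpr>` line in that traceback is the generator expression's own frame. A minimal sketch, assuming CPython 3 and using the generator attribute `gi_frame` (the function name `same_frame_every_step` is hypothetical), suggests that this frame is created once and then suspended and resumed for each element rather than rebuilt per element; that resume/suspend round trip is the function-call-like cost:

```python
def same_frame_every_step(values):
    """Check that a generator expression reuses one frame for every element."""
    gen = (v * 2 for v in values)
    frame = gen.gi_frame  # the frame exists as soon as the generator is created
    results = []
    reused = True
    for _ in values:
        results.append(next(gen))  # resume the frame and run it to the next yield
        reused = reused and gen.gi_frame is frame
    # Once the generator is exhausted, its frame is released and gi_frame is None.
    finished = next(gen, None) is None and gen.gi_frame is None
    return results, reused, finished

print(same_frame_every_step([1, 2, 3]))  # ([2, 4, 6], True, True)
```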
It's a combination of a few issues:

- Every iteration of `all`'s generator expression needs to look up `test` to make sure it hasn't changed. Because `test` belongs to the enclosing function, the generator reads it through a closure cell, which is more work than a plain local variable lookup (a cheap lookup in a fixed-size array).
- Generators have per-item overhead from jumping in and out of Python bytecode on every element (`all` is implemented at the C layer in CPython).

Things you can do to minimize the difference or eliminate it:

- Move `test` into the local scope of the generator, e.g. as a silly hack: `all(test % p != 0 for test in (test,) for p in primes[1:])`
- Use `map` with C builtins, e.g. `all(map(test.__mod__, primes[1:]))` (which also avoids the repeated lookup of `test`, by looking up `test.__mod__` once up front rather than once per loop).

With a large enough input, the `map` version can sometimes win over your original code, at least on Python 3.5 (where I microbenchmarked in IPython), depending on a host of factors. It doesn't always win because the bytecode interpreter has some optimizations for `BINARY_MODULO` on values that fit in a CPU register, which calling the `int.__mod__` code directly bypasses, but it usually performs quite similarly.
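The scoping point can be checked by inspecting the code object CPython compiles for each generator expression. A minimal sketch, assuming CPython 3, where a generator expression gets its own nested code object stored in the enclosing function's `co_consts`; `genexpr_code`, `closes_over_test` and `local_test_hack` are hypothetical names for illustration. `co_freevars` lists the names the generator reads through closure cells:

```python
import types

def genexpr_code(func):
    """Return the code object of the first generator expression defined in func."""
    for const in func.__code__.co_consts:
        if isinstance(const, types.CodeType) and const.co_name == "<genexpr>":
            return const
    raise LookupError("no generator expression found in " + func.__name__)

def closes_over_test(test, primes):
    # primes[1:] is evaluated outside the generator; test is read on every iteration.
    return all(test % p != 0 for p in primes[1:])

def local_test_hack(test, primes):
    # The outermost iterable (test,) is evaluated in the enclosing scope,
    # so inside the generator `test` is an ordinary local variable.
    return all(test % p != 0 for test in (test,) for p in primes[1:])

print(genexpr_code(closes_over_test).co_freevars)  # ('test',)
print(genexpr_code(local_test_hack).co_freevars)   # ('primes',)
```

With the hack, `test` drops out of `co_freevars` and becomes a plain local of the generator; `primes` becomes free instead, which matters little here because the inner `primes[1:]` is evaluated only once per call.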
That is an interesting question on a puzzling result, for which I unfortunately don't have a definite answer... Maybe it is because of sample size, or particulars of this calculation? But like you, I found it surprising.
However, it is possible to make `lazySieve` faster than `betterLazySieve`:
```python
def lazySieve(num_primes):
    if num_primes == 0:
        return []
    primes = [2]
    test = 3
    sqr_test = test ** 0.5
    while len(primes) < num_primes:
        # Only divide by primes up to sqrt(test); larger ones are filtered out.
        if all(test % p for p in primes[1:] if p <= sqr_test):
            primes.append(test)
        test += 2
        sqr_test = test ** 0.5
    return primes
```
It runs in about 65% of the time of your version, and is about 15% faster than `betterLazySieve` on my system.
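The `if p <= sqr_test` clause filters rather than stops: the generator still walks every prime in the list and only skips the modulo for the large ones. A minimal sketch of a variant that stops scanning at the square root, swapping in `itertools.takewhile` (so it is not the code timed here) and assuming Python 3.8+ for `math.isqrt`; `sqrt_bounded_primes` is a hypothetical name:

```python
import math
from itertools import islice, takewhile

def sqrt_bounded_primes(num_primes):
    """Trial division that stops scanning once p exceeds sqrt(test)."""
    if num_primes == 0:
        return []
    primes = [2]
    test = 3
    while len(primes) < num_primes:
        limit = math.isqrt(test)  # largest integer whose square is <= test
        odd_primes = islice(primes, 1, None)  # skip 2 without copying the list
        # takewhile ends the scan at the first prime above the limit.
        candidates = takewhile(lambda p: p <= limit, odd_primes)
        if all(test % p for p in candidates):
            primes.append(test)
        test += 2
    return primes

print(sqrt_bounded_primes(10))          # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(sqrt_bounded_primes(10001)[-1])   # 104743
```

Because each candidate now only touches the primes up to its square root, the per-element generator overhead discussed in the other answers is paid on far fewer elements.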
Using `%%timeit` in Jupyter Notebook with Python 3.4.4 on an oldish MacBook Air:

```
%%timeit
lazySieve(10001)
# 1 loop, best of 3: 8.19 s per loop
```

```
%%timeit
betterLazySieve(10001)
# 1 loop, best of 3: 10.2 s per loop
```