I have code like the following:
```python
def yield_multiple():
    for prime in prime_list:
        for multiple in range(prime+prime, end, prime):
            yield multiple
```
I use it to get the prime numbers like this:
```python
multiple_set = set(yield_multiple())
result = [v for v in candidate_list if v not in multiple_set]
```
But I get a MemoryError when the set becomes very large, so I thought of doing this instead to save memory:
```python
result = [v for v in candidate_list if v not in yield_multiple()]
```
However, this gives the wrong result. How can I avoid the memory error and still get the prime numbers correctly?
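Here's my improved solution, which doesn't need much memory. It splits the range [2, n] into blocks, sieves each block in a separate MPI process using the primes up to sqrt(n), and sums the primes it finds.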
```python
import math
import sys
import time

from mpi4py import MPI

import eratosthenes  # the poster's own module; sieve(k) is assumed to return the primes <= k

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
TAG_RESULT = 0


def sieve_segment(begin, end, prime_list):
    """Sum the numbers in [begin, end) that are not multiples of any prime in prime_list."""
    sieve_list = [True] * (end - begin)  # empty if this rank has no block
    for prime in prime_list:
        # first multiple of prime that is >= begin (integer ceiling division, no floats)
        start = -(-begin // prime) * prime
        for multiple in range(start, end, prime):
            sieve_list[multiple - begin] = False
    return sum(i + begin for i, v in enumerate(sieve_list) if v)


n = sys.argv[1]
if n.isdigit():
    start_time = time.time()
    n = int(n)
    sqrt_n = math.isqrt(n)
    # split [2, n] into `size` contiguous blocks, one per MPI process
    task_per_block = -(-(n - 1) // size)
    begin = 2 + rank * task_per_block
    end = min(begin + task_per_block, n + 1)
    if rank == 0 and sqrt_n < end:
        # numbers below sqrt_n are covered by sum(prime_list); max() keeps 1 out when n < 4
        begin = max(begin, sqrt_n)
    prime_list = eratosthenes.sieve(sqrt_n)
    if rank == 0:
        result = sum(prime_list) + sieve_segment(begin, end, prime_list)
        # collect the partial sums from every other process
        for _ in range(size - 1):
            result += comm.recv(source=MPI.ANY_SOURCE, tag=TAG_RESULT)
        print(result)
        print(time.time() - start_time)
    else:
        comm.send(sieve_segment(begin, end, prime_list), dest=0, tag=TAG_RESULT)
```
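To check the block arithmetic of the MPI program without an MPI installation, here is a minimal single-process sketch that computes each rank's block in turn and adds up the partial sums. The names `small_primes` and `sum_primes_blocked` are hypothetical; `small_primes` stands in for the poster's `eratosthenes.sieve` and assumes that function includes its argument.

```python
import math


def small_primes(k):
    """Plain Sieve of Eratosthenes returning the primes <= k (stand-in for eratosthenes.sieve)."""
    if k < 2:
        return []
    flags = [True] * (k + 1)
    flags[0] = flags[1] = False
    for i in range(2, math.isqrt(k) + 1):
        if flags[i]:
            flags[i * i::i] = [False] * len(range(i * i, k + 1, i))
    return [i for i, f in enumerate(flags) if f]


def sum_primes_blocked(n, size):
    """Sum of the primes <= n, running the per-rank block logic sequentially."""
    sqrt_n = math.isqrt(n)
    prime_list = small_primes(sqrt_n)
    task_per_block = -(-(n - 1) // size)  # ceil((n - 1) / size)
    total = sum(prime_list)
    for rank in range(size):
        begin = 2 + rank * task_per_block
        end = min(begin + task_per_block, n + 1)
        if rank == 0 and sqrt_n < end:
            begin = max(begin, sqrt_n)
        sieve_list = [True] * max(end - begin, 0)
        for prime in prime_list:
            start = -(-begin // prime) * prime  # first multiple of prime >= begin
            for multiple in range(start, end, prime):
                sieve_list[multiple - begin] = False
        total += sum(i + begin for i, v in enumerate(sieve_list) if v)
    return total


print(sum_primes_blocked(100, 4))  # → 1060
```

Any prime <= sqrt(n) that falls inside a block is crossed off there (its first multiple >= `begin` is the prime itself), so it is only counted once, through `sum(prime_list)`.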
You can avoid the memory error by switching to working in segments between the squares of consecutive primes, creating these sets for each segment one after another.
For each segment, you'll have to calculate the starting point of the enumeration of a prime's multiples, for each known prime whose square is not greater than the segment's top value (i.e. the next "core" prime's square).
The "core" primes, whose squares you need, can be obtained separately and independently, by a recursive application of the same algorithm.
An example of this approach (that is, of the separate supply of primes) can be found in the question "How to implement an efficient infinite generator of prime numbers in Python?"
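A minimal sketch of the segment-by-segment scheme, assuming the goal is an unbounded prime generator. The name `primes` is hypothetical, this is not the code from the linked question, and it uses a list of flags per segment rather than a set. Each segment runs from one core prime's square up to the next one's, and the core primes come from a recursive call to the same generator.

```python
from itertools import islice


def primes():
    """Yield primes indefinitely, sieving one segment [p*p, q*q) at a time."""
    yield 2
    yield 3
    core = primes()          # separate, recursive supply of "core" primes
    sieving = [next(core)]   # primes used to sieve the current segment: [2]
    p = next(core)           # next core prime: 3
    low = 4                  # start of the first segment (2 * 2)
    while True:
        high = p * p         # segment top: the next core prime's square
        is_prime = [True] * (high - low)
        for q in sieving:
            # starting point of q's multiples inside this segment (ceiling division)
            start = -(-low // q) * q
            for m in range(start, high, q):
                is_prime[m - low] = False
        yield from (low + i for i, flag in enumerate(is_prime) if flag)
        sieving.append(p)    # p's square is the bottom of the next segment
        low, p = high, next(core)


print(list(islice(primes(), 15)))
# → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
```

Only one segment's flags are alive at a time, so memory grows with the gap between consecutive prime squares instead of with the whole range.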
To make it parallel, you'll need a way to share the set between all the enumerations, each of which sets its enumerated multiples off in the same shared set. The order of operations doesn't matter, as long as they all finish. The access need not be guarded, since setting the same location off twice (or more) is perfectly fine.
This will also be very efficient.
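A minimal sketch of the shared, unguarded crossing-off for one segment, using a `bytearray` shared by threads from `concurrent.futures`. The function name `sieve_segment_shared` and the `workers` parameter are hypothetical. In CPython the GIL means threads won't actually speed this up; the sketch only illustrates that each prime's enumeration can write into the same buffer without a lock. Real parallelism would need processes working on shared memory.

```python
from concurrent.futures import ThreadPoolExecutor


def sieve_segment_shared(low, high, sieving_primes, workers=4):
    """Return the numbers in [low, high) not crossed off by any of sieving_primes."""
    flags = bytearray([1]) * (high - low)  # one shared buffer; 1 = still a candidate

    def cross_off(q):
        # first multiple of q in the segment, never q itself
        start = max(q * q, -(-low // q) * q)
        # unguarded write: zeroing a byte that another thread already zeroed is harmless
        flags[start - low:high - low:q] = bytes(len(range(start, high, q)))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(cross_off, sieving_primes))  # order doesn't matter; wait for all
    return [low + i for i, f in enumerate(flags) if f]


print(sieve_segment_shared(9, 25, [2, 3]))  # → [11, 13, 17, 19, 23]
```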
If you want to stay with your approach (which does have a certain simplicity, although it must be terribly inefficient), the simplest way I can see to do it without constructing a large set or re-running yield_multiple for each candidate is to reverse your membership check:
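```python
candidate_set = set(candidate_list)  # O(1) membership tests instead of scanning the list
multiples = {c for c in yield_multiple() if c in candidate_set}
result = [c for c in candidate_list if c not in multiples]
```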
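A self-contained version of the reversed check, for trying it out. It assumes `yield_multiple` takes `prime_list` and `end` as parameters instead of reading globals, and `primes_by_reverse_check` is a hypothetical name. The set of stored multiples can never be larger than the candidate list.

```python
def yield_multiple(prime_list, end):
    # the question's generator, with its globals passed in as parameters
    for prime in prime_list:
        for multiple in range(prime + prime, end, prime):
            yield multiple


def primes_by_reverse_check(candidate_list, prime_list, end):
    candidate_set = set(candidate_list)
    # keep only multiples that are also candidates, so the set stays bounded
    multiples = {c for c in yield_multiple(prime_list, end) if c in candidate_set}
    return [c for c in candidate_list if c not in multiples]


print(primes_by_reverse_check(list(range(2, 50)), [2, 3, 5, 7], 50))
# → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
```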
However, unless using your own code is the most important factor here, I'd recommend finding a more efficient approach, such as the one described in this other answer.