
Python generator and set(generator) get different results

I have code like this:

def yield_multiple():
    for prime in prime_list:
        for multiple in range(prime+prime, end, prime):
            yield multiple

I use it to get the prime numbers:

multiple_set = set(yield_multiple())
result = [v for v in candidate_list if v not in multiple_set]

I get a MemoryError when the set is very large, so I thought of using this instead to save memory:

result = [v for v in candidate_list if v not in yield_multiple()]

But this gives the wrong result. How can I avoid the memory error and still get the prime numbers correctly?
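Some background on the title: a membership test against a generator consumes it, so a generator object that is reused across several `in` checks can miss values that a set would find. Whether that is what goes wrong above depends on details not shown here, since the comprehension as written calls `yield_multiple()` afresh for every candidate. A minimal sketch of the effect, using a hypothetical `multiples_of_three` generator that is not part of the question:

```python
def multiples_of_three(limit):
    """Hypothetical generator, used only for this illustration."""
    for m in range(3, limit, 3):
        yield m

as_set = set(multiples_of_three(20))   # all multiples, stored in memory
gen = multiples_of_three(20)           # one generator object, reused below

# Each `in` test advances the generator until it finds the value or runs
# out, so values it has already passed are gone for later tests.
in_set = [v for v in (9, 6) if v in as_set]
in_gen = [v for v in (9, 6) if v in gen]

print(in_set)
print(in_gen)
```

The set finds both values, while the reused generator has already moved past 6 by the time it is asked about it.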

Here is my improved solution, which does not use too much memory.

import math
import sys

import time
from mpi4py import MPI

import eratosthenes

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

TAG_RESULT = 0

n = sys.argv[1]
if n.isdigit():
    start_time = time.time()
    n = int(n)
    sqrt_n = math.isqrt(n)  # exact integer square root (requires Python 3.8+)

    task_per_block = int(math.ceil((n - 1) / size))
    begin = 2 + rank * task_per_block
    end = begin + task_per_block if begin + task_per_block <= n + 1 else n + 1
    if rank == 0:
        begin = sqrt_n if sqrt_n < end else begin
    sieve_list = [True] * (end - begin)
    prime_list = eratosthenes.sieve(sqrt_n)

    if rank == 0:
        result = sum(prime_list)
        for prime in prime_list:
            # first multiple of prime that is >= begin (integer division avoids float rounding)
            start = begin if begin % prime == 0 else (begin // prime + 1) * prime
            for multiple in range(start, end, prime):
                sieve_list[multiple - begin] = False
        result += sum(i + begin for i, v in enumerate(sieve_list) if v)
        result_received = 0
        while result_received < size - 1:
            data = comm.recv(source=MPI.ANY_SOURCE, tag=TAG_RESULT)
            result += data
            result_received += 1
        print(result)
        print(time.time() - start_time)
    else:
        for prime in prime_list:
            # first multiple of prime that is >= begin (integer division avoids float rounding)
            start = begin if begin % prime == 0 else (begin // prime + 1) * prime
            for multiple in range(start, end, prime):
                sieve_list[multiple - begin] = False
        result = sum(i + begin for i, v in enumerate(sieve_list) if v)
        comm.send(result, dest=0, tag=TAG_RESULT)
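The per-rank work in the script above can be looked at in isolation, without MPI. The following is only a sketch under that assumption: `sieve_segment` is a hypothetical helper name, and the hard-coded `small_primes` stands in for what `eratosthenes.sieve(sqrt_n)` is assumed to return.

```python
def sieve_segment(begin, end, primes):
    """Return the numbers in [begin, end) not divisible by any of `primes`.

    Like the script above, this also strikes out a prime that lies inside
    the segment itself, so the small primes have to be counted separately.
    """
    flags = [True] * max(0, end - begin)
    for p in primes:
        # First multiple of p that is >= begin, using integer arithmetic only.
        start = -(-begin // p) * p
        for multiple in range(start, end, p):
            flags[multiple - begin] = False
    return [begin + i for i, keep in enumerate(flags) if keep]

small_primes = [2, 3, 5, 7]                  # all primes up to sqrt(100)
survivors = sieve_segment(10, 100, small_primes)
total = sum(small_primes) + sum(survivors)   # sum of all primes below 100
print(total)
```

Each rank only ever holds one flag per number in its own block, which is where the memory saving comes from.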

You can avoid the memory error by working in segments between the squares of consecutive primes, creating these sets for one segment after another.

For each segment, you'll have to calculate the starting point of the enumeration of a prime's multiples, for each known prime that is not greater than the segment's top value (i.e. the next "core" prime's square).
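As a small illustration of that starting-point calculation, here is a sketch for one segment. The helper name `first_multiple_at_or_above` and the choice of the segment from 7² to 11² are assumptions made for the example.

```python
def first_multiple_at_or_above(prime, lo):
    """Smallest multiple of `prime` that is >= lo (ceiling division)."""
    return -(-lo // prime) * prime

lo, hi = 7 * 7, 11 * 11        # segment between squares of consecutive primes
known_primes = [2, 3, 5, 7]    # primes whose multiples can fall in [lo, hi)

starts = {p: first_multiple_at_or_above(p, lo) for p in known_primes}
print(starts)
```

From each starting point, the multiples are then enumerated with `range(start, hi, p)`.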

The "core" primes, whose squares delimit the segments, can be obtained separately and independently, by a recursive application of the same algorithm.

An example of this approach (the separate primes supply, that is) is the question "How to implement an efficient infinite generator of prime numbers in Python?"
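To make the idea concrete, here is a rough sketch of a generator that sieves one segment at a time and takes its core primes from a recursive copy of itself. It is written for this answer to illustrate the structure; it is not the code from the linked question, and it favours clarity over speed.

```python
def primes():
    """Yield primes indefinitely, sieving one segment at a time.

    Each segment runs between the squares of two consecutive "core" primes,
    and the core primes come from a recursive call to this same generator.
    """
    yield 2
    yield 3
    core = primes()          # separate, recursive supply of core primes
    known = [next(core)]     # core primes in use so far: [2]
    p = next(core)           # next core prime: 3
    lo = known[-1] ** 2      # bottom of the first segment: 4
    while True:
        hi = p * p           # segment top: square of the next core prime
        multiples = set()    # holds the multiples for this segment only
        for q in known:
            start = -(-lo // q) * q          # first multiple of q >= lo
            multiples.update(range(start, hi, q))
        for n in range(lo, hi):
            if n not in multiples:
                yield n
        known.append(p)
        lo = hi
        p = next(core)
```

For example, `list(itertools.islice(primes(), 15))` collects the first fifteen primes. Only the current segment's set of multiples is alive at any time, so memory stays roughly proportional to the gap between consecutive squares rather than to the whole range.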

To make it parallel, you'll need to find a way to share the set between all the enumerations, each of which will mark its enumerated multiples as "off" in that same shared set. The order of operations is not important, as long as they have all finished before the result is read. The access need not be guarded, because setting the same location off twice (or more) is perfectly fine.
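Here is a sketch of that sharing pattern, with two substitutions that are my own assumptions: a `bytearray` of flags stands in for the shared set, and Python threads stand in for whatever parallel workers you actually use. In CPython the threads will not speed up this CPU-bound loop because of the global interpreter lock, so this only shows that unguarded, repeated clearing of the same flag is harmless.

```python
from concurrent.futures import ThreadPoolExecutor

def mark_multiples(flags, lo, prime):
    """Clear the flag of every multiple of `prime` in [lo, lo + len(flags))."""
    hi = lo + len(flags)
    # Start at prime*prime or at the first multiple >= lo, whichever is larger.
    start = max(prime * prime, -(-lo // prime) * prime)
    for multiple in range(start, hi, prime):
        flags[multiple - lo] = 0     # idempotent: clearing twice is harmless

lo, hi = 100, 200
core_primes = [2, 3, 5, 7, 11, 13]   # all primes up to sqrt(200)
flags = bytearray([1]) * (hi - lo)   # one array shared by all workers

with ThreadPoolExecutor(max_workers=4) as pool:
    # list(...) waits for every task and re-raises any worker exception.
    list(pool.map(lambda p: mark_multiples(flags, lo, p), core_primes))

found = [lo + i for i, keep in enumerate(flags) if keep]
print(found)
```

The flags are read only after the `with` block has exited, which is the "as long as they are all finished" condition.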

This will also be very efficient.

If you want to stay with this approach - which does have a certain simplicity, although it must be terribly inefficient - the simplest way I can see to do it without constructing a large set or re-running yield_multiple for each candidate is to sort of reverse your membership check:

multiples = {c for c in yield_multiple() if c in candidate_list}
result = [c for c in candidate_list if c not in multiples]
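One caveat with the snippet above: if `candidate_list` is a list, every `c in candidate_list` test scans it linearly. A possible variation, assuming the candidates themselves fit in memory, is to build a set of the candidates once. In this sketch `yield_multiple` takes explicit parameters instead of the globals used in the question, and the concrete values are made up for the example.

```python
def yield_multiple(prime_list, end):
    """Same idea as the question's generator, with explicit parameters."""
    for prime in prime_list:
        for multiple in range(prime + prime, end, prime):
            yield multiple

end = 50
prime_list = [2, 3, 5, 7]               # primes up to sqrt(end)
candidate_list = list(range(40, end))   # assumed: the numbers being tested
candidate_set = set(candidate_list)     # constant-time membership on average

# Keep only the multiples that are actually candidates.
multiples = {c for c in yield_multiple(prime_list, end) if c in candidate_set}
result = [c for c in candidate_list if c not in multiples]
print(result)
```

This only saves memory when the candidates are far fewer than the full set of multiples, since `multiples` can never grow larger than `candidate_set`.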

However, unless using your own code is the most important factor here, I'd recommend finding a more efficient approach, such as the one described in this other answer.
