Output Queue of a Python multiprocessing is providing more results than expected

Question

From the following code, I would expect the resulting list to have the same length as the range of items fed to the processes:

import multiprocessing as mp

def worker(working_queue, output_queue):
    while True:
        if working_queue.empty() is True:
            break #this is supposed to end the process.
        else:
            picked = working_queue.get()
            if picked % 2 == 0: 
                output_queue.put(picked)
            else:
                working_queue.put(picked+1)
    return

if __name__ == '__main__':
    static_input = xrange(100)    
    working_q = mp.Queue()
    output_q = mp.Queue()
    for i in static_input:
        working_q.put(i)
    processes = [mp.Process(target=worker,args=(working_q, output_q)) for i in range(mp.cpu_count())]
    for proc in processes:
        proc.start()
    for proc in processes:
        proc.join()
    results_bank = []
    while True:
        if output_q.empty() is True:
            break
        else:
            results_bank.append(output_q.get())
    print len(results_bank) # length of this list should be equal to static_input, which is the range used to populate the input queue. In other words, this tells whether all the items placed for processing were actually processed.
    results_bank.sort()
    print results_bank

Does anyone have an idea how to make this code run properly?
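For reference, here is a single-process sketch of what a correct run should collect (Python 3; `expected_results` is an illustrative name, not part of the original code). Every even input is passed through and every odd input comes back once as its even successor, so each of the 100 inputs yields exactly one output:

```python
def expected_results(n):
    """Single-process reference: even items pass through, odd items become item + 1."""
    return sorted(i if i % 2 == 0 else i + 1 for i in range(n))

results = expected_results(100)
print(len(results))    # 100
print(results[:5])     # [0, 2, 2, 4, 4]
print(results[-3:])    # [98, 98, 100]
```
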

Answer 1

This code will never stop:

Each worker gets an item from the queue as long as it is not empty:

picked = working_queue.get()

and, for every odd item it got, puts a new one back:

working_queue.put(picked+1)

As a result, the queue will never be empty, except when the timing between the processes happens to be such that the queue is empty at the moment one of them calls empty(). Because the queue length is initially 100 and you have as many processes as cpu_count(), I would be surprised if this ever stopped on any realistic system.

Well, executing the code with a slight modification proves me wrong: it does stop at some point, which actually surprises me. When executing the code with a single process there seems to be a bug, because after some time the process freezes and does not return. With multiple processes the result varies.
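Without other processes racing, the item logic itself does terminate: each odd item is re-queued once as an even item and then leaves the queue. A minimal single-process sketch (Python 3, with `collections.deque` standing in for the multiprocessing queue; `simulate_single_worker` is a hypothetical helper) counts the get() calls: the 100 original items plus 50 re-queued ones.

```python
from collections import deque

def simulate_single_worker(n):
    """Run the question's worker logic in one process, with a deque as the queue."""
    working = deque(range(n))
    output = []
    gets = 0
    while working:                       # safe here: nobody else adds items concurrently
        picked = working.popleft()
        gets += 1
        if picked % 2 == 0:
            output.append(picked)
        else:
            working.append(picked + 1)   # odd + 1 is even, so each item is re-queued at most once
    return gets, len(output)

print(simulate_single_worker(100))       # (150, 100)
```

So the queue does drain, which suggests the varying results come from how empty() behaves across processes rather than from the item logic.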

Adding a short sleep to each loop iteration makes the code behave as I expected and explained above. There seems to be some timing issue between Queue.put, Queue.get and Queue.empty, even though they are supposed to be thread-safe. Removing the empty() test also gives the expected result (without ever getting stuck on an empty queue).
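One way to drop the empty() test entirely, sketched here for Python 3: the workers block on get(), the parent collects exactly as many results as it put in, and only then sends one `None` sentinel per worker. This relies on the assumption that every input yields exactly one output, which holds for this worker because an odd item returns once as an even one. The sentinel and the `run()` helper are illustrative, not part of the original code.

```python
import multiprocessing as mp

def worker(working_queue, output_queue):
    """Question's worker logic, but blocking on get() and stopping on a None sentinel."""
    while True:
        picked = working_queue.get()        # blocks; no empty() polling
        if picked is None:                  # hypothetical shutdown sentinel
            break
        if picked % 2 == 0:
            output_queue.put(picked)
        else:
            working_queue.put(picked + 1)

def run(n_items=100, n_workers=None):
    n_workers = n_workers or mp.cpu_count()
    working_q = mp.Queue()
    output_q = mp.Queue()
    for i in range(n_items):
        working_q.put(i)
    processes = [mp.Process(target=worker, args=(working_q, output_q))
                 for _ in range(n_workers)]
    for proc in processes:
        proc.start()
    # Each input yields exactly one output, so n_items blocking get() calls
    # collect everything. Draining before join() also avoids the documented
    # deadlock of joining a process whose queued items were never consumed.
    results_bank = [output_q.get() for _ in range(n_items)]
    for _ in processes:
        working_q.put(None)                 # one sentinel per worker
    for proc in processes:
        proc.join()
    return sorted(results_bank)

if __name__ == '__main__':
    print(len(run()))                       # 100
```

Sentinels are only sent after all results have arrived, so no worker can still be holding an odd item that it is about to put back.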

Found the reason for the varying behaviour: objects put on the queue are not flushed immediately. Therefore empty() might return True even though there are items still waiting to be flushed to the queue.
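Because empty() can report True while an item is still sitting in a feeder buffer (or is held by another worker that is about to put picked+1 back), another option is to track outstanding work explicitly. A sketch for Python 3 using `multiprocessing.JoinableQueue`, whose join() blocks until every put() has been matched by a task_done() call; the `None` sentinels and the `run_joinable()` name are assumptions for illustration.

```python
import multiprocessing as mp

def worker(working_queue, output_queue):
    """Question's worker logic, stopped by a None sentinel instead of empty()."""
    while True:
        picked = working_queue.get()
        if picked is None:                  # hypothetical shutdown sentinel
            working_queue.task_done()
            break
        if picked % 2 == 0:
            output_queue.put(picked)
        else:
            # Re-enqueue before task_done(), so the unfinished-task count
            # cannot reach zero while this item still needs work.
            working_queue.put(picked + 1)
        working_queue.task_done()
    output_queue.put(None)                  # tell the parent this worker has finished

def run_joinable(items, n_workers=2):
    working_q = mp.JoinableQueue()
    output_q = mp.Queue()
    for i in items:
        working_q.put(i)
    procs = [mp.Process(target=worker, args=(working_q, output_q))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    working_q.join()                        # every put() now has a matching task_done()
    for _ in procs:
        working_q.put(None)
    results, finished = [], 0
    while finished < n_workers:             # objects from one process keep their order,
        item = output_q.get()               # so each worker's None follows its results
        if item is None:
            finished += 1
        else:
            results.append(item)
    for p in procs:
        p.join()
    return sorted(results)

if __name__ == '__main__':
    print(len(run_joinable(range(100))))    # 100
```

Unlike the counting approach, this does not need to know in advance how many outputs each input produces.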

From the documentation:

Note: When an object is put on a queue, the object is pickled and a background thread later flushes the pickled data to an underlying pipe. This has some consequences which are a little surprising, but should not cause any practical difficulties – if they really bother you then you can instead use a queue created with a manager.

After putting an object on an empty queue there may be an infinitesimal delay before the queue's empty() method returns False and get_nowait() can return without raising Queue.Empty.

If multiple processes are enqueuing objects, it is possible for the objects to be received at the other end out-of-order. However, objects enqueued by the same process will always be in the expected order with respect to each other.
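The alternative the documentation suggests, a queue created with a manager, is a plain `queue.Queue` living in a separate server process; each put() on its proxy is a round trip that returns only after the item is stored, so the flush delay described above does not apply. A minimal sketch for Python 3 (`empty_right_after_put` is an illustrative name):

```python
import multiprocessing as mp

def empty_right_after_put():
    """Put one item on a manager-backed queue and immediately ask whether it is empty."""
    with mp.Manager() as manager:
        q = manager.Queue()
        q.put(42)            # the proxy call returns once the manager's queue holds the item
        return q.empty()

if __name__ == '__main__':
    print(empty_right_after_put())   # False
```

This only removes the flush delay. In the question's worker, one process can still see an empty queue while another holds an odd item it is about to put back, so a stop condition that does not rely on empty() is still needed.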

solution1
1 ACCPTED 2014-02-07 19:13:40