How to tell if an apply_async function has started or if it's still in the queue with multiprocessing.Pool

I'm using Python's multiprocessing.Pool and apply_async to call a bunch of functions.

How can I tell whether a function has been picked up by a pool worker and started running, or whether it is still sitting in the queue?

For example:

import multiprocessing
import time

def func(t):
    # take some time processing
    print('func({}) started'.format(t))
    time.sleep(t)

pool = multiprocessing.Pool()

results = [pool.apply_async(func, [t]) for t in [100]*50]  # adds 50 func calls to the queue

For each AsyncResult in results you can call ready() or get(0) to see whether func has finished running. But how do you find out whether func has started but not yet finished?

That is, for a given AsyncResult object (a given element of results), is there a way to see whether the function has been called or is still sitting in the pool's queue?

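First, remove completed jobs from the results list:

    results = [r for r in results if not r.ready()]

The number of pending jobs (running or still queued) is the length of what remains:

    pending = len(results)

The number of jobs that are pending but not yet started is roughly the pending count minus the pool size, clamped at zero for the case where fewer jobs remain than there are workers:

    not_started = max(0, pending - pool_size)

pool_size is multiprocessing.cpu_count() when the Pool is created with default arguments, as in the question. Note that this only gives counts; it does not tell you which particular AsyncResult has started.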
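
The counting steps above can be wrapped in a small helper. This is only a sketch: the name estimate_queue_state is made up for illustration, and it assumes every worker is busy whenever at least pool_size jobs are pending, so the numbers are estimates rather than exact values.

```python
def estimate_queue_state(results, pool_size):
    """Estimate (running, not_started) for a list of AsyncResult-like objects.

    Hypothetical helper: it only relies on each item having a ready() method.
    """
    # Jobs that have not finished yet are either running or still queued.
    pending = [r for r in results if not r.ready()]
    # At most pool_size jobs can be running at the same time.
    running = min(len(pending), pool_size)
    # Whatever does not fit into the pool is assumed to still be queued.
    not_started = max(0, len(pending) - pool_size)
    return running, not_started
```

With the question's code, it could be called as estimate_queue_state(results, multiprocessing.cpu_count()).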

UPDATE: After initially misunderstanding the question, here's a way to do what the OP was asking about.

I suspect this functionality could be added to the Pool class without too much trouble, because Pool already uses a queue internally to deliver results back to the AsyncResult objects. That queue could also be used to signal whether a task has started.

But here's a way to implement it using Pool and Pipe. NOTE: this did not work for me in Python 2.x (not sure why). Tested in Python 3.8.

import multiprocessing
import time
import os

def worker_function(pipe):
    pipe.send('started')
    print('[{}] started pipe={}'.format(os.getpid(), pipe))
    time.sleep(3)
    pipe.close()

def test():
    pool = multiprocessing.Pool(processes=2)
    print('[{}] pool={}'.format(os.getpid(), pool))

    workers = []

    for x in range(1, 4):
        parent, child = multiprocessing.Pipe()
        pool.apply_async(worker_function, (child,))
        worker = {'name': 'worker{}'.format(x), 'pipe': parent, 'started': False}
        workers.append(worker)

    pool.close()

    while True:
        for worker in workers:
            if worker.get('started'):
                continue
            pipe = worker.get('pipe')
            if pipe.poll(0.1):
                message = pipe.recv()
                print('[{}] {} says {}'.format(os.getpid(), worker.get('name'), message))
                worker['started'] = True
                pipe.close()
        count_in_queue = len(workers)
        for worker in workers:
            if worker.get('started'):
                count_in_queue -= 1
        print('[{}] count_in_queue = {}'.format(os.getpid(), count_in_queue))
        if not count_in_queue:
            break
        time.sleep(0.5)

    pool.join()

if __name__ == '__main__':
    test()
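
As a variation on the Pipe approach above, a single shared dictionary from multiprocessing.Manager can record which tasks have started, so the parent does not need one pipe per task. This is a rough sketch rather than a tested recipe for every platform: tracked and slow_square are hypothetical names, and the short sleep before the status check is only there to give the workers time to pick up the first tasks.

```python
import multiprocessing
import time


def tracked(started, task_id, func, *args):
    """Mark task_id as started, then run func(*args) and return its result."""
    # This line only executes once a pool worker has picked the task up.
    started[task_id] = True
    return func(*args)


def slow_square(x):
    # Stand-in for real work (hypothetical example function).
    time.sleep(0.2)
    return x * x


if __name__ == '__main__':
    with multiprocessing.Manager() as manager:
        started = manager.dict()  # shared between the parent and the workers
        with multiprocessing.Pool(processes=2) as pool:
            results = {
                task_id: pool.apply_async(
                    tracked, (started, task_id, slow_square, task_id))
                for task_id in range(6)
            }
            time.sleep(0.1)  # let the workers pick up the first tasks
            for task_id, result in results.items():
                if result.ready():
                    state = 'finished'
                elif task_id in started:
                    state = 'running'
                else:
                    state = 'queued'
                print('task {}: {}'.format(task_id, state))
            pool.close()
            pool.join()
```

The trade-off is an extra manager process and a round trip to it for every update, in exchange for being able to ask about any individual task by its ID.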
