Is there a way to limit how much gets submitted to a Pool of workers?

Question

I have a Pool of workers and use apply_async to submit work to them. I do not care about the result of the function applied to each item. The pool seems to accept any number of apply_async calls, no matter how large the data is or how slowly the workers get through the work.

Is there a way to make apply_async block as soon as a certain number of items are waiting to be processed? I assume the pool uses a Queue internally, so would it be trivial to just give that Queue a maximum size?

If this is not supported, would it make sense to submit a bug report? This looks like very basic functionality and rather trivial to add.

It would be a shame if one had to essentially re-implement the whole logic of Pool just to make this work.

Here is some very basic code:

from multiprocessing import Pool

# nprocesses and getmorework() are assumed to be defined elsewhere

def dowork(item):
    # process the item (for side effects, no return value needed)
    pass

pool = Pool(nprocesses)
for work in getmorework():
    # this should block if we already have too much work waiting!
    pool.apply_async(dowork, (work,))
pool.close()
pool.join()

Answer 1

So something like this?

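import multiprocessing
import time

worker_count = 4
mp = multiprocessing.Pool(processes=worker_count)
workers = [None] * worker_count
work_items = getmorework()  # create the generator once, not on every call

while True:
    try:
        for i in range(worker_count):
            # refill a slot only when it is empty or its task has finished
            if workers[i] is None or workers[i].ready():
                workers[i] = mp.apply_async(dowork, args=(next(work_items),))
    except StopIteration:
        break
    time.sleep(1)

# wait for the tasks that are still running
mp.close()
mp.join()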

I don't know how fast you expect each worker to finish, so the time.sleep may not be necessary, or it may need a different interval.
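The same limit can also be enforced without polling. The following is only a minimal sketch, not something built into Pool: a semaphore is acquired before each apply_async call and released from the task's callback, so submission blocks once too many tasks are queued or running. The names submit_bounded and max_pending are hypothetical, and dowork and getmorework are stand-ins for the functions in the question. Note that max_pending counts running tasks as well as waiting ones, and the generator may be advanced one item ahead of the limit.

```python
import threading
from multiprocessing import Pool


def submit_bounded(pool, func, items, max_pending):
    """Submit func(item) for every item, blocking while max_pending
    tasks are queued or running. (Hypothetical helper, not a Pool API.)"""
    slots = threading.BoundedSemaphore(max_pending)

    def release(_result_or_error):
        # Called in the parent process when a task finishes or fails.
        slots.release()

    for item in items:
        slots.acquire()  # blocks until a previously submitted task completes
        pool.apply_async(func, (item,),
                         callback=release, error_callback=release)


def dowork(item):
    # Stand-in for the real work (side effects only).
    return item * item


def getmorework():
    # Stand-in for the real source of work items.
    yield from range(20)


if __name__ == "__main__":
    with Pool(4) as pool:
        # 4 workers plus at most 4 items waiting behind them
        submit_bounded(pool, dowork, getmorework(), max_pending=8)
        pool.close()
        pool.join()
```

Passing the same function as error_callback matters here: without it, a task that raises would never give its slot back, and submission would eventually stall.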

Answer 2

An alternative might be to use queues directly:

from multiprocessing import Process, JoinableQueue
from time import sleep
from random import random

def do_work(i):
    print(f"worker {i}")
    sleep(random())
    print(f"done {i}")

def worker(q):
    # take items off the queue until the None sentinel arrives
    while True:
        item = q.get()
        if item is None:
            break
        do_work(item)
        q.task_done()

def generator(n):
    for i in range(n):
        print(f"gen {i}")
        yield i

if __name__ == "__main__":
    # 1 = allow generator to get this far ahead
    q = JoinableQueue(1)

    # 2 = maximum amount of parallelism
    # (the queue is passed explicitly so this also works with the spawn start method)
    procs = [Process(target=worker, args=(q,)) for _ in range(2)]
    # and get them running
    for p in procs:
        p.daemon = True
        p.start()

    # schedule 10 items for processing
    for item in generator(10):
        q.put(item)

    # wait for jobs to finish executing
    q.join()

    # signal workers to finish up
    for p in procs:
        q.put(None)
    # wait for workers to actually finish
    for p in procs:
        p.join()

Mostly adapted from the example in the documentation for Python's queue module:

https://docs.python.org/3/library/queue.html#queue.Queue.join
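The backpressure in this approach comes from the queue's maximum size: once the queue is full, put() blocks until a consumer calls get(). A small sketch of just that behaviour, using a timeout so the blocked put() gives up instead of waiting forever (the variable second_accepted is only illustrative):

```python
import queue
from multiprocessing import JoinableQueue

q = JoinableQueue(1)   # room for a single pending item
q.put("first")         # returns immediately, the queue had a free slot

try:
    # Nothing is consuming, so this waits for a slot and then times out.
    q.put("second", timeout=0.5)
    second_accepted = True
except queue.Full:
    second_accepted = False

print(second_accepted)
```

In the answer's code there is no timeout, so the producer loop simply pauses at q.put(item) until one of the workers takes an item off the queue.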

solution1
1 ACCEPTED 2018-10-30 21:30:35

solution2
1 2018-11-03 12:00:02
