Wait for pool.apply_async inside a loop

I am trying to use multiprocessing in my Python code for the first time. I am stuck because I cannot make apply_async wait for all its processes to finish. I'd like to process the elements in smaller chunks and save the results as I go through the long list of elements.

As a simpler example:

import multiprocessing as mp

def fun(x, y):
    print("Here")
    return(x+y)

buffer = []

for val in range(10):
    buffer.append(val)
    print(f'Added value: {val}')
    if len(buffer) == 5:
        #It is my understanding, this is necessary on Windows
        if __name__ == "__main__":
            pool = mp.Pool()
            res = [pool.apply_async(fun, args = (x,x)) for x in buffer]
            res = [r.wait() for r in res]
            print(f'Results: {res}')
            buffer = []
            pool.close()
            pool.join()

I would love this to produce the following output:

Added value: 0
Added value: 1
Added value: 2
Added value: 3
Added value: 4
Here
Here
Here
Here
Here
Results: [0, 2, 4, 6, 8]
Added value: 5
Added value: 6
Added value: 7
Added value: 8
Added value: 9
Here
Here
Here
Here
Here
Results: [10, 12, 14, 16, 18]

But it actually produces this (on my machine, at least):

Added value: 0
Added value: 1
Added value: 2
Added value: 3
Added value: 4
Added value: 0
Added value: 1
Added value: 2
Added value: 3
Added value: 4
Added value: 5
Added value: 6
Added value: 7
Added value: 8
Added value: 9
Added value: 0
Added value: 1
Added value: 2
Added value: 3
Added value: 4
Added value: 5
Added value: 6
Added value: 7
Added value: 8
Added value: 9
Added value: 0
Added value: 1
Added value: 2
Added value: 3
Added value: 4
Added value: 5
Added value: 6
Added value: 7
Added value: 8
Added value: 9
Added value: 0
Added value: 1
Added value: 2
Added value: 3
Added value: 4
Added value: 5
Added value: 6
Added value: 7
Added value: 8
Added value: 9
Here
Here
Here
Here
Here
Results: [None, None, None, None, None]
Added value: 5
Added value: 6
Added value: 7
Added value: 8
Added value: 9
Added value: 0
Added value: 1
Added value: 2
Added value: 3
Added value: 4
Added value: 5
Added value: 6
Added value: 7
Added value: 8
Added value: 9
Added value: 0
Added value: 1
Added value: 2
Added value: 3
Added value: 4
Added value: 5
Added value: 6
Added value: 7
Added value: 8
Added value: 9
Added value: 0
Added value: 1
Added value: 2
Added value: 3
Added value: 4
Added value: 5
Added value: 6
Added value: 7
Added value: 8
Added value: 9
Added value: 0
Added value: 1
Added value: 2
Added value: 3
Added value: 4
Added value: 5
Added value: 6
Added value: 7
Added value: 8
Added value: 9
Here
Here
Here
Here
Here
Results: [None, None, None, None, None]

Any suggestion is really appreciated.

Try putting the whole for loop inside the if __name__ == '__main__': suite.

...
if __name__ == '__main__':

    for val in range(10):
        buffer.append(val)
        print(f'Added value: {val}')
        if len(buffer) == 5:
            pool = mp.Pool()
            res = [pool.apply_async(fun, args = (x,x)) for x in buffer]
            # block until every task in this chunk has finished
            for r in res:
                r.wait()
            # get the return values
            res = [r.get() for r in res]
            print(f'Results: {res}')
            buffer = []
            pool.close()
            pool.join()
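
The [None, None, None, None, None] in the question comes from r.wait(), which only blocks and returns None; r.get() blocks and returns the function's value. As a rough sketch of the same chunked processing, assuming the pool can be created once and reused for every chunk (the chunked helper and main function are hypothetical names, not part of the original code):

```python
import multiprocessing as mp


def fun(x, y):
    return x + y


def chunked(values, size):
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    chunk = []
    for val in values:
        chunk.append(val)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:  # leftover items that did not fill a whole chunk
        yield chunk


def main():
    # One pool for the whole run; leaving the with-block shuts it down.
    with mp.Pool() as pool:
        for buffer in chunked(range(10), 5):
            pending = [pool.apply_async(fun, args=(x, x)) for x in buffer]
            # get() blocks until the task is done and returns its value;
            # wait() would block too, but it returns None.
            results = [r.get() for r in pending]
            print(f"Results: {results}")


if __name__ == "__main__":
    main()
```

Because every get() call has returned before the next chunk starts, each chunk's results can be saved at that point before moving on.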

Here is your original with some extra inspection. I still don't know why, but it appears that the lines in the for loop are somehow running in multiple Python processes.

import multiprocessing as mp
import os
import pickle
import sys

def fun(x, y, pid=None):
    print(f"here pid:{pid}",file=sys.stderr)
    return (x+y,pid)

buffer = []
stuff = []

with open(r'c:\pyProjects\stuff.pkl','wb') as f:
    pickle.dump(stuff,f)

for val in range(10):
    buffer.append(val)
    pid = os.getpid()
    print(f'Added value: {val}.   pid={pid}')
    d = {'val':val,'pid':pid}
    with open(r'c:\pyProjects\stuff.pkl','rb') as f:
        try:
            stuff = pickle.load(f)
            stuff.append(d)
        except EOFError as e:
            s = '\n'.join(f'\t\t\t\t{item}' for item in stuff)
            print(f'\t\t\tEOFError {d}\n\t\t\tstuff:\n{s}\n')
    with open(r'c:\pyProjects\stuff.pkl','wb') as f:
        pickle.dump(stuff,f)
    if len(buffer) == 5:
        print(buffer)
        #It is my understanding, this is necessary on Windows
        if __name__ == "__main__":
            pool = mp.Pool()
            res = [pool.apply_async(fun, args = (x,x,pid)) for x in buffer]
            res = [r.get() for r in res]
            print(f'\t\t\tResults: {res}')
            buffer = []
            pool.close()
            pool.join()

After it finishes, you can load and peruse the pickled file with:

>>> import pickle
>>> from pprint import pprint
>>> with open(r'c:\pyProjects\stuff.pkl','rb') as f:
...     a = pickle.load(f)

>>> a.sort(key=lambda x: x['pid'])
>>> pprint(a)
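
A likely explanation for the repeated pids, assuming the "spawn" start method (the default on Windows): each worker process starts a fresh interpreter and re-imports the main module, so any code at module level, including an unguarded for loop, runs again in every worker. The sketch below requests spawn explicitly so the effect should be visible on other platforms as well; the print message and the main function are illustrative choices, not from the original post.

```python
import multiprocessing as mp
import os

# Module-level code: under "spawn" every worker re-imports this file,
# so this line runs once in the parent and once in each worker.
print(f"module-level code in pid={os.getpid()}, __name__={__name__!r}")


def fun(x, y):
    return x + y


def main():
    # Ask for the spawn start method explicitly (assumption: this mirrors
    # what Windows does by default).
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        results = pool.starmap(fun, [(x, x) for x in range(5)])
    print(f"Results: {results}")


if __name__ == "__main__":
    # Only the parent has __name__ == "__main__"; the re-imported copy in a
    # worker is named "__mp_main__", so workers never reach this call.
    main()
```

The module-level print should appear once for the parent and once per worker, each with a different pid, which matches the repeated "Added value" lines in the question.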
