
Jupyter notebook issues with Multiprocessing Pool

I'm trying to apply multiprocessing in my code and I ran into this example:

import multiprocessing
from itertools import product

def merge_names(a, b):
    return '{} & {}'.format(a, b)

if __name__ == '__main__':
    names = ['Brown', 'Wilson', 'Bartlett', 'Rivera', 'Molloy', 'Opie']
    with multiprocessing.Pool(processes=3) as pool:
        results = pool.starmap(merge_names, product(names, repeat=2))
    print(results)

This should take no more than a few seconds, but when I run it in Jupyter Notebook it never finishes and I have to restart the kernel. Are there any special issues with Jupyter or Anaconda when using multiprocessing?

I'm using

conda version 4.8.4
ipython version 5.8.0

This is not really an answer, but since comments cannot nicely format code, I'll put it here. Your code does not work for me even in pure Python 3.8 (installed through conda, though), so I do not think it is connected to Jupyter or IPython.

This code works for me:

import multiprocessing
from itertools import product

names = ['Brown', 'Wilson', 'Bartlett', 'Rivera', 'Molloy', 'Opie']
with multiprocessing.Pool(processes=3) as pool:
    results = pool.starmap('{} & {}'.format, product(names, repeat=2))
print(results)

Thus it seems that there is some issue with pickling the custom function and sending it to the pool; I do not know the cause nor the solution for that.

But if you just need similar functionality, I recommend joblib:

from joblib import Parallel, delayed
from itertools import product

def merge_names(a, b):
    return '{} & {}'.format(a, b)

names = ['Brown', 'Wilson', 'Bartlett', 'Rivera', 'Molloy', 'Opie']
result = Parallel(n_jobs=3, prefer="processes")(delayed(merge_names)(a, b) for a,b in product(names, repeat=2))
print(result)

joblib has a similar construction for a pool of workers, which can then be reused however you want:

with Parallel(n_jobs=2) as parallel:
   ...
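
For example, here is a minimal sketch of that pattern (redefining the same merge_names worker so the snippet is self-contained); the pool of workers is created once and reused for several calls:

from joblib import Parallel, delayed
from itertools import product

def merge_names(a, b):
    return '{} & {}'.format(a, b)

names = ['Brown', 'Wilson', 'Bartlett', 'Rivera', 'Molloy', 'Opie']
with Parallel(n_jobs=2) as parallel:
    # the same pool of workers serves both calls below
    all_pairs = parallel(delayed(merge_names)(a, b) for a, b in product(names, repeat=2))
    distinct_pairs = parallel(delayed(merge_names)(a, b)
                              for a, b in product(names, repeat=2) if a != b)
print(len(all_pairs), len(distinct_pairs))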

I've noticed this behaviour too. Everything seems to be working, but it never finishes. Wrapping the variables in a tqdm progress bar shows them being loaded in, but then nothing happens. It never finishes, and Task Manager shows the CPUs doing absolutely zero work.

Taking the function, putting it into a separate Python file, and then importing it back in seems to work for me.
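
A minimal sketch of that workaround, assuming a hypothetical helper file merge_utils.py saved next to the notebook:

# merge_utils.py (hypothetical module next to the notebook)
def merge_names(a, b):
    return '{} & {}'.format(a, b)

and then in a notebook cell:

# notebook cell: import the worker function instead of defining it inline
import multiprocessing
from itertools import product
from merge_utils import merge_names

names = ['Brown', 'Wilson', 'Bartlett', 'Rivera', 'Molloy', 'Opie']
with multiprocessing.Pool(processes=3) as pool:
    results = pool.starmap(merge_names, product(names, repeat=2))
print(results)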
