How to handle a variable in a second multiprocessing.Pool when the variable comes from the first Pool?
Given this sample code:
    from multiprocessing import Pool
    import pandas as pd

    lst = [1, 2, 3]

    def csv(code):
        df = pd.DataFrame({code: [code, code**2, code**3]}, index=lst)
        return {code: df}

    def mp1():
        with Pool(8) as pool:
            rs = pool.map(csv, lst)
        dfs = dict((key, val) for k in rs for key, val in k.items())
        return dfs

    def dosomthing(code):
        dfs[code] = dfs[code] * code
        return {code: dfs[code]}

    def mp_dosomething():
        with Pool(8) as pool:
            rs = pool.map(dosomthing, lst)
        dfc = dict((key, val) for k in rs for key, val in k.items())
        return dfc

    if __name__ == '__main__':
        dfs = mp1()
        dfc = mp_dosomething()
Under if __name__ == '__main__':, I can easily get dfs from the function mp1.
But when I try to do something with dfs in a second pool, it fails with this error:
    multiprocessing.pool.RemoteTraceback:
    """
    Traceback (most recent call last):
      File "C:\Users\NeNe\AppData\Local\Programs\Python\Python310\lib\multiprocessing\pool.py", line 125, in worker
        result = (True, func(*args, **kwds))
      File "C:\Users\NeNe\AppData\Local\Programs\Python\Python310\lib\multiprocessing\pool.py", line 48, in mapstar
        return list(map(*args))
      File "c:\Users\NeNe\OneDrive\Python\test.py", line 17, in dosomthing
        dfs[code] = dfs[code] * code
    NameError: name 'dfs' is not defined
    """

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "c:\Users\NeNe\OneDrive\Python\test.py", line 28, in <module>
        dfc = mp_dosomething()
      File "c:\Users\NeNe\OneDrive\Python\test.py", line 22, in mp_dosomething
        rs = pool.map(dosomthing, lst)
      File "C:\Users\NeNe\AppData\Local\Programs\Python\Python310\lib\multiprocessing\pool.py", line 367, in map
        return self._map_async(func, iterable, mapstar, chunksize).get()
      File "C:\Users\NeNe\AppData\Local\Programs\Python\Python310\lib\multiprocessing\pool.py", line 774, in get
        raise self._value
    NameError: name 'dfs' is not defined
How can I get dfc?
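At least on Windows, the main process's global variables are not available in the worker processes. Windows starts workers with the spawn method, so each worker re-imports the script, and anything assigned under if __name__ == '__main__': (such as dfs) is never defined there. The Pool class supports an initializer function that is called in each worker to receive such variables (provided their data can be pickled) and set them.
Here it can be done like this: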
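    def initializer(ext_dfs):
        # Runs once in each worker: store the data sent by the parent as a global.
        global dfs
        dfs = ext_dfs

    def mp_dosomething():
        # (dfs,) is pickled and passed to initializer in every worker.
        with Pool(8, initializer, (dfs,)) as pool:
            rs = pool.map(dosomthing, lst)
        dfc = dict((key, val) for k in rs for key, val in k.items())
        return dfc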
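For reference, here is a minimal end-to-end sketch of the same initializer pattern. It is an illustration under assumptions, not the asker's exact script: plain lists stand in for the DataFrames so that it runs without pandas, and the names CODES, _shared, build, init_worker, scale and run are made up for this example. DataFrames can be passed the same way, since they can be pickled.

```python
from multiprocessing import Pool

CODES = [1, 2, 3]

# Populated in each worker by init_worker (hypothetical name for this sketch).
_shared = None


def build(code):
    """First stage: build one table per code (a list stands in for a DataFrame)."""
    return code, [code, code**2, code**3]


def init_worker(tables):
    """Runs once in every worker process and stores the parent's data as a global."""
    global _shared
    _shared = tables


def scale(code):
    """Second stage: read from the global that init_worker populated."""
    return code, [value * code for value in _shared[code]]


def run():
    # First pool: produce the tables and collect them in the parent.
    with Pool(2) as pool:
        tables = dict(pool.map(build, CODES))
    # Second pool: initargs is pickled and handed to init_worker in each worker.
    with Pool(2, initializer=init_worker, initargs=(tables,)) as pool:
        return dict(pool.map(scale, CODES))


if __name__ == "__main__":
    print(run())  # {1: [1, 1, 1], 2: [4, 8, 16], 3: [9, 27, 81]}
```

Note that each worker gets its own copy of the data, so changes a worker makes to the global are not visible to the parent; results have to come back through the return value of the mapped function, as they do here.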