How to handle a variable in a second multiprocessing.Pool when it comes from the first Pool

How do I use a variable in a second multiprocessing.Pool when that variable was produced by the first Pool?

For example, given this code:

from multiprocessing import Pool
import pandas as pd 

lst = [1, 2, 3]

def csv(code):
    df = pd.DataFrame({code: [code, code**2, code**3]}, index=lst)
    return {code: df}

def mp1():
    with Pool(8) as pool:
        rs = pool.map(csv, lst)
        dfs = dict((key, val) for k in rs for key, val in k.items())
        return dfs 

def dosomthing(code):
    dfs[code] = dfs[code] * code
    return {code: dfs[code]}

def mp_dosomething():
    with Pool(8) as pool:
        rs = pool.map(dosomthing, lst)
        dfc = dict((key, val) for k in rs for key, val in k.items())
        return dfc

if __name__ == '__main__':
    dfs = mp1()
    dfc = mp_dosomething() 

Under if __name__ == '__main__':, I can easily get dfs from the function mp1.

But when I try to use a second pool to do something with dfs, it fails with this error:

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\Users\NeNe\AppData\Local\Programs\Python\Python310\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\NeNe\AppData\Local\Programs\Python\Python310\lib\multiprocessing\pool.py", line 48, in mapstar
    return list(map(*args))
  File "c:\Users\NeNe\OneDrive\Python\test.py", line 17, in dosomthing
    dfs[code] = dfs[code] * code
NameError: name 'dfs' is not defined
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\Users\NeNe\OneDrive\Python\test.py", line 28, in <module>
    dfc = mp_dosomething()
  File "c:\Users\NeNe\OneDrive\Python\test.py", line 22, in mp_dosomething
    rs = pool.map(dosomthing, lst)
  File "C:\Users\NeNe\AppData\Local\Programs\Python\Python310\lib\multiprocessing\pool.py", line 367, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\NeNe\AppData\Local\Programs\Python\Python310\lib\multiprocessing\pool.py", line 774, in get
    raise self._value
NameError: name 'dfs' is not defined

How can I get dfc?

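At least on Windows, global variables assigned in the main process are not available in the worker processes: the workers are started as fresh interpreters that re-import the script, so code under if __name__ == '__main__': never runs in them. The Pool class accepts an initializer function that is called once in each worker; it can receive such variables (provided their data can be pickled) and set them as globals there.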
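
The effect described above can be reproduced on any platform by requesting the "spawn" start method, which is the default on Windows. The following is only a minimal sketch: the helper names peek, peek_in_workers and MISSING are made up for illustration, and a plain dict stands in for the DataFrames from the question.

```python
import multiprocessing as mp

# Hypothetical sentinel used when a name does not exist in a process.
MISSING = "<not defined in this process>"


def peek(name):
    # Look the name up in this process's own copy of the module globals.
    return globals().get(name, MISSING)


def peek_in_workers(name, start_method="spawn"):
    # Requesting "spawn" explicitly mimics the Windows default elsewhere.
    ctx = mp.get_context(start_method)
    with ctx.Pool(2) as pool:
        return pool.map(peek, [name, name])


if __name__ == "__main__":
    dfs = {"answer": 42}           # assigned only in the main process
    print(peek("dfs"))             # the main process sees the dict
    print(peek_in_workers("dfs"))  # spawned workers skip this guarded block
```

When this is run as a script, the spawned workers should report the sentinel rather than the dict, because each one re-imports the module without executing the guarded block.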

Here it could be done like this:

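def initializer(ext_dfs):
    # Runs once in each worker process and defines dfs as a global there
    global dfs
    dfs = ext_dfs


def mp_dosomething():
    # dfs is the dict returned by mp1(); it is pickled and sent to every worker
    with Pool(8, initializer, (dfs,)) as pool:
        rs = pool.map(dosomthing, lst)
        dfc = dict((key, val) for k in rs for key, val in k.items())
        return dfc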
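
Putting the question's code and the initializer together, a complete script might look like the sketch below. It makes a few assumptions that are not in the original: mp_dosomething takes the dict as a parameter (dfs_from_mp1, a name chosen here) instead of reading the global, dosomthing returns the scaled frame without assigning back into dfs (a change made in a worker would not reach the main process anyway), and the worker count of 4 is arbitrary.

```python
from multiprocessing import Pool

import pandas as pd

lst = [1, 2, 3]


def csv(code):
    # Build one small DataFrame per code, as in the question.
    df = pd.DataFrame({code: [code, code**2, code**3]}, index=lst)
    return {code: df}


def mp1():
    with Pool(4) as pool:
        rs = pool.map(csv, lst)
    # Merge the per-code dicts into a single {code: DataFrame} dict.
    return {key: val for d in rs for key, val in d.items()}


def initializer(ext_dfs):
    # Called once per worker: make the dict visible as a global there.
    global dfs
    dfs = ext_dfs


def dosomthing(code):
    # Reads the worker-local global that initializer() set up.
    return {code: dfs[code] * code}


def mp_dosomething(dfs_from_mp1):
    # Assumption: the dict is passed in explicitly and forwarded via initargs.
    with Pool(4, initializer, (dfs_from_mp1,)) as pool:
        rs = pool.map(dosomthing, lst)
    return {key: val for d in rs for key, val in d.items()}


if __name__ == "__main__":
    dfs = mp1()
    dfc = mp_dosomething(dfs)
    print(dfc[2])
```

Note that the whole dict is pickled and copied into every worker, so for large DataFrames this trades memory for convenience.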
