
passing kwargs with multiprocessing.pool.map

I would like to pass keyword arguments to my worker function with Pool.map(). I can't find a clear example of this when searching forums.

Example code:

import multiprocessing as mp

def worker((x,y), **kwargs):
    kwarg_test = kwargs.get('kwarg_test', False)
    print("kwarg_test = {}".format(kwarg_test))     
    if kwarg_test:
        print("Success")
    return x*y

def wrapper_process(**kwargs):
    jobs = []
    pool=mp.Pool(4)
    for i, n in enumerate(range(4)):
        jobs.append((n,i))
    pool.map(worker, jobs) #works
    pool.map(worker, jobs, kwargs) #how to do this?   

def main(**kwargs):
    worker((1,2),kwarg_test=True) #accepts kwargs
    wrapper_process(kwarg_test=True)

if __name__ == "__main__":    
    main()

Output:

kwarg_test = True
Success
kwarg_test = False
kwarg_test = False
kwarg_test = False
kwarg_test = False
TypeError: unsupported operand type(s) for //: 'int' and 'dict'

The type error has to do with parsing arguments inside of multiprocessing.Pool or Queue. I have tried several other syntaxes, like making a list of the kwargs, [kwargs, kwargs, kwargs, kwargs], as well as several attempts to include the kwarg in the jobs list, but no luck. I traced the code in multiprocessing.pool from map to map_async and got as far as task_batches = Pool._get_tasks(func, iterable, chunksize) in pool.py, where I encountered the generator structure. I'm happy to learn more about this in the future, but for now I am just trying to find out:

Is there a simple syntax for allowing the passing of kwargs with pool.map?

If you want to iterate over the other arguments, use @ArcturusB's answer.

If you just want to pass them, having the same value for each iteration, then you can do this:

from functools import partial
pool.map(partial(worker, **kwargs), jobs)

partial 'binds' arguments to a function. Old versions of Python cannot serialize partial objects, though.
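For reference, here is a minimal, self-contained sketch of this approach applied to the question's code, assuming Python 3 (where the (x, y) tuple has to be unpacked inside the worker rather than in the function signature):

import multiprocessing as mp
from functools import partial

def worker(xy, **kwargs):
    # Python 3 has no tuple parameters, so unpack (x, y) inside the worker.
    x, y = xy
    if kwargs.get('kwarg_test', False):
        print("Success")
    return x * y

def wrapper_process(**kwargs):
    jobs = [(n, i) for i, n in enumerate(range(4))]
    with mp.Pool(4) as pool:
        # partial binds the same keyword arguments to every call of worker.
        return pool.map(partial(worker, **kwargs), jobs)

if __name__ == "__main__":
    print(wrapper_process(kwarg_test=True))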

The multiprocessing.pool.Pool.map doc states:

A parallel equivalent of the map() built-in function (it supports only one iterable argument though). It blocks until the result is ready.

We can only pass one iterable argument. End of story. But luckily we can think of a workaround: define a worker_wrapper function that takes a single argument, unpacks it into args and kwargs, and passes them on to worker:

def worker_wrapper(arg):
    args, kwargs = arg
    return worker(*args, **kwargs)

In your wrapper_process, you need to construct this single argument from jobs (or even directly when constructing jobs) and call worker_wrapper:

arg = [(j, kwargs) for j in jobs]
pool.map(worker_wrapper, arg)

Here is a working implementation, kept as close as possible to your original code:

import multiprocessing as mp

def worker_wrapper(arg):
    args, kwargs = arg
    return worker(*args, **kwargs)

def worker(x, y, **kwargs):
    kwarg_test = kwargs.get('kwarg_test', False)
    # print("kwarg_test = {}".format(kwarg_test))     
    if kwarg_test:
        print("Success")
    else:
        print("Fail")
    return x*y

def wrapper_process(**kwargs):
    jobs = []
    pool=mp.Pool(4)
    for i, n in enumerate(range(4)):
        jobs.append((n,i))
    arg = [(j, kwargs) for j in jobs]
    pool.map(worker_wrapper, arg)

def main(**kwargs):
    print("=> calling `worker`")
    worker(1, 2, kwarg_test=True)  # accepts kwargs
    print("=> no kwargs")
    wrapper_process()  # no kwargs
    print("=> with `kwarg_test=True`")
    wrapper_process(kwarg_test=True)

if __name__ == "__main__":    
    main()

Which passes the test:

=> calling `worker`
Success
=> no kwargs
Fail
Fail
Fail
Fail
=> with `kwarg_test=True`
Success
Success
Success
Success

You don't need to force yourself to use map. Just use apply_async and pass in your parameters as a dictionary. In this example, batch_parameters is a list of dictionaries which contain the parameters you want to pass. future_parameters keeps a list of tuples of futures and the parameters used to get those futures. In the loop that follows, we wait for the futures to get their results and print each result together with the parameters that were used to generate it.

with Pool(parallelism) as pool:
    future_parameters = [(pool.apply_async(f, kwds=parameters), parameters) for parameters in batch_parameters]
    for future, parameters in future_parameters:
        result = future.get()
        print(parameters, "=>", result)
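To make this runnable end to end, here is a sketch that reuses the question's worker and fills in example values for f, parallelism and batch_parameters (all three are placeholders in the snippet above, so the concrete values here are assumptions):

from multiprocessing import Pool

def worker(x, y, kwarg_test=False):
    if kwarg_test:
        print("Success")
    return x * y

if __name__ == "__main__":
    parallelism = 4  # assumed pool size
    # One dict of keyword arguments per task.
    batch_parameters = [dict(x=n, y=i, kwarg_test=True) for i, n in enumerate(range(4))]
    with Pool(parallelism) as pool:
        future_parameters = [(pool.apply_async(worker, kwds=parameters), parameters)
                             for parameters in batch_parameters]
        for future, parameters in future_parameters:
            result = future.get()
            print(parameters, "=>", result)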
