Python multiprocessing map function

I ran into a problem while using the multiprocessing map function in Python. A minimal example that reproduces it:

import multiprocessing as mp

if __name__ == '__main__':

    def f(x):
        return x*x

    num_workers = 2
    with mp.Pool(num_workers) as p:
        print(p.map(f, [1,2,3]))

Running this code produces the error message

AttributeError: Can't get attribute 'f' on <module '__mp_main__' from 'main.py'>

However, if I move the function f outside the if __name__ == '__main__': block, i.e.

import multiprocessing as mp

def f(x):
    return x*x

if __name__ == '__main__':

    num_workers = 2
    with mp.Pool(num_workers) as p:
        print(p.map(f, [1,2,3]))

This time it works. What is the difference between the two versions, and why does the first one raise an error? Thanks in advance.

The details vary between operating systems, but the basic reason is that this line of code

if __name__ == '__main__':

tells the Python interpreter to run the code in that block only when the file is executed as a script. The block is skipped when the file is imported as a module, and it is also skipped in worker processes started with the spawn method, because each worker re-imports the file under the name __mp_main__ (the module named in the error message). So when you do this

import multiprocessing as mp

if __name__ == '__main__':

    def f(x):
        return x*x

    num_workers = 2
    with mp.Pool(num_workers) as p:
        print(p.map(f, [1,2,3]))

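the worker processes in the pool will not have a definition of the function f. p.map sends f to the workers by name, not by value, so each worker has to find f as a top-level attribute of the module, and it cannot.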
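To make this concrete, here is a small simulation rather than real multiprocessing. It assumes only that a spawned worker re-runs the file with __name__ set to '__mp_main__', as the error message suggests. The names SOURCE and load_as are made up for illustration.

```python
import types

# The guarded definition from the question, kept as a string so it can be
# executed under two different module names.
SOURCE = '''
if __name__ == '__main__':
    def f(x):
        return x * x
'''

def load_as(name):
    """Execute SOURCE in a fresh module whose __name__ is `name` (hypothetical helper)."""
    module = types.ModuleType(name)
    exec(SOURCE, module.__dict__)
    return module

as_script = load_as('__main__')      # how the parent process sees the file
as_worker = load_as('__mp_main__')   # how a spawned worker sees the same file

print(hasattr(as_script, 'f'))  # True: the guard passed, so f was defined
print(hasattr(as_worker, 'f'))  # False: the guard failed, so f is missing
```

The second lookup failing is the same situation the worker is in when it reports that it can't get attribute 'f'.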

Depending on your operating system, subprocesses are either forked or spawned. Windows always spawns because it has no fork; macOS has spawned by default since Python 3.8; Linux has traditionally forked by default.
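If you are unsure which start method your own machine uses, you can ask the multiprocessing module directly. This is a minimal sketch; describe_start_methods is a made-up helper name, and the values it returns depend on your platform and Python version.

```python
import multiprocessing as mp

def describe_start_methods():
    """Return the default start method and every method this platform supports."""
    return {
        # Note: calling get_start_method() fixes the default for this process.
        "default": mp.get_start_method(),
        "available": mp.get_all_start_methods(),
    }

if __name__ == '__main__':
    print(describe_start_methods())
```

If 'fork' is missing from the available list (as on Windows), forking is not an option on that platform.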

On platforms that support it (not Windows), you can force forking, but you need to fully understand the implications of doing so.

For this specific question, a workaround could look like this:

import multiprocessing as mp

if __name__ == '__main__':
    def f(x):
        return x*x

    # With 'fork', workers are copies of the parent process, so they
    # inherit f. This start method is not available on Windows.
    mp.set_start_method('fork')
    num_workers = 2
    with mp.Pool(num_workers) as p:
        print(p.map(f, [1,2,3]))

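Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce this post, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.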

 