Multiprocessing from imported module doesn't work

I have a problem with multiprocessing when importing a module, which can be reproduced with the following example.

I have a module named tmp1.py with the following content:

from multiprocessing import Pool

def mul2(x):
    return [a*2 for a in x]

def calculate(func, x):
    r = []
    if __name__ == '__main__':
        p = Pool(2)
        r.append(p.map(func, x))
        p.close(), p.join()
    return r

I import it into another file, tmp2.py, with the following content:

import tmp1

inputs = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]    

result = tmp1.calculate(tmp1.mul2, inputs)
print(result)

After running it, I expect to get the input values multiplied by 2. However, all I get is an empty list. I am on Windows with Python 3.10.

If I copy and paste the entire content of tmp1.py into tmp2.py and adapt the function calls, it works fine.

Where's the problem?

if __name__ == '__main__': is required to guard code that creates new processes on platforms, such as Windows, that use the spawn method for creating new processes.
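If you are unsure which start method applies on a given machine, the multiprocessing module can report it. The following is only a small sketch; the variable names are my own, and the output differs between platforms, so none is shown.

```python
import multiprocessing

# Start methods this platform supports, e.g. "spawn", "fork", "forkserver".
available_methods = multiprocessing.get_all_start_methods()

# The method used when none is chosen explicitly; on Windows this is "spawn".
# Note: calling this also fixes the default for the rest of the program.
default_method = multiprocessing.get_start_method()

print(available_methods)
print(default_method)
```

On Windows the only available method is spawn, which is why the guard matters there.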

When the spawn method is used, the child is created as an "empty" process, i.e. one that inherits nothing from the main process, and a new Python interpreter is launched in it. This new interpreter must re-import the main script, and through it the modules the script imports, executing every statement at global scope to re-create global variables, function definitions, etc. in the child's address space before invoking, in your case, the function tmp1.mul2. In the new process the main script's __name__ is not '__main__', so the guarded code that created this process in the first place is not re-executed recursively.
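The value of __name__ depends on how a file was loaded, and that can be checked without creating any processes. The sketch below writes a throwaway module (tmp1_demo, a hypothetical stand-in for tmp1.py) to a temporary directory, imports it, and asks it what __name__ is from the inside. The helper names seen_name and guard_passes are mine and exist only for illustration.

```python
import importlib.util
import os
import tempfile

# Hypothetical stand-in for tmp1.py: it only reports what __name__ is inside it.
MODULE_SOURCE = '''
def seen_name():
    return __name__

def guard_passes():
    return __name__ == '__main__'
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    module_path = os.path.join(tmp_dir, "tmp1_demo.py")
    with open(module_path, "w") as handle:
        handle.write(MODULE_SOURCE)

    # Load the file as a module named "tmp1_demo", much as `import tmp1` would.
    spec = importlib.util.spec_from_file_location("tmp1_demo", module_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    imported_name = module.seen_name()
    guard_result = module.guard_passes()

print(imported_name)  # tmp1_demo
print(guard_result)   # False
```

Inside an imported module the test __name__ == '__main__' is false, so a guard placed inside a function of that module skips its body whenever the function is called through the import.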

The code that creates new processes at global scope, which is the scope we need to concern ourselves with, is actually in the main script, specifically result = tmp1.calculate(tmp1.mul2, inputs). The check inside calculate cannot do this job: once tmp1 is imported, __name__ there is 'tmp1', so the block is skipped and the empty list is returned, which is what you observed. It is the call in the main script that needs to be within the if __name__ == '__main__': block:

File tmp2.py:

if __name__ == '__main__':
    import tmp1

    inputs = [[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]]

    result = tmp1.calculate(tmp1.mul2, inputs)
    print(result)

I have put inside the block every global statement that the child process does not need to execute in order to initialize its memory with whatever it needs to run, i.e. all of the tmp2.py code.

Then tmp1.py becomes:

def mul2(x):
    return [a*2 for a in x]

def calculate(func, x):
    from multiprocessing import Pool

    r = []
    p = Pool(2)
    r.append(p.map(func, x))
    p.close()
    p.join()
    return r

The above prints:

[[[2, 4, 6], [8, 10, 12], [14, 16, 18]]]

This is a list containing a single list. Perhaps you really want:

def mul2(x):
    return [a*2 for a in x]

def calculate(func, x):
    from multiprocessing import Pool

    p = Pool(2)
    r = p.map(func, x)
    p.close()
    p.join()
    return r

This prints:

[[2, 4, 6], [8, 10, 12], [14, 16, 18]]
