
Global Variable in Multiprocessing Pool

I know this issue has been discussed here before, but I just cannot find any method that works. I want to share a global variable between my multiprocessing processes without any of the processes changing it, i.e. they just need read access. As a simple example, take:

    import multiprocessing

    def f(x):
        return x**GlobalVar

    if __name__ == '__main__':
        GlobalVar = 6
        pool = multiprocessing.Pool()
        res = pool.map(f, [1, 2, 3, 4])
        print(res)

Now this obviously doesn't work, as GlobalVar will not be accessible by the processes. So for it to work I would have to evaluate GlobalVar, or import it from a file, in each separate process. As in my application GlobalVar is a very large array, this is extremely wasteful. How can I easily share this global variable between the processes while storing just one copy of it in memory? I want to reiterate that the processes only need to read this global variable without changing it.

A very simple way is to pass it as an argument to f, which gets executed in each process. But if the global variable is too large to copy into every process, and you only intend to perform read operations, then you can use shared memory.

Sample (documented inline):

from multiprocessing import Pool
from multiprocessing import shared_memory
import numpy as np

def f(x):
    # Attach to the existing shared memory block by name
    existing_shm = shared_memory.SharedMemory(name='abc123')
    # Read from the shared memory (we know the size is 1)
    c = np.ndarray((1,), dtype=np.int64, buffer=existing_shm.buf)
    result = x * c[0]
    # Detach from the block in this worker process
    existing_shm.close()
    return result

if __name__ == '__main__':
    a = np.array([6])
    # Create shared memory with name abc123
    shm = shared_memory.SharedMemory(create=True, size=a.nbytes, name="abc123")
    # Create a numpy array backed by the shared memory
    b = np.ndarray(a.shape, dtype=a.dtype, buffer=shm.buf)
    # Copy the data into the shared memory
    b[:] = a[:]
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))
    # Release the shared memory block once all workers are done
    shm.close()
    shm.unlink()

Output:

[6, 12, 18]

Find the official docs here.

Since the variable you wish to share is read-only and a "simple" integer, you just need to make it visible to the sub-processes in your multiprocessing pool by declaring it at global scope:

import multiprocessing

GlobalVar = 6

def f(x):
    return x**GlobalVar

if __name__ == '__main__':
    pool = multiprocessing.Pool()
    res = pool.map(f, [1, 2, 3, 4])
    print(res)

Prints:

[1, 64, 729, 4096]

Discussion

It is always relevant when discussing Python and multiprocessing which platform you are running on, and I have updated your tags to add Windows (although the code as written will now work on Linux also).

On Windows, when a new process is created (or processes, when creating a pool of processes), spawn is used. This means that the new processes do not inherit the variables that had been established by the main process; instead, a new Python interpreter is launched for each new process and execution starts from the top of the program. This is why you must enclose the code that launches new processes within an if __name__ == '__main__': block, or else you would get into a recursive loop. But for that reason, you must move the declaration of GlobalVar to global scope, or else that variable will not be defined in the newly created processes.

The other way of initializing global variables for each sub-process within the pool is with a pool initializer function, which enables you to do more elaborate things than this demonstrates:

import multiprocessing

def init_pool(the_int):
    global GlobalVar
    GlobalVar = the_int

def f(x):
    return x**GlobalVar

if __name__ == '__main__':
    GlobalVar = 6
    pool = multiprocessing.Pool(initializer=init_pool, initargs=(GlobalVar,))
    res = pool.map(f, [1, 2, 3, 4])
    print(res)
