
Global variables in python's multiprocessing.Pool class

I have a relatively simple parallelizable problem that is causing far too many problems for me to implement effectively. At the core of my program are two matrices and two vectors: one matrix and one vector for each of the two calculations I wish to perform.

In code, that means I have

import numpy as np
matrices = {"type1": np.random.rand(10, 10), "type2": np.random.rand(10, 10)}
vectors = {"type1": np.random.rand(10), "type2": np.random.rand(10)}

What I want to do (in a simplified form) is this:

I have a very large list of input vectors for each type:

input_vectors = [np.random.rand(10) for i in range(1000)]

and I want to calculate A*v + b, where A is the matrix and b is the vector for each type.
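As a quick sanity check (using the same 10-dimensional sizes as above, with illustrative names), each per-vector computation is just one matrix-vector product plus an offset:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((10, 10))  # the matrix for one type
b = rng.random(10)        # the offset vector for the same type
v = rng.random(10)        # one input vector

result = np.dot(A, v) + b  # A*v + b
print(result.shape)        # (10,)
```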

The single-threaded code that does what I need is therefore

def f(input_vector, matrix, vector):
    return np.dot(matrix, input_vector) + vector

results = {}
for type in ['type1', 'type2']:
    results[type] = []
    for input_vector in input_vectors:
        results[type].append(f(input_vector, matrices[type], vectors[type]))

However, I want to do this in parallel, and I do not know how to deal with the fact that the function I want to map over the list of vectors takes more than just the vectors as input.

I want to write something like

from multiprocessing import Pool
p = Pool(4)
for type in types:
    p.map(lambda x: f(x, matrices[type], vectors[type]), input_vectors)

However, that does not work, because the lambda function cannot be pickled. One thing that does work is appending the matrix I want to multiply with to each input vector, but that is of course not feasible memory-wise.
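The pickling failure is easy to reproduce on its own: the standard pickle module serializes a function by its qualified name, and a lambda has no importable name to look up:

```python
import pickle

try:
    pickle.dumps(lambda x: x + 1)
except Exception as exc:
    # Typically pickle.PicklingError: "Can't pickle <function <lambda> ...>"
    print(type(exc).__name__)
```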

Any thoughts on how to elegantly solve my conundrum?


What I would like is for each worker in the pool to have a copy of the matrix and vector it has to multiply with, but I do not know how to do this with multiprocessing.

Use functools.partial to pass multiple arguments to map:

from functools import partial
from multiprocessing import Pool

def f(matrix, vector, input_vector):
    return np.dot(matrix, input_vector) + vector

p = Pool(4)
results = {}
for type_ in types:
    func = partial(f, matrices[type_], vectors[type_])
    results[type_] = p.map(func, input_vectors)
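A minimal check (array sizes here are illustrative) that partial pre-binds the leading arguments, so the mapped function only needs the input vector:

```python
from functools import partial

import numpy as np

def f(matrix, vector, input_vector):
    return np.dot(matrix, input_vector) + vector

A = np.eye(3)
b = np.ones(3)
g = partial(f, A, b)          # binds matrix and vector; g(v) == f(A, b, v)

v = np.array([1.0, 2.0, 3.0])
print(g(v))                   # [2. 3. 4.]
```

Unlike the lambda, a partial of a module-level function is picklable, which is why Pool.map accepts it.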

If you prefer to pass the entire matrices and vectors dicts to each child when you start up the Pool, and then just pass the type when you call map, you can do that too. Use the initializer/initargs arguments of multiprocessing.Pool to pass the dicts, then assign them to globals inside the initializer function. That makes them global inside each child process:

import multiprocessing
from functools import partial

import numpy as np

matrices = vectors = None

def init(_matrices, _vectors):
    global matrices, vectors
    matrices = _matrices
    vectors = _vectors


def f(type_, input_vector):
    return np.dot(matrices[type_], input_vector) + vectors[type_]

def main():
    # <declare matrices, vectors, input_vectors here>
    p = multiprocessing.Pool(initializer=init, 
                             initargs=(matrices, vectors))
    results = {}
    for type_ in ['type1', 'type2']:
        func = partial(f, type_)
        results[type_] = p.map(func, input_vectors)

if __name__ == "__main__":
    main()
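Putting it together, here is a self-contained sketch of the initializer approach (small illustrative sizes; it uses the POSIX-only "fork" start method so the example stays compact) that checks the parallel result against the plain sequential computation:

```python
import multiprocessing
from functools import partial

import numpy as np

matrices = vectors = None

def init(_matrices, _vectors):
    # Runs once in each child; stashes the shared data in module globals.
    global matrices, vectors
    matrices = _matrices
    vectors = _vectors

def f(type_, input_vector):
    return np.dot(matrices[type_], input_vector) + vectors[type_]

def run():
    rng = np.random.default_rng(1)
    mats = {t: rng.random((4, 4)) for t in ("type1", "type2")}
    vecs = {t: rng.random(4) for t in ("type1", "type2")}
    inputs = [rng.random(4) for _ in range(8)]

    ctx = multiprocessing.get_context("fork")  # POSIX only
    with ctx.Pool(2, initializer=init, initargs=(mats, vecs)) as p:
        parallel = {t: p.map(partial(f, t), inputs) for t in ("type1", "type2")}

    # Compare against the sequential loop from the question.
    for t in ("type1", "type2"):
        for got, v in zip(parallel[t], inputs):
            assert np.allclose(got, np.dot(mats[t], v) + vecs[t])
    return True

if __name__ == "__main__":
    print(run())  # True
```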

Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original source. For any questions contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM