Passing multiple iterables as arguments in Python multiprocessing

Question

I have a 3-dimensional dataset of shape (100, 64, 3000), and I am computing features using multiprocessing. I parallelize across channels, so that each process covers 8 of the 64 channels. Here is my code:

import numpy as np
import time
from multiprocessing import Process,current_process,Pool

sub=1
def cal_feature(ch):
    data=np.load('data_{}.npy'.format(sub))
    return np.mean(data[:,ch:ch+8,:],-1)


# multiprocessing
if __name__ == '__main__':

    start = time.time()
    ch=[i for i in range(0,64,8)]
    with Pool(8) as p:
        result = p.map(cal_feature,(ch) )
    print(time.time()-start)

You can create dummy data this way:

import numpy as np
np.save('data_1', np.random.randint(0, 100, size=(100, 64, 3000)))
np.save('data_2', np.random.randint(0, 100, size=(100, 64, 3000)))
np.save('data_3', np.random.randint(0, 100, size=(100, 64, 3000)))
np.save('data_4', np.random.randint(0, 100, size=(100, 64, 3000)))

In my code I have to set manually which data file is loaded (sub=1). I want to modify the code above so that it picks sub=1 and computes the features for all channels in parallel, then moves on to subject 2 when that is done, and so on.

EDIT

ind_result=[result[i:i+8] for i in range(0,(len(sub)*8),8)]
for i,j in zip(sub,ind_result):
    np.save('subject_0_{}'.format(i),np.concatenate((j),1)   )

Answer 1

You're facing a common limitation of multiprocessing: pool.map only accepts a single argument iterable.

You can work around that by packing ch and sub into a tuple and building the argument iterable with itertools.product (see its documentation). You can then unpack the two arguments inside the cal_feature function.

import numpy as np
import time
from multiprocessing import Pool
from itertools import product

def cal_feature(param):
    sub, ch = param
    data=np.load('data_{}.npy'.format(sub))
    return np.mean(data[:,ch:ch+8,:],-1)


# multiprocessing
if __name__ == '__main__':

    start = time.time()
    ch=[i for i in range(0,64,8)]
    sub = [1, 2, 3, 4]

    # here's the magic: build every (sub, ch) pair.
    # product returns a one-shot iterator, so store it as a list
    # before printing it and passing it to map
    param_list = list(product(sub, ch))
    print(param_list)
    # [(1, 0), (1, 8), (1, 16), (1, 24), (1, 32), (1, 40), (1, 48), 
    # (1, 56), (2, 0), (2, 8), (2, 16), (2, 24), (2, 32), (2, 40), 
    # (2, 48), (2, 56), (3, 0), (3, 8), (3, 16), (3, 24), (3, 32), 
    # (3, 40), (3, 48), (3, 56), (4, 0), (4, 8), (4, 16), (4, 24), 
    # (4, 32), (4, 40), (4, 48), (4, 56)]

    with Pool(8) as p:
        result = p.map(cal_feature, param_list)
    print(time.time()-start)
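
Pool.map returns its results in the same order as its input, so the flat result list follows the order of product(sub, ch): all channel blocks of subject 1 first, then those of subject 2, and so on. A minimal sketch of regrouping that list per subject follows. Note that group_by_subject is a hypothetical helper, not part of any library, and that placeholder strings stand in for the (100, 8) arrays that cal_feature returns.

```python
from itertools import product


def group_by_subject(results, subjects, blocks_per_subject):
    """Split a flat, product-ordered result list into one list per subject."""
    grouped = {}
    for k, subject in enumerate(subjects):
        start = k * blocks_per_subject
        grouped[subject] = results[start:start + blocks_per_subject]
    return grouped


sub = [1, 2, 3, 4]
ch = list(range(0, 64, 8))

# Stand-in for p.map(cal_feature, params): one placeholder per (sub, ch) pair.
params = list(product(sub, ch))
flat = ['s{}_ch{}'.format(s, c) for s, c in params]

by_subject = group_by_subject(flat, sub, len(ch))
print(by_subject[2][:2])
# ['s2_ch0', 's2_ch8']
```

With the real arrays, np.concatenate(by_subject[s], axis=1) would then give one (100, 64) feature array per subject.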

Answer 2

Pool has some limitations. I tried several approaches and recommend this one:

from multiprocessing import Pool
from itertools import product
from functools import partial


def cal_feature(sub, ch):
    return sub, ch


ch = [i for i in range(0, 16, 8)]
sub_list = [1, 2, 3]


def pool_helper(f, args):
    return f(*args)


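# the guard is needed on platforms that start workers by re-importing this module
if __name__ == '__main__':
    with Pool(8) as p:
        result = p.map(partial(pool_helper, cal_feature), product(sub_list, ch))

    print(result)
    # output is [(1, 0), (1, 8), (2, 0), (2, 8), (3, 0), (3, 8)]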

We don't need to change the original cal_feature, and pool_helper can be used for any function that accepts positional parameters.
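
On Python 3.3 and later, Pool.starmap performs the same unpacking as pool_helper: it calls the function with each tuple spread into positional arguments. Here is a minimal sketch of the same example written that way, with cal_feature still a placeholder that only echoes its arguments and a pool size of 2 chosen arbitrarily:

```python
from itertools import product
from multiprocessing import Pool


def cal_feature(sub, ch):
    # Placeholder body, as in the answer above: echo the arguments.
    return sub, ch


ch = list(range(0, 16, 8))
sub_list = [1, 2, 3]

if __name__ == '__main__':
    with Pool(2) as p:
        # starmap unpacks each (sub, ch) tuple into cal_feature(sub, ch).
        result = p.starmap(cal_feature, product(sub_list, ch))
    print(result)
    # [(1, 0), (1, 8), (2, 0), (2, 8), (3, 0), (3, 8)]
```

This drops the helper and the partial call, at the cost of requiring Python 3.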

Answer 1
2 (accepted) 2019-08-30 08:27:09

Answer 2
0 2019-08-30 10:42:08
