Multiprocessing: assigning values to an N-dimensional array/matrix in parallel

Question

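I am trying to convert a for-loop into a multiprocessing.Pool().map call. I create an empty csr_matrix and then assign values to it in parallel, row by row, based on the file's index. But this does not work as expected: the code runs for several minutes, and afterwards byte_bigram_matrix is still empty.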

import os
from multiprocessing import Pool
from scipy.sparse import csr_matrix

byte_bigram_matrix = csr_matrix((10868,66049))

def calculate_bigram(file):
    with open('byteFiles/'+file,"r") as byte_file:
        # some processing to calculate bigrams (elided)
        byte_bigram_matrix[files.index(file)] = csr_matrix(...)


#Using multiprocessing to calculate bi-grams
files = os.listdir('filesPath/')
p = Pool() #Using max cores as processors
p.map(calculate_bigram, files)
p.close()
p.join()

My question:

Can't we use the map function from multiprocessing to assign values into an N-D array/matrix by index in parallel? If not, how can multiprocessing be used for this task?
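Why the matrix stays empty: Pool workers are separate processes, so each worker assigns into its own copy of the global byte_bigram_matrix, and those writes never reach the parent process. A common workaround is to have each worker return its row and let the parent place it. The sketch below assumes the files can be read as raw bytes; count_byte_bigrams (columns keyed as first * 256 + second, which fits inside the question's 66049 columns) is a hypothetical stand-in for the elided bigram processing, and rows are kept as plain {column: count} dicts so the example needs only the standard library.

```python
import os
from collections import Counter
from multiprocessing import Pool


def count_byte_bigrams(data):
    """Hypothetical stand-in for the elided processing:
    map each adjacent byte pair (a, b) to column a * 256 + b and count it."""
    return dict(Counter(a * 256 + b for a, b in zip(data, data[1:])))


def process_file(args):
    """Worker: read one file and RETURN (row_index, sparse_row).
    Writing into a global matrix here would only update this worker's copy."""
    row_index, path = args
    with open(path, "rb") as f:  # assumption: files are read as raw bytes
        return row_index, count_byte_bigrams(f.read())


def assemble(results, n_rows):
    """Parent side: put each returned row at its index."""
    rows = [{} for _ in range(n_rows)]
    for row_index, row in results:
        rows[row_index] = row
    return rows


if __name__ == "__main__":
    folder = "byteFiles"  # directory name taken from the question
    files = sorted(
        name for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    )
    tasks = [(i, os.path.join(folder, name)) for i, name in enumerate(files)]
    with Pool() as pool:  # defaults to os.cpu_count() workers
        results = pool.map(process_file, tasks)
    rows = assemble(results, len(files))
```

pool.map already returns results in input order, so the explicit index is redundant here but makes the placement obvious. With SciPy, each returned dict could be turned into a one-row csr_matrix and the rows combined with scipy.sparse.vstack in the parent.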

Answer 1

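First, files is a one-dimensional Python list of the file names in 'filesPath/'.
As far as I can tell, the problem is in calculate_bigram: you open the file for reading instead of writing, so trying to write to it raises an error. I tried this: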

import os
from multiprocessing import Pool

def calculate_bigram(file):
    # Only touch regular files (this skips subdirectories such as "sub")
    if os.path.isfile(file):
        with open(file, "w") as byte_file:
            byte_file.write("this is a test")

if __name__ == "__main__":
    #Using multiprocessing to calculate bi-grams
    # Directory containing this script, as an absolute path
    path = os.path.dirname(os.path.abspath(__file__))
    files = os.listdir(os.path.join(path, "files"))
    # Turn the bare names from os.listdir into absolute paths
    for idx, file in enumerate(files):
        files[idx] = os.path.join(path, "files", file)

    with Pool(processes=4) as pool:
        pool.map(calculate_bigram, files)

The files directory looks like this:

files
|-> a.txt
|-> b.txt
|-> sub
     |-> c.txt

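Also, you have to pass full paths, not paths relative to the file you are running, hence:

path = os.path.dirname(os.path.abspath(__file__))
for idx, file in enumerate(files):
    files[idx] = os.path.join(path, "files", file)

Relative paths are resolved against the current working directory, not the script's location, so without this the workers may end up opening files you did not intend.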
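The same idea can be sketched with pathlib instead of os.path (a swapped-in technique, not the answer's code): resolve the script's directory once, build absolute paths from it, and keep only regular files so a subdirectory like "sub" is never handed to a worker. The helper name absolute_file_paths and the default subdirectory "files" are assumptions for illustration.

```python
from pathlib import Path


def absolute_file_paths(base_dir, subdir="files"):
    """Hypothetical helper: absolute paths of the regular files directly
    inside base_dir/subdir, sorted by path. Subdirectories are skipped."""
    folder = Path(base_dir).resolve() / subdir
    return sorted(str(p) for p in folder.iterdir() if p.is_file())


if __name__ == "__main__":
    # Anchor on the script's own location rather than the current working directory
    script_dir = Path(__file__).resolve().parent
    for path in absolute_file_paths(script_dir):
        print(path)
```

The resulting list can be passed straight to pool.map, since every entry is already absolute.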

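Edit, regarding your comment: you still have to specify the full path, not one relative to the directory you run from. At least that is how it worked for me.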

Solution 1
0 2021-01-05 23:06:07
