
Multiprocessing: Assign values to an N-dimensional array/matrix in parallel

I was trying to convert a for loop into a multiprocessing.Pool().map call. I created an empty csr_matrix and assign values to it by index in parallel, but this is not working as expected: the code takes a couple of minutes to run, yet byte_bigram_matrix is still empty afterwards.

from scipy.sparse import csr_matrix

# One row per file, one column per possible bigram
byte_bigram_matrix = csr_matrix((10868,66049))

def calculate_bigram(file):
    with open('byteFiles/'+file,"r") as byte_file:
        byte_bigram_matrix[files.index(file)] = csr_matrix(#someprocessing to calculate bigrams)


import os
from multiprocessing import Pool

# Using multiprocessing to calculate bi-grams
files = os.listdir('filesPath/')
p = Pool()  # uses all available cores by default
p.map(calculate_bigram, files)
p.close()
p.join()

Question:

Can't we assign values to an N-dimensional array/matrix by index in parallel using the map function from multiprocessing? If not, how can this task be done with multiprocessing?

First, note that files is a one-dimensional Python list holding the names of the files in "filesPath/".
From what I can tell, the problem lies in calculate_bigram: you open the file in read mode ("r") rather than write mode ("w"), so you will get an error if you try to write to it. I tried this:

import os
from multiprocessing import Pool

def calculate_bigram(file):
    # Skip directories such as "sub"; only regular files are written to
    if os.path.isfile(file):
        # Note: mode "w" truncates the file before writing
        with open(file, "w") as byte_file:
            byte_file.write("this is a test")

if __name__ == "__main__":
    # Using multiprocessing to calculate bi-grams
    files = os.listdir('files/')
    path = os.path.dirname(__file__)
    # Turn each bare file name into a path inside the script's "files" folder
    for idx, file in enumerate(files):
        files[idx] = os.path.join(path, "files", file)

    with Pool(processes=4) as pool:
        pool.map(calculate_bigram, files)

The files directory looks like this:

files
|-> a.txt
|-> b.txt
|-> sub
     |-> c.txt

Additionally, you have to supply the full path, not a path relative to the file you are executing, hence the

path = os.path.dirname(__file__)
for idx, file in enumerate(files):
    files[idx] = os.path.join(path, "files", file)

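because a relative path is resolved against the current working directory of the process, which is not necessarily the script's folder, so the files can end up somewhere you don't want them.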
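The path handling described above can also be sketched with pathlib. This is only an illustration under assumptions: the helper name files_next_to is hypothetical, and the default directory name "files" simply follows the layout shown earlier.

```python
from pathlib import Path


def files_next_to(script_path, dirname="files"):
    """Absolute paths of regular files in <script folder>/<dirname>, sorted."""
    # Anchor the directory to the script's location, not the working directory
    base = Path(script_path).resolve().parent / dirname
    # Subdirectories such as "sub" are skipped by the is_file() check
    return sorted(str(entry) for entry in base.iterdir() if entry.is_file())
```

Inside a script it would be called as files_next_to(__file__), and the resulting list could be passed to pool.map in place of the bare names returned by os.listdir.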

Edit, in reply to your comment: you still have to specify the full path and not a path relative to the current working directory. At least that's how it works for me.
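Separately from the file mode and path questions, it may help to note why byte_bigram_matrix stays empty in the original code: each Pool worker is a separate process with its own copy of module-level objects, so an assignment made inside calculate_bigram is not visible to the parent. A common workaround is to have each worker return its result and to assemble the rows in the parent. The following is a minimal sketch under assumptions, not the asker's actual processing: the names token_id, count_bigrams, bigram_row and main are hypothetical, the byte files are assumed to contain whitespace-separated two-digit hex tokens, and the 257 x 257 = 66049 column layout (256 byte values plus one slot for anything else) is only a guess at what the matrix width means.

```python
import os
from collections import Counter
from multiprocessing import Pool

N_SYMBOLS = 257  # assumption: 256 byte values plus one "unknown" slot


def token_id(token):
    """Map a two-digit hex token to 0..255; anything else goes to slot 256."""
    try:
        value = int(token, 16)
    except ValueError:
        return N_SYMBOLS - 1
    return value if 0 <= value < 256 else N_SYMBOLS - 1


def count_bigrams(tokens):
    """Return {column_index: count} for consecutive token pairs."""
    ids = [token_id(t) for t in tokens]
    return dict(Counter(a * N_SYMBOLS + b for a, b in zip(ids, ids[1:])))


def bigram_row(path):
    """Worker: read one file and RETURN its counts instead of mutating a global."""
    with open(path, "r") as byte_file:
        return count_bigrams(byte_file.read().split())


def main(directory):
    """Compute one bigram-count dict per regular file in directory."""
    paths = sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )
    with Pool() as pool:
        # map() preserves input order, so rows[i] belongs to paths[i]
        rows = pool.map(bigram_row, paths)
    return paths, rows


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        paths, rows = main(sys.argv[1])
        print(len(rows), "rows computed")
```

The per-file dictionaries can then be turned into a sparse matrix in the parent process, for example by collecting (row, column, count) triplets and passing them to scipy.sparse.coo_matrix((data, (rows, cols)), shape=(len(paths), 66049)) followed by .tocsr(). Building the matrix once at the end also avoids repeated row assignment into a csr_matrix, which tends to be slow.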
