I was trying to convert a for loop into a multiprocessing.Pool().map
call. I create an empty csr_matrix
and assign rows to it by index in parallel. This is not working as expected: the code takes a couple of minutes to run, but byte_bigram_matrix
is still empty.
byte_bigram_matrix = csr_matrix((10868,66049))
def calculate_bigram(file):
    with open('byteFiles/'+file,"r") as byte_file:
        byte_bigram_matrix[files.index(file)] = csr_matrix(#someprocessing to calculate bigrams)
from multiprocessing import Pool
#Using multiprocessing to calculate bi-grams
files = os.listdir('filesPath/')
p = Pool() #Using max cores as processors
p.map(calculate_bigram, files)
p.close()
p.join()
Question:
Can't we assign to the rows of an N-dimensional array/matrix in parallel using the map
function from multiprocessing? If not, how should this task be done with multiprocessing?
Firstly, files is a one-dimensional Python list of the names of the files in "filesPath/".
From what I can tell, the problem lies in calculate_bigram: you are opening the file in read mode ("r") rather than write mode, so you will get an error if you try to write to it. I tried this:
import os
from multiprocessing import Pool

def calculate_bigram(file):
    # Only regular files are written to; directories such as "sub" are skipped
    if os.path.isfile(file):
        with open(file, "w") as byte_file:
            byte_file.write("this is a test")

if __name__ == "__main__":
    # Using multiprocessing to calculate bi-grams
    files = os.listdir('files/')
    path = os.path.dirname(__file__)
    for idx, file in enumerate(files):
        files[idx] = os.path.join(path, "files", file)
    with Pool(processes=4) as pool:
        pool.map(calculate_bigram, files)
The files directory looks like this:
files
|-> a.txt
|-> b.txt
|-> sub
    |-> c.txt
Additionally, you have to supply the full path, not a path relative to the file you are executing, hence the
path = os.path.dirname(__file__)
for idx, file in enumerate(files):
    files[idx] = os.path.join(path, "files", file)
because relative paths are resolved against the current working directory of the process rather than the script's location, so the files can end up somewhere you don't want them.
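The path handling described above can be sketched as a small helper. This is only an illustration: list_file_paths is a hypothetical name that does not appear in the original code, and it assumes that only the regular files directly inside the directory should be handed to the pool.

```python
import os


def list_file_paths(directory):
    """Return absolute paths of the regular files directly inside `directory`.

    Hypothetical helper: os.listdir() yields bare names, so each name is
    joined with the absolute directory before it is passed to a worker.
    """
    base = os.path.abspath(directory)
    paths = []
    for name in sorted(os.listdir(base)):
        full = os.path.join(base, name)
        # Skip subdirectories such as "sub"; open() would fail on them.
        if os.path.isfile(full):
            paths.append(full)
    return paths
```

For the directory layout shown earlier, this would return the paths of a.txt and b.txt and leave out sub.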
Edit, in reply to your comment: you still have to specify the full path, not a path relative to the current working directory. At least that's how it works for me.
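On the original question of why byte_bigram_matrix stays empty, it may also help to know that each worker process in a Pool operates on its own copy of module-level objects, so an assignment made inside calculate_bigram does not reach the matrix in the parent process. A minimal sketch of one common alternative is to have the worker return its row and to assemble the result in the parent. The names count_pairs, assemble and run_parallel are hypothetical, and the byte-pair counting is only a stand-in for the real bigram processing, which the question does not show.

```python
from collections import Counter
from multiprocessing import Pool


def count_pairs(item):
    """Hypothetical worker: return (row_index, counts) instead of mutating a global."""
    index, data = item
    # Count adjacent byte pairs; a stand-in for the real bigram processing.
    return index, Counter(zip(data, data[1:]))


def assemble(results, n_rows):
    """Place each returned row at its index; this runs in the parent process."""
    rows = [Counter() for _ in range(n_rows)]
    for index, counts in results:
        rows[index] = counts
    return rows


def run_parallel(blobs, processes=2):
    """Map the worker over the inputs and build the full result in the parent."""
    with Pool(processes=processes) as pool:
        results = pool.map(count_pairs, list(enumerate(blobs)))
    return assemble(results, len(blobs))
```

A call such as run_parallel([b"abab", b"xyz"]), made under an if __name__ == "__main__": guard as in the code above, returns one Counter per input. With SciPy, the workers could instead return csr_matrix rows that the parent combines with scipy.sparse.vstack.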