
Parallelization with multiprocessing, joblib or multiprocess is not working

There is a Stack Overflow post that nicely shows a way to calculate the proximity matrix of a RandomForestClassifier():

Proximity Matrix in sklearn.ensemble.RandomForestClassifier

Nevertheless, the for-loop in that script is quite slow if you have a large dataframe. I tried to parallelize this for-loop, but without success: I only get 'None' as output.

How can I parallelize this for-loop in Spyder 4 running Python 3.8.5 on Windows 10?

proxMat = 1*np.equal.outer(a, a)

for i in range(1, nTrees):
    a = terminals[:, i]
    proxMat += 1*np.equal.outer(a, a)

Here you want to perform a reduce operation, so parallelization is not obvious. You did not specify how you tried to parallelize the loop. A simple way to parallelize it:

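import multiprocessing
import numpy as np

def get_outer(i):
    # terminals must exist as a module-level global in every worker process
    return np.equal.outer(terminals[:, i], terminals[:, i])

if __name__ == "__main__":  # guard needed on Windows, where workers are spawned
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(get_outer, range(1, nTrees))

    proxMat = 1*np.equal.outer(a, a)  # a as defined before the loop in the question
    for res in results:
        proxMat += res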

I'm not sure this one would help, but you might run into fewer pickling problems:

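import multiprocessing
import numpy as np

def get_outer(t):
    # t is a single column of terminals, passed in explicitly (no global needed)
    return np.equal.outer(t, t)

if __name__ == "__main__":  # guard needed on Windows, where workers are spawned
    # This part might be costly!
    terms = [terminals[:, i] for i in range(1, nTrees)]

    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(get_outer, terms)

    proxMat = 1*np.equal.outer(a, a)  # a as defined before the loop in the question
    for res in results:
        proxMat += res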
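
Both versions above send one full n-by-n matrix per tree back to the parent process, which can cost more than the computation itself. A possible variation, offered only as a sketch: give each worker a block of columns and let it return a single partial sum, so the reduce step happens mostly inside the workers. The names partial_proximity and proximity_parallel are made up for this example, and the random terminals array merely stands in for the real leaf-index array from the question.

```python
import multiprocessing
import numpy as np

def partial_proximity(chunk):
    """Sum the co-occurrence matrices for one block of tree columns."""
    n = chunk.shape[0]
    out = np.zeros((n, n), dtype=np.int64)
    for col in chunk.T:  # iterate over the columns (one per tree)
        out += np.equal.outer(col, col)
    return out

def proximity_parallel(terminals, n_workers=4):
    """Split the trees into n_workers blocks and add up the partial sums."""
    chunks = np.array_split(terminals, n_workers, axis=1)
    with multiprocessing.Pool(processes=n_workers) as pool:
        partials = pool.map(partial_proximity, chunks)
    return sum(partials)

if __name__ == "__main__":  # guard needed on Windows, where workers are spawned
    # Synthetic stand-in data (assumption): 200 samples, 20 trees, 5 leaves
    rng = np.random.default_rng(0)
    terminals = rng.integers(0, 5, size=(200, 20))
    proxMat = proximity_parallel(terminals)
    print(proxMat.shape)
```

Unlike the loops above, this starts from a zero matrix and includes column 0 in the sum, so no separate initialisation with a is needed. Whether it beats the plain loop depends on the data size and on process start-up overhead, so it is worth timing both on your own data.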
