I'm writing a thesis on model assessment techniques for machine learning classification tasks. I'm using scikit-learn models because they let me write mostly generic code, which matters since I have many different datasets. One part of sklearn's model output is predict_proba, which returns the probability estimates for each class. For large datasets with lots of datapoints, computing predict_proba for every datapoint takes a long time. I opened htop and saw that Python was only using a single core for the calculations, so I wrote the following function:
from joblib import Parallel, delayed
import multiprocessing

num_cores = multiprocessing.cpu_count()

# clf, first, p2 and firstm are defined earlier:
# clf is the fitted sklearn classifier, first is a 2-D array of
# shape (firstm, p2) holding the rows to score
def makeprob(r, first, p2, firstm):
    # reshape one row to (1, p2) so predict_proba accepts a single sample
    reshaped_r = first[r].reshape(1, p2)
    probo = clf.predict_proba(reshaped_r)
    probo = probo.max()  # keep only the highest class probability
    print('Currently at %(perc)s percent' % {'perc': (r / firstm) * 100})
    return probo

# using multiple cores to run the function 'makeprob'
results = Parallel(n_jobs=num_cores)(
    delayed(makeprob)(r, first, p2, firstm) for r in range(firstm)
)
Now I see with htop that all cores are being used, and the speed-up is significant, but still not nearly as fast as I would like. If anybody knows a way to speed this up, or can point me in the right direction toward faster computation in this scenario, that would be great.
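For concreteness, here is a minimal sketch of the setup the snippet above assumes; the classifier and data below are placeholders invented for illustration, and only the names clf, first, p2, and firstm come from the question:

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# placeholder data and model; any fitted sklearn classifier would do
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X, y)

first = X                  # rows to score, shape (firstm, p2)
firstm, p2 = first.shape   # number of rows, features per row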
The loss of performance depends on three elements: the overhead of spawning the worker processes and pickling the classifier and data over to each of them; the granularity of the tasks, since calling predict_proba on one row at a time throws away the vectorization that NumPy and scikit-learn rely on, so per-call overhead dominates; and the print executed in every task, which adds per-row I/O on top of the actual work.
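The biggest win is usually to drop the per-row loop entirely: predict_proba is vectorized, so one call on the whole array is far cheaper than firstm single-row calls. A minimal sketch, assuming first and clf from the question:

import numpy as np

# one vectorized call instead of firstm single-row calls
probs = clf.predict_proba(first)   # shape (firstm, n_classes)
results = probs.max(axis=1)        # highest class probability per row

If a single call does not fit in memory, a coarser-grained parallel variant that splits the rows into one large chunk per core keeps the joblib overhead per task negligible (again a sketch under the same assumptions):

from joblib import Parallel, delayed
import numpy as np

chunks = np.array_split(first, num_cores)  # a few big tasks, not firstm tiny ones
per_chunk = Parallel(n_jobs=num_cores)(
    delayed(clf.predict_proba)(c) for c in chunks
)
results = np.concatenate([p.max(axis=1) for p in per_chunk])

The design point is the same in both: amortize the fixed per-call and per-task costs over many rows instead of paying them once per datapoint.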