Faster Computing Time with Python and Sklearn

Question

I'm doing a thesis on model assessment techniques for machine learning classification tasks. I'm using some sklearn models because, for the most part, I can write generic code, and I have lots of different datasets. One part of sklearn's model output is predict_proba, which returns probability estimates. For large datasets with lots of datapoints, computing predict_proba for each datapoint takes a long time. I loaded up htop and saw Python using only a single core for the calculations, so I wrote the following function:

from joblib import Parallel, delayed
import multiprocessing
num_cores = multiprocessing.cpu_count()

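def makeprob(r, first, p2, firstm):
    # reshape row r of the data into a single sample with p2 features
    reshaped_r = first[r].reshape(1, p2)
    # class probabilities for this one sample (clf is the classifier defined elsewhere)
    probo = clf.predict_proba(reshaped_r)
    # keep only the highest class probability
    probo = probo.max()
    print('Currently at %(perc)s percent' % {'perc': (r/firstm)*100})
    return probo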

# using multiple cores to run the function 'makeprob'
results = Parallel(n_jobs=num_cores)(delayed(makeprob)(r,first,p2,firstm) for r in range(firstm))

Now I see with htop that all cores are being used, and the speedup is significant, but not nearly as fast as I would like. If anybody knows of a way to speed this up, or can point me in the right direction to get faster computation in this scenario, that would be great.

Answer 1

The loss of performance depends on three elements:

Your Python program: make sure the datasets are trimmed so they do not overuse RAM (i.e., make a subset with only the key variables that you need).
The Python environment: if you run sklearn in an IPython (Jupyter) notebook, multiprocessing will not run as fast as in a Python script. See IPython for parallel computing. A Python script will be faster.
Python library: several Python libraries are natively designed to use all the resources of the computer. For example, with TensorFlow, the supported device types are CPU and GPU (and you can use several GPUs).
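
As a rough illustration of the first and third points, here is a minimal sketch that is not part of the original code. It assumes clf is an already fitted sklearn classifier and that the data (first in the question) is a 2-D array with one row per datapoint. Instead of one joblib task per row, it hands each worker a block of rows, so predict_proba is called once per block. The function name max_proba_chunked and the chunk_size parameter are hypothetical names chosen for this example.

```python
import numpy as np
from joblib import Parallel, delayed


def max_proba_chunked(clf, X, n_jobs=-1, chunk_size=10000):
    """Return the highest class probability for each row of X.

    clf is assumed to be a fitted classifier exposing predict_proba;
    chunk_size is an illustrative tuning knob, not a sklearn parameter.
    """
    X = np.asarray(X)
    # Number of blocks needed so that no block exceeds chunk_size rows
    n_chunks = max(1, int(np.ceil(X.shape[0] / chunk_size)))
    chunks = np.array_split(X, n_chunks)
    # One predict_proba call per block instead of one per row
    parts = Parallel(n_jobs=n_jobs)(
        delayed(clf.predict_proba)(chunk) for chunk in chunks
    )
    # Row-wise maximum of each block, joined back in the original order
    return np.concatenate([p.max(axis=1) for p in parts])
```

It may be worth timing the plain single call clf.predict_proba(first).max(axis=1) first, since predict_proba already accepts the whole array at once and that alone may remove most of the per-row overhead. Whether the chunked parallel version helps beyond that probably depends on the model and on the size of the data.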

solution1
1 ACCEPTED 2017-02-02 13:15:25