Faster Computing Time with Python and Sklearn

I'm writing a thesis on model assessment techniques for machine learning classification tasks. I'm using sklearn models because they let me write mostly generic code across my many different datasets. One of sklearn's model outputs is predict_proba, which returns probability estimates. For large datasets with many datapoints, computing predict_proba for each datapoint takes a long time. I loaded up htop and saw Python using only a single core for the calculations, so I wrote the following function:

from joblib import Parallel, delayed
import multiprocessing
num_cores = multiprocessing.cpu_count()

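def makeprob(r, first, p2, firstm):
    # clf is the already fitted sklearn classifier, defined at module level.
    # As used here: first is the feature array, p2 its column count, firstm its row count.
    reshaped_r = first[r].reshape(1, p2)    # one sample as a 2-D row
    probo = clf.predict_proba(reshaped_r)   # class probabilities for that sample
    probo = probo.max()                     # keep the highest class probability
    print('Currently at %(perc)s percent' % {'perc': 100.0 * r / firstm})
    return probo

# using multiple cores to run the function 'makeprob'
results = Parallel(n_jobs=num_cores)(delayed(makeprob)(r, first, p2, firstm) for r in range(firstm))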

Now htop shows all cores being used, and the speedup is significant, but it is not nearly as fast as I would like. If anybody knows a way to speed this up, or can point me in the right direction for getting faster computation in this scenario, that would be great.

The loss of performance depends on three elements:

  1. Your Python program: make sure the datasets are trimmed down so they do not overuse RAM (i.e., make a subset with only the key variables you need).
  2. The Python environment: if you run sklearn in an IPython (Jupyter) notebook, multiprocessing will not run as fast as in a Python script. See IPython for parallel computing. A Python script will be faster.
  3. The Python library: several Python libraries are natively designed to use all the resources of the computer. For example, with TensorFlow, the supported device types are CPU and GPU (and you can use several GPUs).
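
Related to point 3, it may be worth trying one thing before adding more processes: predict_proba accepts a 2-D array with many rows, so it can be called once per block of rows rather than once per datapoint. Below is a minimal sketch of that idea, not a drop-in replacement for your code. The helper name max_proba_batched, the chunk_size value, the synthetic data, and the LogisticRegression model are all assumptions for illustration, and how much it helps will depend on your estimator and data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def max_proba_batched(clf, X, chunk_size=10_000):
    """Return the highest class probability for every row of X.

    predict_proba is called once per chunk of rows instead of once per row,
    so the per-call overhead is paid only a handful of times.
    (Hypothetical helper, written for this illustration.)
    """
    out = np.empty(X.shape[0])
    for start in range(0, X.shape[0], chunk_size):
        stop = start + chunk_size  # slicing past the end is clipped by NumPy
        out[start:stop] = clf.predict_proba(X[start:stop]).max(axis=1)
    return out


# Synthetic stand-in data and model (assumptions, not the setup from the question).
X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)
clf = LogisticRegression(max_iter=1_000).fit(X, y)

results = max_proba_batched(clf, X, chunk_size=500)
```

Chunking keeps the memory use of each call bounded, which ties in with point 1. If one chunk at a time is still too slow, the chunks (rather than single rows) could be handed to joblib's Parallel, so each worker does a meaningful amount of work per task.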

