I recently came across a requirement where I have a trained (.fit()) scikit-learn SVC classifier instance and need to .predict() a large number of instances.
Is there a way to parallelise only the .predict() call using scikit-learn's built-in tools?
from sklearn import svm
data_train = [[0,2,3],[1,2,3],[4,2,3]]
targets_train = [0,1,0]
clf = svm.SVC(kernel='rbf', degree=3, C=10, gamma=0.3, probability=True)
clf.fit(data_train, targets_train)
# this can be very large (~ a million records)
to_be_predicted = [[1,3,4]]
clf.predict(to_be_predicted)
If somebody knows a solution, I would be more than happy if you could share it.
This may be buggy, but something like this should do the trick. Basically, break your data into blocks and run your model on each block separately in a joblib.Parallel loop.
import numpy as np
from joblib import Parallel, delayed  # sklearn.externals.joblib was removed in newer scikit-learn versions

n_cores = 2
n_samples = to_be_predicted.shape[0]

# one (start, stop) index pair per core; use // for integer division
slices = [
    (n_samples * i // n_cores, n_samples * (i + 1) // n_cores)
    for i in range(n_cores)
]

results = np.vstack(Parallel(n_jobs=n_cores)(
    delayed(clf.predict)(to_be_predicted[slices[i_core][0]:slices[i_core][1]])
    for i_core in range(n_cores)
))
A working example of the above:
import numpy as np
from joblib import Parallel, delayed
from sklearn import svm

data_train = [[0,2,3],[1,2,3],[4,2,3]]
targets_train = [0,1,0]

clf = svm.SVC(kernel='rbf', degree=3, C=10, gamma=0.3, probability=True)
clf.fit(data_train, targets_train)

to_be_predicted = np.array([[1,3,4], [1,3,4], [1,3,5]])
clf.predict(to_be_predicted)

n_cores = 3
parallel = Parallel(n_jobs=n_cores)
# one row per core here, since there are exactly three samples
results = parallel(delayed(clf.predict)(to_be_predicted[i].reshape(-1, 3))
                   for i in range(n_cores))
np.vstack(results).flatten()
array([1, 1, 0])
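As a simpler variant (my own sketch, not from the answer above), np.array_split lets you skip computing slice boundaries by hand, and it tolerates sample counts that don't divide evenly across cores:

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn import svm

# same toy model as above
data_train = [[0, 2, 3], [1, 2, 3], [4, 2, 3]]
targets_train = [0, 1, 0]
clf = svm.SVC(kernel='rbf', C=10, gamma=0.3)
clf.fit(data_train, targets_train)

# stands in for the large (~a million records) input
to_be_predicted = np.array([[1, 3, 4], [1, 3, 4], [1, 3, 5], [0, 2, 3]])

n_cores = 2
# split into n_cores chunks; uneven splits are handled automatically
chunks = np.array_split(to_be_predicted, n_cores)
results = np.concatenate(
    Parallel(n_jobs=n_cores)(delayed(clf.predict)(chunk) for chunk in chunks)
)
```

The concatenated results are in the same order as a single clf.predict(to_be_predicted) call, since Parallel preserves input order.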