I'm using the sklearn.grid_search.RandomizedSearchCV class from scikit-learn 0.14.1, and I get an error when running the following code:
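import scipy.stats
from sklearn import grid_search, preprocessing, svm
from sklearn.datasets import load_svmlight_file
# inputfile: path to the training data in svmlight format
X, y = load_svmlight_file(inputfile)
min_max_scaler = preprocessing.MinMaxScaler()
# MinMaxScaler needs a dense array, so the sparse matrix is densified first
X_scaled = min_max_scaler.fit_transform(X.toarray())
# values that are not distributions must be lists of candidates, hence ['rbf']
parameters = {'kernel': ['rbf'], 'C': scipy.stats.expon(scale=100), 'gamma': scipy.stats.expon(scale=.1)}
svr = svm.SVC()
classifier = grid_search.RandomizedSearchCV(svr, parameters, n_jobs=8)
classifier.fit(X_scaled, y)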
When I set the n_jobs parameter to more than 1, I get the following error output:
Traceback (most recent call last):
  File "./svm_training.py", line 185, in <module>
    main(sys.argv[1:])
  File "./svm_training.py", line 63, in main
    gridsearch(inputfile, kerneltype, parameterfile)
  File "./svm_training.py", line 85, in gridsearch
    classifier.fit(X_scaled, y)
  File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.14.1-py2.7-linux-x86_64.egg/sklearn/grid_search.py", line 860, in fit
    return self._fit(X, y, sampled_params)
  File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.14.1-py2.7-linux-x86_64.egg/sklearn/grid_search.py", line 493, in _fit
    for parameters in parameter_iterable
  File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.14.1-py2.7-linux-x86_64.egg/sklearn/externals/joblib/parallel.py", line 519, in __call__
    self.retrieve()
  File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.14.1-py2.7-linux-x86_64.egg/sklearn/externals/joblib/parallel.py", line 419, in retrieve
    self._output.append(job.get())
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
SystemError: NULL result without error in PyObject_Call
It seems to have something to do with Python's multiprocessing functionality, but I'm not sure how to work around it other than implementing the parallelization of the parameter search by hand. Has anyone run into a similar issue when parallelizing the randomized parameter search, and been able to solve it?
It turns out the problem was with the use of MinMaxScaler. Since MinMaxScaler only accepts dense arrays, I was converting the sparse representation of the feature vectors to a dense array before scaling. Since each feature vector has thousands of elements, my assumption is that the dense arrays caused a memory error when the data was handed to the worker processes for the parallel parameter search. Instead, I switched to StandardScaler, which accepts sparse matrices as input (as long as centering is disabled with with_mean=False) and should be a better fit for my problem space anyway.
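For anyone who wants to see the sparse-friendly scaling in isolation, here is a rough sketch. The small matrix is made up for illustration and stands in for what load_svmlight_file returns. It assumes a scikit-learn version in which StandardScaler takes with_mean=False for sparse input, so treat it as an approximation of my setup, not the exact code I ran.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import StandardScaler

# Toy stand-in (hypothetical data) for the sparse matrix from load_svmlight_file.
dense = np.array([[0.0, 2.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0, 4.0],
                  [0.0, 6.0, 0.0, 0.0],
                  [3.0, 0.0, 0.0, 8.0]])
X = sparse.csr_matrix(dense)

# with_mean=False skips centering. Centering would turn the zeros into
# non-zero values and force a dense result, so only the per-feature
# variance is scaled here.
scaler = StandardScaler(with_mean=False)
X_scaled = scaler.fit_transform(X)

# The result is still a sparse matrix with the same sparsity pattern.
print(sparse.issparse(X_scaled))
print(X_scaled.nnz == X.nnz)
```

Because X_scaled stays sparse, it can be passed to classifier.fit directly without calling toarray(), which keeps the data sent to each worker much smaller.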