Trying to parallelize parameter search in scikit-learn leads to “SystemError: NULL result without error in PyObject_Call”

I'm using the sklearn.grid_search.RandomizedSearchCV class from scikit-learn 0.14.1, and I get an error when running the following code:

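import scipy.stats
from sklearn import grid_search, preprocessing, svm
from sklearn.datasets import load_svmlight_file

# inputfile is the path handed in by the surrounding script
X, y = load_svmlight_file(inputfile)

# MinMaxScaler only accepts dense input, so the sparse matrix is densified first
min_max_scaler = preprocessing.MinMaxScaler()
X_scaled = min_max_scaler.fit_transform(X.toarray())

# Non-distribution values must be lists; a bare 'rbf' string would be sampled character by character
parameters = {'kernel': ['rbf'], 'C': scipy.stats.expon(scale=100), 'gamma': scipy.stats.expon(scale=.1)}

svr = svm.SVC()

classifier = grid_search.RandomizedSearchCV(svr, parameters, n_jobs=8)
classifier.fit(X_scaled, y)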
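Why the 'kernel' value needs to be a list: as far as I understand, RandomizedSearchCV builds each candidate by calling rvs() on values that look like distributions and by picking a random element from anything else. The stdlib-only sketch below imitates that rule so it runs without scikit-learn or SciPy; `sample_params` and `Expon` are hypothetical stand-ins, not library code, and the simplified sampling rule is an assumption about how the real parameter sampler behaves.

```python
import random


class Expon:
    """Hypothetical stand-in for scipy.stats.expon(scale=...): anything with rvs()."""

    def __init__(self, scale, rng):
        self.scale = scale
        self.rng = rng

    def rvs(self):
        # random.expovariate takes the rate, i.e. 1 / scale, so the mean is `scale`
        return self.rng.expovariate(1.0 / self.scale)


def sample_params(param_distributions, n_iter, rng):
    """Hypothetical simplified sampler: draw from distributions, index into sequences."""
    for _ in range(n_iter):
        params = {}
        for name, value in sorted(param_distributions.items()):
            if hasattr(value, "rvs"):
                params[name] = value.rvs()
            else:
                # Any other value is treated as a sequence and indexed at random,
                # so a bare string 'rbf' yields 'r', 'b' or 'f'.
                params[name] = value[rng.randrange(len(value))]
        yield params


rng = random.Random(0)
buggy = {"kernel": "rbf", "C": Expon(100, rng), "gamma": Expon(0.1, rng)}
fixed = {"kernel": ["rbf"], "C": Expon(100, rng), "gamma": Expon(0.1, rng)}

buggy_kernels = {p["kernel"] for p in sample_params(buggy, 20, rng)}
fixed_kernels = {p["kernel"] for p in sample_params(fixed, 20, rng)}
print(buggy_kernels)  # a subset of {'r', 'b', 'f'}
print(fixed_kernels)  # {'rbf'}
```

Wrapping the fixed value in a one-element list keeps it constant across draws while C and gamma are still sampled from their distributions.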

When I set the n_jobs parameter to a value greater than 1, I get the following error output:

Traceback (most recent call last):
  File "./svm_training.py", line 185, in <module>
    main(sys.argv[1:])
  File "./svm_training.py", line 63, in main
    gridsearch(inputfile, kerneltype, parameterfile)
  File "./svm_training.py", line 85, in gridsearch
    classifier.fit(X_scaled, y)
  File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.14.1-py2.7-linux-x86_64.egg/sklearn/grid_search.py", line 860, in fit
    return self._fit(X, y, sampled_params)
  File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.14.1-py2.7-linux-x86_64.egg/sklearn/grid_search.py", line 493, in _fit
    for parameters in parameter_iterable
  File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.14.1-py2.7-linux-x86_64.egg/sklearn/externals/joblib/parallel.py", line 519, in __call__
    self.retrieve()
  File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.14.1-py2.7-linux-x86_64.egg/sklearn/externals/joblib/parallel.py", line 419, in retrieve
    self._output.append(job.get())
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
SystemError: NULL result without error in PyObject_Call

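It seems to be related to Python's multiprocessing, but I'm not sure how to fix it short of parallelizing the parameter search by hand. Has anyone run into a similar problem when using randomized parameter search?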

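It turns out the problem was caused by MinMaxScaler. Because MinMaxScaler only accepts dense arrays, I was converting the sparse feature matrix to a dense array before scaling. Since each feature vector has thousands of elements, my guess is that the dense array causes a memory error when the parameter search is parallelized. I switched to StandardScaler instead, which accepts sparse input (provided centering is disabled with with_mean=False, since a sparse matrix cannot be centered without becoming dense) and should be better suited to the problem space anyway.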
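To put the memory argument in numbers: X.toarray() stores every entry, zero or not, as an 8-byte float64, whereas a CSR matrix only stores the non-zeros plus two index arrays. The dataset shape in this sketch is hypothetical (the question does not give one), and the CSR estimate assumes float64 values with int32 indices; with n_jobs > 1 the array presumably also has to be pickled for the worker processes, which multiplies the cost further.

```python
def dense_nbytes(n_samples, n_features, itemsize=8):
    """Bytes needed by a dense float64 array, which is what X.toarray() produces."""
    return n_samples * n_features * itemsize


def csr_nbytes(n_samples, nnz, value_size=8, index_size=4):
    """Approximate bytes for a CSR matrix: values + column indices + row pointers.

    Assumes float64 values and int32 indices.
    """
    return nnz * (value_size + index_size) + (n_samples + 1) * index_size


# Hypothetical dataset shape: 50,000 samples, 5,000 features, 1% non-zero.
n_samples, n_features = 50_000, 5_000
nnz = n_samples * n_features // 100  # 1% density, kept in integer arithmetic

dense = dense_nbytes(n_samples, n_features)
sparse = csr_nbytes(n_samples, nnz)
print(f"dense: {dense / 1e9:.2f} GB, sparse: {sparse / 1e6:.1f} MB")
# dense: 2.00 GB, sparse: 30.2 MB
```

Under these assumptions the dense copy is roughly 66 times larger than the sparse one, which is consistent with the guess that densifying X, rather than the parallel search itself, is what breaks.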
