Scikit-learn GridSearch giving "ValueError: multiclass format is not supported" error
I am trying to use GridSearch to tune the parameters of LinearSVC(), as follows:
clf_SVM = LinearSVC()
params = {
    'C': [0.5, 1.0, 1.5],
    'tol': [1e-3, 1e-4, 1e-5],
    'multi_class': ['ovr', 'crammer_singer'],
}
gs = GridSearchCV(clf_SVM, params, cv=5, scoring='roc_auc')
gs.fit(corpus1, y)
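corpus1 has shape (1726, 7001) and y has shape (1726,).
This is a multiclass classification problem: y takes the values 0 to 3 inclusive, so there are four classes.
But this gives me the following error: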
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-220-0c627bda0543> in <module>()
5 }
6 gs = GridSearchCV(clf_SVM, params, cv=5, scoring='roc_auc')
----> 7 gs.fit(corpus1, y)
/usr/local/lib/python2.7/dist-packages/sklearn/grid_search.pyc in fit(self, X, y)
594
595 """
--> 596 return self._fit(X, y, ParameterGrid(self.param_grid))
597
598
/usr/local/lib/python2.7/dist-packages/sklearn/grid_search.pyc in _fit(self, X, y, parameter_iterable)
376 train, test, self.verbose, parameters,
377 self.fit_params, return_parameters=True)
--> 378 for parameters in parameter_iterable
379 for train, test in cv)
380
/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.pyc in __call__(self, iterable)
651 self._iterating = True
652 for function, args, kwargs in iterable:
--> 653 self.dispatch(function, args, kwargs)
654
655 if pre_dispatch == "all" or n_jobs == 1:
/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.pyc in dispatch(self, func, args, kwargs)
398 """
399 if self._pool is None:
--> 400 job = ImmediateApply(func, args, kwargs)
401 index = len(self._jobs)
402 if not _verbosity_filter(index, self.verbose):
/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.pyc in __init__(self, func, args, kwargs)
136 # Don't delay the application, to avoid keeping the input
137 # arguments in memory
--> 138 self.results = func(*args, **kwargs)
139
140 def get(self):
/usr/local/lib/python2.7/dist-packages/sklearn/cross_validation.pyc in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters)
1238 else:
1239 estimator.fit(X_train, y_train, **fit_params)
-> 1240 test_score = _score(estimator, X_test, y_test, scorer)
1241 if return_train_score:
1242 train_score = _score(estimator, X_train, y_train, scorer)
/usr/local/lib/python2.7/dist-packages/sklearn/cross_validation.pyc in _score(estimator, X_test, y_test, scorer)
1294 score = scorer(estimator, X_test)
1295 else:
-> 1296 score = scorer(estimator, X_test, y_test)
1297 if not isinstance(score, numbers.Number):
1298 raise ValueError("scoring must return a number, got %s (%s) instead."
/usr/local/lib/python2.7/dist-packages/sklearn/metrics/scorer.pyc in __call__(self, clf, X, y)
136 y_type = type_of_target(y)
137 if y_type not in ("binary", "multilabel-indicator"):
--> 138 raise ValueError("{0} format is not supported".format(y_type))
139
140 try:
ValueError: multiclass format is not supported
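From the documentation:
"Note: this implementation is restricted to the binary classification task or multilabel classification task in label indicator format."
Try: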
from sklearn import preprocessing
y = preprocessing.label_binarize(y, classes=[0, 1, 2, 3])
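before you train. This performs a "one-hot" encoding of your y, turning it into the label indicator format the note above refers to.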
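To see why binarizing helps, it can be useful to check how scikit-learn classifies the target before and after the call. The sketch below uses a small made-up label array (not the asker's data) and `type_of_target`, the same helper that appears in the traceback.

```python
import numpy as np
from sklearn.preprocessing import label_binarize
from sklearn.utils.multiclass import type_of_target

# Hypothetical stand-in for the asker's labels: four classes, 0 to 3.
y = np.array([0, 1, 2, 3, 2, 1, 0, 3])
print(type_of_target(y))       # 'multiclass', the type the scorer rejected

# One column per class; each row contains a single 1 marking its class.
y_bin = label_binarize(y, classes=[0, 1, 2, 3])
print(y_bin.shape)             # (8, 4)
print(type_of_target(y_bin))   # 'multilabel-indicator'
```

The binarized array is reported as 'multilabel-indicator', which is one of the two target types the scorer in the traceback accepts.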
Remove scoring='roc_auc' and it will work, because the roc_auc scorer does not support multiclass (categorical) targets.
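If dropping the scorer entirely feels too blunt, another option is to swap in a metric that accepts multiclass targets. The following is a minimal sketch on synthetic data (a hypothetical stand-in for corpus1 and y), using the built-in 'f1_macro' scorer and the current `sklearn.model_selection` import path rather than the older `sklearn.grid_search` module shown in the traceback.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Synthetic stand-in for corpus1 and y (hypothetical data, four classes).
X, y = make_classification(n_samples=200, n_features=20, n_informative=8,
                           n_classes=4, random_state=0)

params = {"C": [0.5, 1.0, 1.5], "tol": [1e-3, 1e-4]}

# 'f1_macro' averages the per-class F1 scores and accepts multiclass targets.
# Omitting `scoring` would fall back to the estimator's default accuracy.
gs = GridSearchCV(LinearSVC(), params, cv=5, scoring="f1_macro")
gs.fit(X, y)
print(gs.best_params_, gs.best_score_)
```

Which metric is appropriate depends on the problem; 'f1_macro' is only one reasonable choice when classes should be weighted equally.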
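As already pointed out, you must first binarize y:
from sklearn.preprocessing import label_binarize
y = label_binarize(y, classes=[0, 1, 2, 3])
and then use a multiclass learning algorithm such as OneVsRestClassifier or OneVsOneClassifier. For example: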
from sklearn.multiclass import OneVsRestClassifier

clf_SVM = OneVsRestClassifier(LinearSVC())
params = {
    'estimator__C': [0.5, 1.0, 1.5],
    'estimator__tol': [1e-3, 1e-4, 1e-5],
}
gs = GridSearchCV(clf_SVM, params, cv=5, scoring='roc_auc')
gs.fit(corpus1, y)
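For reference, here is a self-contained sketch of the approach described in this answer, run end to end. The data is synthetic (a hypothetical stand-in for corpus1 and y), and the imports assume a recent scikit-learn in which GridSearchCV lives in `sklearn.model_selection`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize
from sklearn.svm import LinearSVC

# Synthetic stand-in for corpus1 and y (hypothetical data, four classes).
X, y = make_classification(n_samples=400, n_features=20, n_informative=8,
                           n_classes=4, random_state=0)

# Convert the 1-D multiclass labels into a (n_samples, 4) indicator matrix.
y_bin = label_binarize(y, classes=[0, 1, 2, 3])

# One binary LinearSVC is fitted per column of y_bin; parameters of the
# wrapped estimator are addressed with the 'estimator__' prefix.
clf = OneVsRestClassifier(LinearSVC())
params = {"estimator__C": [0.5, 1.0, 1.5], "estimator__tol": [1e-3, 1e-4]}

gs = GridSearchCV(clf, params, cv=5, scoring="roc_auc")
gs.fit(X, y_bin)
print(gs.best_params_, gs.best_score_)
```

With an indicator-matrix target, the reported ROC AUC is averaged over the four per-class binary problems.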
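Depending on your problem, you can use to_categorical directly instead of preprocessing.label_binarize(). The problem actually comes from using scoring='roc_auc'. Note that roc_auc does not support categorical (multiclass) data.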
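to_categorical is not a scikit-learn function; the answer presumably means the Keras utility of that name, which is an assumption since the answer does not say. For readers without Keras installed, the same kind of one-hot encoding can be sketched with plain NumPy for integer labels 0 to 3 (the label array below is made up):

```python
import numpy as np

# Hypothetical integer labels in the range 0..3.
y = np.array([0, 1, 2, 3, 2, 1])
num_classes = 4

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing the identity matrix with y encodes every label at once.
y_onehot = np.eye(num_classes, dtype=int)[y]
print(y_onehot)
```

This only works when the labels are already consecutive integers starting at 0; label_binarize also handles arbitrary label values.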