
Tuning hyper-parameters in SVM OVO and OVA for multiclass classification

Suppose I am working on a multiclass classification problem (with N classes) and I want to use an SVM as the classification method.

I can adopt two strategies: One-Vs-One (OVO) and One-Vs-All (OVA). In the first case, I need to train N(N-1)/2 classifiers, namely class1 vs class2, ..., class1 vs classN, ..., class(N-1) vs classN, while in the second case just N, namely class1 vs rest, ..., classN vs rest.
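As a quick numerical check of these counts, here is a minimal sketch using sklearn's explicit OneVsOneClassifier/OneVsRestClassifier wrappers on a synthetic 4-class dataset (the dataset and its parameters are illustrative, not from the question):

from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# toy 4-class problem, just to count the fitted binary classifiers
X, y = make_classification(n_samples=200, n_classes=4, n_informative=6, random_state=0)

ovo = OneVsOneClassifier(svm.SVC()).fit(X, y)
ova = OneVsRestClassifier(svm.SVC()).fit(X, y)

print(len(ovo.estimators_))  # 4 * 3 / 2 = 6 classifiers
print(len(ova.estimators_))  # 4 classifiers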

To my knowledge, the typical (and general) code for the two scenarios, including the tuning of the hyper-parameters, would be something like:

OVO

from sklearn import svm
from sklearn.model_selection import GridSearchCV
X = ...  # feature matrix (placeholder)
y = ...  # class labels (placeholder)
params_grid = ...  # whatever hyper-parameter grid you choose
# note: SVC handles multiclass problems with the OVO strategy internally
clf = GridSearchCV(svm.SVC(), params_grid)
clf.fit(X, y)

OVA

from sklearn import svm
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import GridSearchCV
X = ...  # feature matrix (placeholder)
y = ...  # class labels (placeholder)
# whatever hyper-parameter grid you choose; parameters of the wrapped SVC
# must be prefixed with "estimator__" (e.g. "estimator__C")
params_grid = ...
clf = GridSearchCV(OneVsRestClassifier(svm.SVC()), params_grid)
clf.fit(X, y)

My doubt is the following: the code reported above searches for a single set of best hyper-parameters shared by all the N(N-1)/2 or N classifiers, depending on the strategy. In other words, the grid search finds the "optimal" parameters on average across all the classifiers.
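For instance, a minimal sketch (on a synthetic dataset from make_classification; the grid values are illustrative, not from the question) showing that every binary classifier ends up with the same tuned value:

from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier

# toy 4-class dataset, just to inspect the fitted binary classifiers
X, y = make_classification(n_samples=300, n_classes=4, n_informative=6, random_state=0)

params_grid = {"estimator__C": [0.1, 1, 10]}  # illustrative values
clf = GridSearchCV(OneVsRestClassifier(svm.SVC()), params_grid)
clf.fit(X, y)

# all N binary classifiers share the same tuned C
print([est.C for est in clf.best_estimator_.estimators_])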

So, my question is: why not search for the best hyper-parameter set separately, one for each of the N(N-1)/2 or N classifiers? I cannot find any reference on this topic, so I do not know whether finding the best parameters separately for each classifier is conceptually wrong or whether there is another explanation.

I can adopt two strategies: One-Vs-One (OVO) and One-Vs-All (OVA)

You may choose whatever hyperparameter tuning strategy you like -- Leave-One-Out, K-fold, randomized K-fold -- given the available computational resources and time. At the end of the day (week?), what matters is the ability of your ML model to generalize well. And when it comes to a model's ability to learn and generalize, the time is better invested in feature engineering than in combing through all possible combinations of parameters. To tell the truth, you will never exhaust all possible combinations anyway, because the parameters are real-valued.
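For example, a minimal sketch of plugging different cross-validation and search strategies into the same SVC (the grid and the distribution below are illustrative):

from scipy.stats import loguniform
from sklearn import svm
from sklearn.model_selection import GridSearchCV, KFold, LeaveOneOut, RandomizedSearchCV

params_grid = {"C": [0.1, 1, 10]}            # illustrative grid
params_dist = {"C": loguniform(1e-2, 1e2)}   # illustrative continuous distribution

# exhaustive grid search with 5-fold CV
grid = GridSearchCV(svm.SVC(), params_grid, cv=KFold(n_splits=5))

# leave-one-out CV instead (one fit per sample per candidate -- expensive)
loo = GridSearchCV(svm.SVC(), params_grid, cv=LeaveOneOut())

# randomized search: sample a fixed number of candidates from a distribution
rand = RandomizedSearchCV(svm.SVC(), params_dist, n_iter=20, cv=KFold(n_splits=5))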

why not search for the best hyper-parameter set separately, one for each of the N(N-1)/2 or N classifiers

  • We do it for every candidate we have; the number of candidates is defined by the cardinality of the hyperparameter search space.

  • We repeat it for every validation fold we have; the number of folds is defined by your cross-validation strategy (a sketch of the resulting number of fits follows this list).
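A minimal sketch of how these two factors multiply into the total number of fits (the grid values are illustrative):

from sklearn.model_selection import ParameterGrid

params_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1]}  # illustrative values
n_candidates = len(ParameterGrid(params_grid))           # 3 * 2 = 6 candidates
n_folds = 5                                              # e.g. 5-fold CV

# GridSearchCV fits one model per (candidate, fold) pair,
# plus one final refit on the whole training set
print(n_candidates * n_folds)  # 30 fits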

EDIT

Concerning your multiclass prediction strategy: yes, OVO and OVA (OVR) do exist, though predicting multiclass soft probabilities directly is more conventional these days. With OVR you get another dimension on top, i.e. the number of classes. And yes, conceptually you may tune the hyperparameters for every OVR model separately; your calculations then become (number of classes × number of candidates × number of CV folds) fits.
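If you do want a separate hyper-parameter set per binary problem, one possible way (a minimal sketch in the OVR setting, on a synthetic dataset; grid values are illustrative) is to nest the grid search inside the OVR wrapper, so that each class-vs-rest classifier runs its own search:

from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier

X, y = make_classification(n_samples=300, n_classes=4, n_informative=6, random_state=0)
params_grid = {"C": [0.1, 1, 10]}  # illustrative values

# OneVsRestClassifier clones the inner estimator once per class, so each
# "class k vs rest" problem runs its own independent grid search
clf = OneVsRestClassifier(GridSearchCV(svm.SVC(), params_grid))
clf.fit(X, y)

# one set of best parameters per class
print([gs.best_params_ for gs in clf.estimators_])

Note that, as mentioned above, this multiplies the tuning work roughly by the number of classes.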
