
Tuning hyper-parameters in SVM OVO and OVA for multiclass classification

Suppose I am working on a multiclass classification problem (with N classes) and I want to use SVM as the classification method.

I can adopt two strategies: One-Vs-One (OVO) and One-Vs-All (OVA). In the first case, I need to train N(N-1)/2 classifiers, namely class1 vs class2, ..., class1 vs classN, ..., class(N-1) vs classN, while in the second case just N, namely class1 vs rest, ..., classN vs rest.

From my knowledge, the typical (and general) code for the two scenarios, including the tuning of the hyper-parameters, would be something like:

OVO

from sklearn import svm
from sklearn.model_selection import GridSearchCV

X = ...  # feature matrix
y = ...  # labels
params_grid = ...  # whatever

clf = GridSearchCV(svm.SVC(), params_grid)  # SVC handles multiclass one-vs-one internally
clf.fit(X, y)

OVA

from sklearn import svm
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import GridSearchCV

X = ...  # feature matrix
y = ...  # labels
params_grid = ...  # whatever

clf = GridSearchCV(OneVsRestClassifier(svm.SVC()), params_grid)
clf.fit(X, y)
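(One detail worth noting: since the SVC is nested inside OneVsRestClassifier, the keys of params_grid must address the wrapped estimator, e.g. 'estimator__C' and 'estimator__gamma' rather than 'C' and 'gamma'.)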

My doubt is the following: the code reported above searches for the best hyper-parameters shared by all the N(N-1)/2 or N classifiers, depending on the strategy. In other words, the grid search finds the parameters that are "optimal" on average across all the classifiers.

So, my question is: why not search for the best hyper-parameter set separately, one for each of the N(N-1)/2 or N classifiers? I cannot find any reference on this topic, so I do not know whether finding the best parameters separately for each classifier is conceptually wrong, or whether there is another explanation.

I can adopt two strategies: One-Vs-One (OVO) and One-Vs-All (OVA)

You may choose whatever hyper-parameter tuning strategy you like -- Leave-One-Out, K-fold, randomized K-fold -- given the available computational resources and time. At the end of the day (week?), the ability of your ML model to generalize well is what matters. And when it comes to the model's ability to learn and generalize, time is better invested in feature engineering than in combing through all possible combinations of parameters. To tell the truth, you will never exhaust all possible combinations anyway, because many hyper-parameters are real-valued.
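As an illustration of that last point: a randomized search draws a fixed number of candidates from continuous distributions instead of enumerating a grid. A minimal sketch (the toy data and the distribution bounds are my own assumptions, only there so the snippet runs):

from scipy.stats import loguniform
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV

# toy multiclass data, just to make the sketch self-contained
X, y = make_classification(n_samples=300, n_classes=3, n_informative=5, random_state=0)

# C and gamma are real-valued, so sample them rather than gridding them
param_distributions = {
    'C': loguniform(1e-2, 1e2),
    'gamma': loguniform(1e-4, 1e0),
}

search = RandomizedSearchCV(svm.SVC(), param_distributions, n_iter=20, cv=5, random_state=0)
search.fit(X, y)
print(search.best_params_)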

why not search for the best hyper-parameter set, one for each of the N(N-1)/2 or N classifiers

  • We do it for every candidate we have, and the number of candidates is defined by the cardinality of the hyper-parameter search space.

  • We repeat it for every validation fold, and the number of folds is defined by your cross-validation strategy (see the counting sketch below).
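To make the cost concrete: a grid search performs one fit per (candidate, fold) pair, plus a final refit of the winner. A small sketch with a hypothetical toy grid:

from sklearn.model_selection import ParameterGrid

# hypothetical grid: 3 values of C x 2 kernels = 6 candidates
grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
n_candidates = len(ParameterGrid(grid))  # 6
n_folds = 5

# every candidate is evaluated on every fold, then the best one is refit once
print(n_candidates * n_folds + 1)  # 31 fits in total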

EDIT

Concerning your multiclass prediction strategy: yes, OVO and OVA (OVR) do exist, though predicting multiclass soft probabilities is more conventional these days. With OVR you get an extra dimension on top, i.e. the number of classes. And yes, conceptually you may tune hyper-parameters for every OVR model separately; your calculations then become (n_classes × n_candidates × n_folds).
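In case it helps, tuning every one-vs-rest classifier separately would look roughly like the sketch below. It reuses the placeholder style of the question; the per-class loop and the decision rule are my own illustration, not an established sklearn recipe:

import numpy as np
from sklearn import svm
from sklearn.model_selection import GridSearchCV

X = ...  # feature matrix
y = ...  # labels with N distinct classes
params_grid = ...  # whatever

# one independently tuned binary SVM per class
per_class_models = {}
for cls in np.unique(y):
    y_bin = (y == cls).astype(int)  # current class vs rest
    search = GridSearchCV(svm.SVC(), params_grid, cv=5)
    search.fit(X, y_bin)
    per_class_models[cls] = search.best_estimator_

# predict the class whose binary SVM is most confident
def predict(X_new):
    classes = list(per_class_models)
    scores = np.stack([per_class_models[c].decision_function(X_new) for c in classes])
    return np.array(classes)[np.argmax(scores, axis=0)]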
