Train an SVM (Support Vector Machine) classifier with Scikit-learn

I want to train different classifiers with Scikit-learn for a multi-label classification problem, using the following code:

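from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# X_train, X_test, Y_train, Y_test are defined earlier; Y_train and Y_test
# are binary label-indicator matrices of shape (n_samples, n_labels).
names = [
    "Nearest Neighbors",
    "Linear SVM", "RBF SVM", "Gaussian Process",
    "Decision Tree", "Random Forest", "Neural Net", "AdaBoost",
    "Naive Bayes", "QDA"]

classifiers = [
    KNeighborsClassifier(3),
    SVC(kernel="linear", C=0.025),  # SVC's default kernel is RBF, so "linear" must be explicit
    SVC(gamma=2, C=1),
    GaussianProcessClassifier(1.0 * RBF(1.0)),
    DecisionTreeClassifier(max_depth=5),
    RandomForestClassifier(max_depth=5),
    MLPClassifier(alpha=0.5),
    AdaBoostClassifier(),
    GaussianNB(),
    QuadraticDiscriminantAnalysis()]

for name, clf in zip(names, classifiers):
    clf.fit(X_train, Y_train)
    # Evaluate on the held-out test features and labels.
    score = clf.score(X_test, Y_test)
    print(name, score)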

The KNeighbors classifier works properly, but when the loop reaches the SVM classifier it throws the following exception:

Traceback (most recent call last):
  File "/Users/mac/PycharmProjects/GraphLstm/classifier.py", line 87, in <module>
    clf.fit(X_train, Y_train)
  File "/Library/Python/2.7/site-packages/sklearn/svm/base.py", line 151, in fit
    X, y = check_X_y(X, y, dtype=np.float64, order='C', accept_sparse='csr')
  File "/Library/Python/2.7/site-packages/sklearn/utils/validation.py", line 526, in check_X_y
    y = column_or_1d(y, warn=True)
  File "/Library/Python/2.7/site-packages/sklearn/utils/validation.py", line 562, in column_or_1d
    raise ValueError("bad input shape {0}".format(shape))
ValueError: bad input shape (9280, 39)

What's the reason, and how can I fix it?

Edit: As @Vivek commented, only the following classifiers support multi-label classification natively:

sklearn.tree.DecisionTreeClassifier
sklearn.tree.ExtraTreeClassifier
sklearn.ensemble.ExtraTreesClassifier
sklearn.neighbors.KNeighborsClassifier
sklearn.neural_network.MLPClassifier
sklearn.neighbors.RadiusNeighborsClassifier
sklearn.ensemble.RandomForestClassifier
sklearn.linear_model.RidgeClassifierCV

The fit method of the k-NN classifier accepts a matrix as y. SVC does not allow this: it expects a 1-D array of labels. The error message is pointing at this disallowed shape of y, since (9280, 39) is a 2-D matrix rather than a 1-D vector.
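A minimal sketch of that difference, assuming synthetic data: `make_multilabel_classification` generates a stand-in for the question's `X` and `Y` (the asker's data is not available, and the sizes here are arbitrary). k-NN fits the 2-D indicator matrix directly, while `SVC` raises a `ValueError`; the exact wording of the message depends on the scikit-learn version.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the question's data: Y is a 2-D binary
# indicator matrix of shape (n_samples, n_labels), like (9280, 39).
X, Y = make_multilabel_classification(n_samples=100, n_features=10,
                                      n_classes=4, random_state=0)
print(Y.shape)  # (100, 4)

# k-NN accepts the 2-D label matrix and predicts one column per label.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, Y)
print(knn.predict(X[:2]).shape)  # (2, 4)

# SVC expects a 1-D label vector, so fitting on the same Y fails.
svc_failed = False
try:
    SVC().fit(X, Y)
except ValueError as exc:
    svc_failed = True
    print("SVC rejected the 2-D y:", exc)
```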

Since this is a multi-label classification problem, not every scikit-learn estimator can handle it natively. The documentation provides a list of estimators that support multi-label out of the box, such as various tree-based estimators and others:

sklearn.tree.DecisionTreeClassifier
sklearn.tree.ExtraTreeClassifier
sklearn.ensemble.ExtraTreesClassifier
sklearn.neighbors.KNeighborsClassifier
...
...
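A minimal sketch of that out-of-the-box support, again assuming synthetic data from `make_multilabel_classification` in place of the asker's data: each tree-based estimator from the list is fit on the 2-D `Y_train` with no wrapper, and `predict` returns a matrix with one column per label.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic multi-label data (assumption): Y has 5 binary label columns.
X, Y = make_multilabel_classification(n_samples=300, n_features=20,
                                      n_classes=5, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0)

native = {
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Extra Trees": ExtraTreesClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(max_depth=5, random_state=0),
}

predictions, scores = {}, {}
for name, clf in native.items():
    clf.fit(X_train, Y_train)  # Y_train is 2-D; no wrapper needed
    predictions[name] = clf.predict(X_test)  # shape (n_test, n_labels)
    # For multi-label targets, score is subset (exact-match) accuracy.
    scores[name] = clf.score(X_test, Y_test)
    print(name, scores[name])
```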

However, there are strategies like one-vs-all (also called one-vs-rest) that can be used to train an estimator that doesn't support multi-label directly. scikit-learn's OneVsRestClassifier is made for this: it fits one binary classifier per label.

See the OneVsRestClassifier documentation for more details.
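Applied to the question's loop, the one-vs-rest approach could be sketched as follows. This is an illustration under assumptions: synthetic data from `make_multilabel_classification` replaces the asker's data, and only classifiers from the question that are not on the native multi-label list are kept (the two SVMs and Gaussian Naive Bayes). Each one is wrapped in `OneVsRestClassifier`, which fits one binary copy of it per label column.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic multi-label data (assumption): Y has 5 binary label columns.
X, Y = make_multilabel_classification(n_samples=300, n_features=20,
                                      n_classes=5, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0)

names = ["Linear SVM", "RBF SVM", "Naive Bayes"]
base_classifiers = [
    SVC(kernel="linear", C=0.025),
    SVC(gamma=2, C=1),
    GaussianNB(),
]

scores = {}
for name, base in zip(names, base_classifiers):
    # One binary copy of `base` is fit per label column of Y_train.
    clf = OneVsRestClassifier(base)
    clf.fit(X_train, Y_train)
    scores[name] = clf.score(X_test, Y_test)
    print(name, scores[name])
```

For multi-label targets, `score` returns subset accuracy: a test sample only counts as correct when its entire label set is predicted exactly, so the numbers can look low even when most individual labels are right.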
