
Predict training data in sklearn

I use scikit-learn's SVM like so:

clf = svm.SVC()
clf.fit(td_X, td_y) 

My question is: when I use the classifier to predict the class of a member of the training set, could the classifier ever be wrong, even in scikit-learn's implementation? (e.g. is clf.predict(td_X[a]) == td_y[a] always true?)

Yes, definitely. Run this code, for example:

from sklearn import svm
import numpy as np

clf = svm.SVC()
np.random.seed(seed=42)
# 100 random 2-D points with random binary labels
x = np.random.normal(loc=0.0, scale=1.0, size=[100, 2])
y = np.random.randint(2, size=100)
clf.fit(x, y)
print(clf.score(x, y))  # accuracy on the training data itself

The score is 0.61, so nearly 40% of the training data is misclassified. Part of the reason is that even though the default kernel is 'rbf' (which in theory should be able to classify any training set perfectly, as long as you don't have two identical training points with different labels), there is also regularization to reduce overfitting. The default regularization parameter is C=1.0.

If you run the same code as above but switch clf = svm.SVC() to clf = svm.SVC(C=200000), you'll get an accuracy of 0.94.
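To see the effect of C side by side, a sketch along these lines fits the same random data twice, once with the default C=1.0 and once with a very large C. The exact scores depend on your scikit-learn version (the default gamma changed from 'auto' to 'scale' in 0.22, so they may not match 0.61 and 0.94 exactly), but the weakly regularized model should fit the training set at least as well as the default one.

```python
from sklearn import svm
import numpy as np

np.random.seed(seed=42)
x = np.random.normal(loc=0.0, scale=1.0, size=[100, 2])
y = np.random.randint(2, size=100)

# Default regularization (C=1.0): training errors are tolerated
# in exchange for a wider margin.
default_clf = svm.SVC()
default_clf.fit(x, y)

# Very weak regularization: with a large C, the rbf kernel is free
# to (over)fit the training labels almost exactly.
large_c_clf = svm.SVC(C=200000)
large_c_clf.fit(x, y)

print(default_clf.score(x, y))
print(large_c_clf.score(x, y))
```

The second score being higher illustrates the trade-off: a large C penalizes training mistakes heavily, so the model contorts its decision boundary around individual points instead of generalizing.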
