
Error in classification report test set for machine learning with SVM in python

I split the data into train and test sets, both of which contain the target values 0 and 1. But after fitting and predicting with an SVM, the classification report states that there are zero 0s in the test sample, which is not true.

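import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
df = pd.DataFrame(data=data['data'], columns=data['feature_names'])
x = df
y = data['target']
# 70/30 train/test split with a fixed seed for reproducibility
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.3, random_state=42)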

As you can see below, the test set has both 0s and 1s, but the support column in the classification report states that there aren't any 0s!

(image: https://i.imgur.com/wjEjIvX.png)

(It is always a good idea to include your relevant code in the example, and not in images.)

"the classification report states that there are zero 0s in the test sample, which is not true."

This is because, judging from the code in your linked image, you have switched the arguments in classification_report; you have used:

print(classification_report(pred, ytest)) # wrong order of arguments

which indeed gives:

             precision    recall  f1-score   support

    class 0       0.00      0.00      0.00         0
    class 1       1.00      0.63      0.77       171

avg / total       1.00      0.63      0.77       171

but the correct usage (see the docs) is

print(classification_report(ytest, pred)) # ytest first

which gives

             precision    recall  f1-score   support

    class 0       0.00      0.00      0.00        63
    class 1       0.63      1.00      0.77       108

avg / total       0.40      0.63      0.49       171

along with the following warning message:

C:\Users\Root\Anaconda3\envs\tensorflow1\lib\site-packages\sklearn\metrics\classification.py:1135: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)

because, as already pointed out in the comments, you predict only 1s:

pred
# result:
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

the reason for which is another story, and not part of the current question.
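To see the effect of the argument order in isolation, here is a minimal sketch on made-up labels (the toy arrays are hypothetical, not the breast cancer data). It assumes a scikit-learn version recent enough to support the output_dict and zero_division parameters of classification_report. The support column is counted from whatever is passed first, so swapping the arguments makes the report count the predictions as if they were the ground truth:

```python
from sklearn.metrics import classification_report

# Hypothetical toy labels: 3 true zeros and 5 true ones.
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
# A degenerate classifier that always predicts 1.
y_pred = [1] * len(y_true)

# Correct order: support is counted from y_true.
right = classification_report(y_true, y_pred, output_dict=True, zero_division=0)
# Swapped order: support is now counted from the predictions instead.
wrong = classification_report(y_pred, y_true, output_dict=True, zero_division=0)

print("correct order, support [0, 1]:", right["0"]["support"], right["1"]["support"])
print("swapped order, support [0, 1]:", wrong["0"]["support"], wrong["1"]["support"])
# Precision and recall of class 1 trade places when the arguments are swapped.
print("correct order, class 1:", right["1"]["precision"], right["1"]["recall"])
print("swapped order, class 1:", wrong["1"]["precision"], wrong["1"]["recall"])
```

With the swapped order the support for class 0 drops to zero, which is the same symptom as in the question.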

Here is the complete reproducible code for the above:

from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
xtrain,xtest,ytrain,ytest = train_test_split(X,y,test_size=0.3,random_state=42)

from sklearn.svm import SVC
svc=SVC()
svc.fit(xtrain, ytrain)
pred = svc.predict(xtest)

print(classification_report(ytest, pred))
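As a quick sanity check, independent of classification_report, you can count the labels in ytest and pred directly. This is only a sketch using np.bincount on the same split as above; the variable names true_counts and pred_counts are mine. The counts for ytest should match the support column of the correctly ordered report (63 and 108), while the counts for pred may differ between scikit-learn versions, since SVC defaults have changed over time:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3, random_state=42)

pred = SVC().fit(xtrain, ytrain).predict(xtest)

# minlength=2 keeps a slot for class 0 even if it never occurs.
true_counts = np.bincount(ytest, minlength=2)
pred_counts = np.bincount(pred, minlength=2)

print("true labels      [n0, n1]:", true_counts)
print("predicted labels [n0, n1]:", pred_counts)
```

If the first entry of pred_counts is 0, the classifier never predicts class 0, which is what triggers the UndefinedMetricWarning.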
