
Perfect precision, recall and f1-score, yet bad prediction

I'm using scikit-learn on a binary classification problem. I get a perfect classification_report (all 1's), yet the prediction score on held-out data is 0.36. How can that be?

I'm familiar with imbalanced labels, yet I don't think that is the issue here, since f1 and the other score columns, as well as the confusion matrix, all indicate a perfect score.

import pandas as pd
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Set aside the last 19 rows for prediction.
X1, X_Pred, y1, y_Pred = train_test_split(X, y, test_size=19,
                                          shuffle=False, random_state=None)

X_train, X_test, y_train, y_test = train_test_split(X1, y1, test_size=0.4,
                                                    stratify=y1, random_state=11)

clcv = DecisionTreeClassifier()
scorecv = cross_val_score(clcv, X1, y1, cv=StratifiedKFold(n_splits=4),
                          scoring='f1')  # to balance precision/recall
clcv.fit(X1, y1)
y_predict = clcv.predict(X1)
cm = confusion_matrix(y1, y_predict)
cm_df = pd.DataFrame(cm, index=['0', '1'], columns=['0', '1'])
print(cm_df)
print(classification_report(y1, y_predict))
print('Prediction score:', clcv.score(X_Pred, y_Pred))  # unseen data

Output:

confusion:
      0   1
0  3011   0
1     0  44

              precision    recall  f1-score   support
       False       1.00      1.00      1.00      3011
        True       1.00      1.00      1.00        44

   micro avg       1.00      1.00      1.00      3055
   macro avg       1.00      1.00      1.00      3055
weighted avg       1.00      1.00      1.00      3055

Prediction score: 0.36

The issue is that you are overfitting.

There is a lot of code that is not used, so let's prune:

# Set aside the last 19 rows for prediction.
X1, X_Pred, y1, y_Pred = train_test_split(X, y, test_size=19,
                                          shuffle=False, random_state=None)

clcv = DecisionTreeClassifier()
clcv.fit(X1, y1)
y_predict = clcv.predict(X1)          # predictions on the training data
cm = confusion_matrix(y1, y_predict)
cm_df = pd.DataFrame(cm, index=['0', '1'], columns=['0', '1'])
print(cm_df)
print(classification_report(y1, y_predict))
print('Prediction score:', clcv.score(X_Pred, y_Pred))  # unseen data

So clearly, no cross-validation result is ever used here: the confusion matrix and classification_report are computed on the very data the tree was fitted on, which is why they look perfect. The obvious reason for the low prediction score on the held-out rows is the overfitting of the decision tree classifier.

Use the score from the cross-validation (scorecv) instead, and you should see the issue directly.
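As a rough, self-contained illustration of this point (the data below is synthetic, generated with make_classification to mimic a similarly imbalanced problem, not the asker's actual data): the training-set f1 of an unconstrained decision tree is perfect by construction, while the cross-validated f1 measures performance on held-out folds.

```python
# Training-set score vs. cross-validated score for a decision tree.
# The dataset is a synthetic stand-in for the real, imbalanced data.
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=3000, n_features=20,
                           weights=[0.95, 0.05], random_state=11)

clcv = DecisionTreeClassifier(random_state=11)

# Score computed on the data the tree was fitted on: memorized, so ~1.0.
clcv.fit(X, y)
train_f1 = f1_score(y, clcv.predict(X))

# Cross-validated score: each fold is evaluated on data the tree never saw.
cv_f1 = cross_val_score(clcv, X, y, cv=StratifiedKFold(n_splits=4),
                        scoring='f1')

print('training f1:       ', train_f1)
print('cross-validated f1:', cv_f1.mean())
```

The gap between the two numbers is the overfitting: the first corresponds to the perfect classification_report above, the second to what you can actually expect on unseen rows.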
