Scikit-learn: use cross-validation on whole dataset after hyperparameter tuning

I am using a decision tree in scikit-learn to classify spam e-mails. After reading various posts here and elsewhere, I split my initial dataset into a training set and a test set, and performed hyperparameter tuning on the training set using cross-validation. As I understand it, the scores should be computed on both the training and the test set to check whether the model is overfitting. Given that the scores on the test set are good, can I rule out overfitting and report the scores obtained on the whole dataset instead? Or should I report the results on the test set? Here is the code used for the training/test sets:

from sklearn.model_selection import cross_val_score

scores = cross_val_score(tree, x_train, y_train, cv=10)
scores_pre = cross_val_score(tree, x_train, y_train, cv=10, scoring="precision")
scores_f1 = cross_val_score(tree, x_train, y_train, cv=10, scoring="f1")
scores_recall = cross_val_score(tree, x_train, y_train, cv=10, scoring="recall")
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
print("Precision: %0.2f (+/- %0.2f)" % (scores_pre.mean(), scores_pre.std() * 2))
print("F-Measure: %0.2f (+/- %0.2f)" % (scores_f1.mean(), scores_f1.std() * 2))
print("Recall: %0.2f (+/- %0.2f)" % (scores_recall.mean(), scores_recall.std() * 2))

Accuracy: 0.97 (+/- 0.02)
Precision: 0.98 (+/- 0.02)
F-Measure: 0.98 (+/- 0.01)
Recall: 0.98 (+/- 0.02)

scores = cross_val_score(tree, x_test, y_test, cv=10)
scores_pre = cross_val_score(tree, x_test, y_test, cv=10, scoring="precision")
scores_f1 = cross_val_score(tree, x_test, y_test, cv=10, scoring="f1")
scores_recall = cross_val_score(tree, x_test, y_test, cv=10, scoring="recall")
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
print("Precision: %0.2f (+/- %0.2f)" % (scores_pre.mean(), scores_pre.std() * 2))
print("F-Measure: %0.2f (+/- %0.2f)" % (scores_f1.mean(), scores_f1.std() * 2))
print("Recall: %0.2f (+/- %0.2f)" % (scores_recall.mean(), scores_recall.std() * 2))

Accuracy: 0.95 (+/- 0.03)
Precision: 0.96 (+/- 0.02)
F-Measure: 0.96 (+/- 0.02)
Recall: 0.97 (+/- 0.03)

And here is the code for the whole dataset:

scores = cross_val_score(tree, X, y, cv=10)
scores_pre = cross_val_score(tree, X, y, cv=10, scoring="precision")
scores_f1 = cross_val_score(tree, X, y, cv=10, scoring="f1")
scores_recall = cross_val_score(tree, X, y, cv=10, scoring="recall")
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
print("Precision: %0.2f (+/- %0.2f)" % (scores_pre.mean(), scores_pre.std() * 2))
print("F-Measure: %0.2f (+/- %0.2f)" % (scores_f1.mean(), scores_f1.std() * 2))
print("Recall: %0.2f (+/- %0.2f)" % (scores_recall.mean(), scores_recall.std() * 2))

Accuracy: 0.97 (+/- 0.04)
Precision: 0.98 (+/- 0.03)
F-Measure: 0.98 (+/- 0.03)
Recall: 0.98 (+/- 0.03)

No, your final reported scores should always be computed on the test set; in your setup, the data you tuned on with cross-validation is effectively acting as a validation set.
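In other words, tune with cross-validation on the training set only, and report the final scores from a single evaluation on the untouched test set, rather than from cross_val_score run on the test set (which clones and refits the tree on folds of that data) or on the whole dataset. Below is a minimal sketch of that workflow, assuming a DecisionTreeClassifier, a hypothetical param_grid, and binary 0/1 labels; the split variables are the ones from the question:

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hyperparameter tuning with 10-fold cross-validation on the training set only.
param_grid = {"max_depth": [5, 10, 20], "min_samples_leaf": [1, 5, 10]}  # assumed grid
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=10)
search.fit(x_train, y_train)

# Report final scores from a single evaluation on the held-out test set;
# the test set is never used for fitting or refitting.
y_pred = search.best_estimator_.predict(x_test)
print("Accuracy:  %0.2f" % accuracy_score(y_test, y_pred))
print("Precision: %0.2f" % precision_score(y_test, y_pred))
print("Recall:    %0.2f" % recall_score(y_test, y_pred))
print("F-Measure: %0.2f" % f1_score(y_test, y_pred))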
