
Machine learning: cross_val_score vs cross_val_predict

While building a generic evaluation tool, I ran into the following problem: cross_val_score.mean() gives a slightly different result than the score computed from the output of cross_val_predict.

To calculate the testing score I have the following code, which computes the score for each fold and then takes the mean over all folds.

testing_score = cross_val_score(clas_model, algo_features, algo_featurest, cv=folds).mean()

To calculate tp, fp, tn and fn I have the following code, which computes these counts over all folds (I suppose as the sum).

test_clas_predictions = cross_val_predict(clas_model, algo_features, algo_featurest, cv=folds)
test_cm = confusion_matrix(algo_featurest, test_clas_predictions)
test_tp = test_cm[1][1]
test_fp = test_cm[0][1]
test_tn = test_cm[0][0]
test_fn = test_cm[1][0]

The outcome of this code is:

                         algo      test  test_tp  test_fp  test_tn  test_fn
5                  GaussianNB  0.719762       25       13      190       71
4          LogisticRegression  0.716429       24       13      190       72
2      DecisionTreeClassifier  0.702381       38       33      170       58
0  GradientBoostingClassifier  0.682619       37       36      167       59
3        KNeighborsClassifier  0.679048       36       36      167       60
1      RandomForestClassifier  0.675952       40       43      160       56

So, picking the first row, cross_val_score.mean() gave 0.719762 (test), while calculating the score from the counts as (tp+tn)/(tp+tn+fp+fn) gives (25+190)/(25+13+190+71) = 0.719063545150..., which is slightly different.
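The arithmetic can be checked directly from the GaussianNB row of the table. This is only a sanity check on the numbers already shown; the variable names are illustrative.

```python
# Counts copied from the GaussianNB row of the table.
tp, fp, tn, fn = 25, 13, 190, 71

# Total number of samples covered by the confusion matrix.
total = tp + fp + tn + fn

# Accuracy pooled over all folds: correct predictions / all predictions.
pooled_accuracy = (tp + tn) / total

# Value reported by cross_val_score(...).mean() in the same row.
reported_mean = 0.719762

print("total samples:", total)
print("pooled accuracy:", pooled_accuracy)
print("difference:", reported_mean - pooled_accuracy)
```

The total comes out to 299 samples, and the gap between the two scores is well under one thousandth.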

I read the following in an article on Quora: "In cross_val_predict() elements are grouped slightly different than in cross_val_score(). It means that when you will calculate the same metric using these functions, you can get different results."

Is there any particular reason behind this?

This is also called out in the documentation for cross_val_predict:

Passing these predictions into an evaluation metric may not be a valid way to measure generalization performance. Results can differ from cross_validate and cross_val_score unless all tests sets have equal size and the metric decomposes over samples.

It looks like in your case the metric is accuracy, which does decompose over samples. But it is possible (actually likely, because the total size, 299, is not highly divisible) that your test folds are not all the same size, which could explain the very small (relative) difference between the two. cross_val_score.mean() gives every fold the same weight, whereas a confusion matrix built from cross_val_predict pools all samples, which amounts to weighting each fold by its size.
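A minimal sketch of this effect. The data is a synthetic stand-in generated with make_classification, and the 10-fold StratifiedKFold is an assumption: the question does not show the actual data or the value of folds. It compares the unweighted mean of the fold scores with the pooled accuracy, and checks that weighting each fold score by its test-set size recovers the pooled value.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the asker's data (assumption): 299 samples, binary target.
X, y = make_classification(n_samples=299, n_features=10, random_state=0)

# One fixed, unshuffled splitter so both functions see identical folds.
# The number of folds (10) is an assumption, not taken from the question.
cv = StratifiedKFold(n_splits=10)
model = GaussianNB()

# Per-fold accuracy (the default score for a classifier).
fold_scores = cross_val_score(model, X, y, cv=cv)

# Out-of-fold predictions for every sample, in the original sample order.
predictions = cross_val_predict(model, X, y, cv=cv)

# Test-fold sizes; 299 is not divisible by 10, so they cannot all be equal.
fold_sizes = np.array([len(test_idx) for _, test_idx in cv.split(X, y)])

unweighted_mean = fold_scores.mean()
pooled_accuracy = accuracy_score(y, predictions)
weighted_mean = np.average(fold_scores, weights=fold_sizes)

print("fold sizes:", fold_sizes)
print("unweighted mean of fold scores:", unweighted_mean)
print("pooled accuracy from predictions:", pooled_accuracy)
print("size-weighted mean of fold scores:", weighted_mean)
```

The size-weighted mean matches the pooled accuracy up to floating-point error, while the unweighted mean may differ from it slightly. For a metric that does not decompose over samples (ROC AUC, for example), even the size-weighted mean would not be expected to match the pooled value.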
