Difference between cross_val_score and cross_val_predict

Question

I want to evaluate a regression model built with scikit-learn using cross-validation, and I am confused about which of the two functions, cross_val_score or cross_val_predict, I should use. One option would be:

cvs = DecisionTreeRegressor(max_depth = depth)
scores = cross_val_score(cvs, predictors, target, cv=cvfolds, scoring='r2')
print("R2-Score: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))

Another is to use the cross-validated predictions with the standard r2_score:

cvp = DecisionTreeRegressor(max_depth = depth)
predictions = cross_val_predict(cvp, predictors, target, cv=cvfolds)
print("CV R^2-Score: {}".format(r2_score(target, predictions)))

I would assume that both methods are valid and give similar results, but that is only the case for a small number of folds. While the R^2 is roughly the same for 10-fold CV, it gets increasingly lower for higher k in the first version, which uses cross_val_score. The second version is mostly unaffected by the number of folds.

Is this behavior to be expected, or am I missing something about CV in scikit-learn?

Answer 1

cross_val_score returns the score for each test fold, whereas cross_val_predict returns the predicted y values for the test folds.

With cross_val_score(), you are using the average of the per-fold scores, which is affected by the number of folds, because some folds may have a high error (a poor fit).

cross_val_predict(), in contrast, returns, for each element in the input, the prediction that was obtained for that element when it was in the test set. [Note that only cross-validation strategies that assign all elements to a test set exactly once can be used.] So increasing the number of folds only increases the training data available for each test element, and hence the result may not be affected much.
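To make the "each element is predicted exactly once" point concrete, here is a minimal sketch that rebuilds the output of cross_val_predict by hand. The synthetic dataset, the tree depth, and the variable names (manual, times_tested) are illustrative assumptions, not part of the original question.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, assumed purely for illustration
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
model = DecisionTreeRegressor(max_depth=3, random_state=0)
cv = KFold(n_splits=5)  # no shuffling, so the splits are reproducible

manual = np.empty_like(y)
times_tested = np.zeros(len(y), dtype=int)
for train_idx, test_idx in cv.split(X):
    # Fit a fresh copy on the training part, predict only the held-out part
    fitted = clone(model).fit(X[train_idx], y[train_idx])
    manual[test_idx] = fitted.predict(X[test_idx])
    times_tested[test_idx] += 1

library = cross_val_predict(model, X, y, cv=cv)
print(np.allclose(manual, library))             # same predictions as the library call
print(times_tested.min(), times_tested.max())   # each sample is in a test fold once
```

The result is one out-of-fold prediction per sample; no score has been computed yet at this point.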

Edit (after comment)

Please have a look at the following answer on how cross_val_predict works:

How is scikit-learn cross_val_predict accuracy score calculated?

I think that cross_val_predict will overfit because, as the number of folds increases, more data is used for training and less for testing, so the resulting labels depend more on the training data. Also, as mentioned above, each sample is predicted only once, so the result may be more sensitive to how the data is split. That's why most tutorials recommend using cross_val_score for analysis.

Answer 2

This question also bugged me, and while the others made good points, they didn't answer all aspects of OP's question.

The true answer is: the divergence in scores for increasing k is due to the chosen metric, R2 (coefficient of determination). For e.g. MSE, MSLE or MAE, there won't be any difference between using cross_val_score and cross_val_predict, provided the folds are the same size.

See the definition of R2:

R^2 = 1 - (MSE(ground truth, prediction) / MSE(ground truth, mean(ground truth)))

The mean(ground truth) term in the denominator explains why the score starts to differ for increasing k: the more splits we have, the fewer samples there are in each test fold and the higher the variance of the test fold's mean. Conversely, for small k, the mean of the test fold won't differ much from the mean of the full ground truth, as the sample size is still large enough to keep the variance small.
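The argument can be written out as a short derivation. The notation is my own assumption, not the answer's: F_j is the set of samples in test fold j, \bar{y}_j is the mean of the ground truth inside that fold, \bar{y} is the mean over all n samples, and all k folds are assumed to have n/k samples.

```latex
% MSE: averaging the per-fold scores recovers the pooled score
\frac{1}{k}\sum_{j=1}^{k} \mathrm{MSE}_j
  = \frac{1}{k}\sum_{j=1}^{k} \frac{k}{n}\sum_{i \in F_j} (y_i - \hat{y}_i)^2
  = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2
  = \mathrm{MSE}_{\text{pooled}}

% R^2: each fold is normalised by its own mean, so the average does not collapse
R^2_j = 1 - \frac{\sum_{i \in F_j} (y_i - \hat{y}_i)^2}{\sum_{i \in F_j} (y_i - \bar{y}_j)^2},
\qquad
R^2_{\text{pooled}} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}

% The fold mean minimises the within-fold sum of squares
\sum_{i \in F_j} (y_i - \bar{y}_j)^2 \;\le\; \sum_{i \in F_j} (y_i - \bar{y})^2
```

Because every per-fold denominator uses \bar{y}_j, it is never larger than it would be with the global mean, and it becomes noisier as the folds shrink. An average of such ratios is in general not equal to the pooled ratio, which is why R2 drifts with k while MSE does not.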

Proof:

import numpy as np
from sklearn.metrics import mean_absolute_error as mae
from sklearn.metrics import mean_squared_log_error as msle, r2_score

predictions = np.random.rand(1000)*100
groundtruth = np.random.rand(1000)*20

def scores_for_increasing_k(score_func):
    skewed_score = score_func(groundtruth, predictions)
    print(f'skewed score (from cross_val_predict): {skewed_score}')
    for k in (2,4,5,10,20,50,100,200,250):
        fold_preds = np.split(predictions, k)
        fold_gtruth = np.split(groundtruth, k)
        correct_score = np.mean([score_func(g, p) for g,p in zip(fold_gtruth, fold_preds)])

        print(f'correct CV for k={k}: {correct_score}')

for name, score in [('MAE', mae), ('MSLE', msle), ('R2', r2_score)]:
    print(name)
    scores_for_increasing_k(score)
    print()

Output will be:

MAE
skewed score (from cross_val_predict): 42.25333901481263
correct CV for k=2: 42.25333901481264
correct CV for k=4: 42.25333901481264
correct CV for k=5: 42.25333901481264
correct CV for k=10: 42.25333901481264
correct CV for k=20: 42.25333901481264
correct CV for k=50: 42.25333901481264
correct CV for k=100: 42.25333901481264
correct CV for k=200: 42.25333901481264
correct CV for k=250: 42.25333901481264

MSLE
skewed score (from cross_val_predict): 3.5252449697327175
correct CV for k=2: 3.525244969732718
correct CV for k=4: 3.525244969732718
correct CV for k=5: 3.525244969732718
correct CV for k=10: 3.525244969732718
correct CV for k=20: 3.525244969732718
correct CV for k=50: 3.5252449697327175
correct CV for k=100: 3.5252449697327175
correct CV for k=200: 3.5252449697327175
correct CV for k=250: 3.5252449697327175

R2
skewed score (from cross_val_predict): -74.5910282783694
correct CV for k=2: -74.63582817089443
correct CV for k=4: -74.73848598638291
correct CV for k=5: -75.06145142821893
correct CV for k=10: -75.38967601572112
correct CV for k=20: -77.20560102267272
correct CV for k=50: -81.28604960074824
correct CV for k=100: -95.1061197684949
correct CV for k=200: -144.90258384605787
correct CV for k=250: -210.13375041871123

Of course, there is another effect not shown here, which was mentioned by others. With increasing k, more models are trained on more samples and validated on fewer samples, which will affect the final scores, but this is not caused by the choice between cross_val_score and cross_val_predict.
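The simulation above uses random arrays rather than a fitted model. The same comparison can be sketched with actual scikit-learn calls; the synthetic dataset, the tree depth, and the chosen values of k are assumptions made for illustration, and the fold counts are picked so that all folds have equal size.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, assumed purely for illustration
X, y = make_regression(n_samples=200, n_features=5, noise=20.0, random_state=0)
model = DecisionTreeRegressor(max_depth=3, random_state=0)

results = {}
for k in (5, 20, 50):  # 200 is divisible by each k, so folds are equal-sized
    cv = KFold(n_splits=k)
    preds = cross_val_predict(model, X, y, cv=cv)

    # scikit-learn reports MSE negated, so flip the sign back
    mse_folds = -cross_val_score(
        model, X, y, cv=cv, scoring="neg_mean_squared_error"
    ).mean()
    r2_folds = cross_val_score(model, X, y, cv=cv, scoring="r2").mean()

    results[k] = {
        "mse_folds": mse_folds,
        "mse_pooled": mean_squared_error(y, preds),
        "r2_folds": r2_folds,
        "r2_pooled": r2_score(y, preds),
    }
    print(k, results[k])
```

For each k, the two MSE numbers should agree up to floating-point error, while the two R2 numbers generally do not. Any change in the pooled values across k comes from the models themselves being trained on different splits.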

Answer 3

I think the difference can be made clear by inspecting their outputs. Consider this snippet:

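from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.neural_network import MLPClassifier

# X is assumed to be a NumPy array that is already loaded; its last column is the label
print(X.shape)  # (7040, 133)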

clf = MLPClassifier()

scores = cross_val_score(clf, X[:,:-1], X[:,-1], cv=5)
print(scores.shape)  # (5,)

y_pred = cross_val_predict(clf, X[:,:-1], X[:,-1], cv=5)
print(y_pred.shape)  # (7040,)

Notice the shapes: why are they like this? scores.shape is (5,) because the score is computed with cross-validation over 5 folds (see the argument cv=5). A single real value is therefore computed for each fold. That value is the score of the classifier:

given the true labels and the predicted labels, how many of the predictor's answers were right in a particular fold?

In this case, the y labels given as input are used twice: to learn from the data and to evaluate the performance of the classifier.

On the other hand, y_pred.shape is (7040,), which is the length of the input dataset. This means that each value is not a score computed over multiple values, but a single value, the prediction of the classifier:

given the input data and their labels, what is the prediction of the classifier on a specific example that was in the test set of a particular fold?

Note that you do not know which fold was used: each output was computed on the test data of a certain fold, but you can't tell which one (from this output, at least).

In this case, the labels are used just once: to train the classifier. It's your job to compare these outputs to the true outputs to compute the score. If you just average them, as you did, the output is not a score, it's just the average prediction.
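As a sketch of that last step, the predictions from cross_val_predict can be compared to the true labels with a metric such as accuracy_score. The synthetic dataset and the decision tree (used here instead of MLPClassifier so the run is fast and deterministic) are assumptions for illustration, and the explicit KFold is chosen so that every fold has exactly 100 samples.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, assumed purely for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
cv = KFold(n_splits=5)  # 5 folds of exactly 100 samples each

fold_scores = cross_val_score(clf, X, y, cv=cv)   # one accuracy per fold
y_pred = cross_val_predict(clf, X, y, cv=cv)      # one prediction per sample

# Turning the predictions into a score is a separate, explicit step
pooled_accuracy = accuracy_score(y, y_pred)

print(fold_scores.shape, y_pred.shape)
print(fold_scores.mean(), pooled_accuracy)
```

With equal-sized folds and a deterministic estimator, the mean of the per-fold accuracies and the pooled accuracy coincide; with unequal folds, or a metric like R2, they need not.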

3 answers

Answer 1: 34, accepted, 2017-04-25 14:45:24
Answer 2: 8, 2019-07-26 11:41:18
Answer 3: 4, 2018-10-24 08:19:39
