
Computing training score using cross_val_score

I am using cross_val_score to compute the mean score for a regressor. Here's a small snippet.

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score 

cross_val_score(LinearRegression(), X, y_reg, cv = 5)

Using this I get an array of scores. I would like to know how the scores on the validation set (as returned in the array above) differ from those on the training set, to understand whether my model is over-fitting or under-fitting.

Is there a way of doing this with cross_val_score?

You can use cross_validate instead of cross_val_score. According to the docs:

The cross_validate function differs from cross_val_score in two ways:

  • It allows specifying multiple metrics for evaluation.
  • It returns a dict containing training scores, fit-times and score-times in addition to the test score.
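
For example, here is a minimal sketch of that approach; the make_regression data is only a stand-in for your own X and y_reg, and the default scorer for LinearRegression is R².

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

# Stand-in data; replace with your own X and y_reg
X, y_reg = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# return_train_score=True adds the training-fold scores to the result dict
cv_results = cross_validate(LinearRegression(), X, y_reg, cv=5,
                            return_train_score=True)

print(cv_results['train_score'])  # R² on the 5 training folds
print(cv_results['test_score'])   # R² on the 5 held-out folds

A large gap between train_score and test_score suggests over-fitting; low values for both suggest under-fitting.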

Why would you want that? cross_val_score(cv=5) already does much of this for you: it splits your training data into 5 folds and reports the score on each held-out fold. This procedure is itself a way to check whether your model is over-fitting.

Anyway, if you are eager to verify accuracy on a separate validation set, then you have to fit your LinearRegression on X and y_reg first and then score it on the held-out data.
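
As a rough sketch of that idea (again using synthetic data as a stand-in, with an arbitrary 80/20 split):

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Stand-in data; replace with your own X and y_reg
X, y_reg = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Hold out 20% of the data as a validation set and fit on the rest
X_train, X_val, y_train, y_val = train_test_split(X, y_reg, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print(model.score(X_train, y_train))  # R² on the training data
print(model.score(X_val, y_val))      # R² on the held-out validation data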
