Need help understanding cross_val_score in sklearn Python

I am currently trying to implement K-fold cross-validation for classification using sklearn in Python. I understand the basic concept behind K-fold and cross-validation. However, I don't understand what cross_val_score is, what it does, and what role the CV iterations play in producing the array of scores we get. Below is an example from the official sklearn documentation page.

**Example 1**
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_score
diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]
lasso = linear_model.Lasso()
print(cross_val_score(lasso, X, y, cv=3))  
***OUTPUT***
[0.33150734 0.08022311 0.03531764]

Looking at Example 1, the output is an array of 3 values. I know that when we use KFold, n_splits is the parameter that sets the number of folds. So what does cv do in this example?

**My Code**
# df, X, y and seed are assumed to be defined earlier (not shown)
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.naive_bayes import BernoulliNB

kf = KFold(n_splits=4,random_state=seed,shuffle=False)
print('Get_n_splits',kf.get_n_splits(X),'\n\n')
for train_index, test_index in kf.split(X):
    print('TRAIN:', train_index, 'TEST:', test_index)
    x_train, x_test = df.iloc[train_index], df.iloc[test_index]
    y_train, y_test = df.iloc[train_index], df.iloc[test_index]

print('\n\n')

# use train_test_split to split into training and testing data
x_train, x_test, y_train, y_test = train_test_split(X, y,test_size=0.25,random_state=0)

# fit / train the model using the training data
clf = BernoulliNB()
model = clf.fit(x_train, y_train)
y_predicted = clf.predict(x_test)

scores = cross_val_score(model, df, y, cv=4)
print('\n\n')
print('Bernoulli Naive Bayes Classification Cross-validated Scores:', scores)
print('\n\n')

Looking at My Code, I am using 4-fold cross-validation with a Bernoulli Naive Bayes classifier, and I pass cv=4 to cross_val_score as follows: scores = cross_val_score(model, df, y, cv=4). This line gives me an array of 4 values. However, if I change it to cv=8, as in scores = cross_val_score(model, df, y, cv=8), then an array of 8 values is generated as output. So again, what does cv do here?

I have read the documentation over and over again and searched numerous websites, but since I am a newbie, I really don't understand what cv does and how the scores are generated.

Any and all help would be really appreciated.

Thanks in advance.

In K-fold cross-validation, the data is split into K folds and the following procedure is applied:

  1. The model is trained using K-1 of the folds as training data.
  2. The resulting model is validated on the remaining fold.

This process is repeated K times, so that each fold serves as the validation set once, and a performance measure such as accuracy is computed at each step. When cv is an integer, it sets K in cross_val_score, which is why the returned array has cv entries, one score per fold.
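The steps above can be sketched as a manual loop and compared with cross_val_score. This is only an illustration under a few assumptions: it uses the iris data and a linear SVC as stand-ins, and it relies on the fact that, for a classifier, an integer cv is treated as an unshuffled StratifiedKFold. The names splitter, fold_model and manual_scores are made up for this example.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold, cross_val_score

iris = datasets.load_iris()
X, y = iris.data, iris.target
clf = svm.SVC(kernel="linear", C=1)

# For a classifier, cv=5 corresponds to an unshuffled StratifiedKFold with 5 splits.
splitter = StratifiedKFold(n_splits=5)

manual_scores = []
for train_idx, test_idx in splitter.split(X, y):
    fold_model = clone(clf)                       # fresh, unfitted copy for each fold
    fold_model.fit(X[train_idx], y[train_idx])    # step 1: train on K-1 folds
    # step 2: validate on the held-out fold using the estimator's own score method
    manual_scores.append(fold_model.score(X[test_idx], y[test_idx]))

manual_scores = np.array(manual_scores)
library_scores = cross_val_score(clf, X, y, cv=5)

print(manual_scores)    # one score per fold
print(library_scores)   # same values, computed by cross_val_score
```

Because cross_val_score clones the estimator and refits it on every fold, fitting the model beforehand (as in the question's code) should not affect the scores it returns.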

Please look at the image below to get a clear picture. It is taken from the cross-validation section of the scikit-learn documentation.

[Image: Cross Validation]

>>> from sklearn import datasets, svm
>>> from sklearn.model_selection import cross_val_score
>>> iris = datasets.load_iris()
>>> clf = svm.SVC(kernel='linear', C=1)
>>> scores = cross_val_score(clf, iris.data, iris.target, cv=5)
>>> scores
array([0.96..., 1.  ..., 0.96..., 0.96..., 1.        ])
>>> print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
Accuracy: 0.98 (+/- 0.03)

Here a single mean score is calculated from the per-fold scores. By default, the score computed at each CV iteration is the value returned by the estimator's score method (mean accuracy for scikit-learn classifiers, R² for regressors).
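As a rough illustration of that default, the sketch below reuses the Lasso example from the question and compares the default scores with an explicit scoring="r2", then switches to a different metric through the scoring parameter. The variable names default_scores, r2_scores and mse_scores are made up for this example, and the exact numbers printed may differ between scikit-learn versions.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_score

diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]
lasso = linear_model.Lasso()

# Default: each fold is scored with lasso.score, which is R^2 for regressors.
default_scores = cross_val_score(lasso, X, y, cv=3)

# Asking for R^2 explicitly should give the same three values.
r2_scores = cross_val_score(lasso, X, y, cv=3, scoring="r2")

# A different metric: scikit-learn reports MSE negated so that higher is better,
# so the sign is flipped here to get ordinary mean squared errors.
mse_scores = -cross_val_score(lasso, X, y, cv=3, scoring="neg_mean_squared_error")

print(default_scores)
print(r2_scores)
print(mse_scores)
```

In each call the array has 3 entries because cv=3; changing cv changes the number of folds and therefore the number of scores.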

I took help from the links below.

  1. https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html#sklearn.model_selection.cross_val_score

  2. https://scikit-learn.org/stable/modules/cross_validation.html#cross-validation
