
Using sklearn cross_val_score and KFold to fit and help predict a model

I'm trying to understand how to use k-fold cross-validation from the sklearn Python module.

I understand the basic flow:

  • instantiate a model, e.g. model = LogisticRegression()
  • fit the model, e.g. model.fit(xtrain, ytrain)
  • predict, e.g. model.predict(xtest)
  • use e.g. cross_val_score to test the fitted model's accuracy.

Where I'm confused is using sklearn's KFold with cross_val_score. As I understand it, the cross_val_score function will fit the model and predict on the folds, giving you an accuracy score for each fold.

E.g. using code like this:

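from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

kf = KFold(n_splits=5, shuffle=True, random_state=8)  # older releases took n= and n_folds= instead
lr = LogisticRegression()
accuracies = cross_val_score(lr, X_train, y_train, scoring='accuracy', cv=kf)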

So if I have a dataset with training and testing data, and I use cross_val_score with KFold to measure the accuracy of the algorithm on each fold of my training data, is the model now fitted and ready to predict on the testing data? That is, in the case above, can I just call lr.predict?

Thanks for any help.

No, the model is not fitted. Looking at the source code for cross_val_score:

scores = parallel(delayed(_fit_and_score)(clone(estimator), X, y, scorer, train, test, verbose, None, fit_params)

As you can see, cross_val_score clones the estimator before fitting it on each fold's training data, so the estimator you passed in is left untouched. cross_val_score returns an array of scores, one per fold, which you can analyse to see how the estimator performs on different folds of the data and to check whether it overfits. You can read more about it here.
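A quick way to convince yourself of the cloning behaviour is to try predicting with the original estimator right after cross-validation. The sketch below assumes a current scikit-learn release and uses synthetic data from make_classification in place of your own X_train and y_train; the variable names lr_is_fitted and predict_raised are only there for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the real training data (an assumption for this sketch).
X_train, y_train = make_classification(n_samples=200, n_features=5, random_state=8)

kf = KFold(n_splits=5, shuffle=True, random_state=8)
lr = LogisticRegression()
accuracies = cross_val_score(lr, X_train, y_train, scoring="accuracy", cv=kf)

# One accuracy value per fold.
print(len(accuracies), accuracies)

# Only the clones were fitted, so lr has no learned coefficients yet.
lr_is_fitted = hasattr(lr, "coef_")

# Calling predict on the unfitted original raises NotFittedError.
try:
    lr.predict(X_train[:1])
    predict_raised = False
except NotFittedError:
    predict_raised = True

print("fitted:", lr_is_fitted, "predict raised:", predict_raised)
```

If lr had been fitted in place, it would expose a coef_ attribute and predict would return a label instead of raising.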

Once you are satisfied with the results of cross_val_score, you need to fit the estimator on the whole training data before you can use it to predict on the test data.
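Put together, the workflow might look roughly like the following. This is a minimal sketch rather than the only way to do it: the data is synthetic, the train/test split via train_test_split is an assumption standing in for however your own data is divided, and it targets a current scikit-learn release.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic data and split, standing in for the real dataset (assumption).
X, y = make_classification(n_samples=300, n_features=5, random_state=8)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=8
)

lr = LogisticRegression()
kf = KFold(n_splits=5, shuffle=True, random_state=8)

# Step 1: estimate performance using the training data only.
cv_scores = cross_val_score(lr, X_train, y_train, scoring="accuracy", cv=kf)
print("mean CV accuracy:", cv_scores.mean())

# Step 2: cross_val_score left lr unfitted, so fit it on all of the training data.
lr.fit(X_train, y_train)

# Step 3: only now can lr predict on the held-out test data.
y_pred = lr.predict(X_test)
test_accuracy = lr.score(X_test, y_test)
print("test accuracy:", test_accuracy)
```

The cross-validation scores tell you whether the model and its settings are worth keeping; the final fit is what produces the estimator you actually predict with.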
