Using sklearn cross_val_score and kfolds to fit and help predict model
I'm trying to understand how to use k-fold cross-validation from the sklearn Python module.
I understand the basic flow:
model = LogisticRegression()
model.fit(xtrain, ytrain)
model.predict(xtest)
Where I'm confused is using sklearn's KFold with cross_val_score. As I understand it, the cross_val_score function will fit the model and predict on the k folds, giving you an accuracy score for each fold.
E.g. using code like this:
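from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
# n_splits replaces the n/n_folds arguments of the removed sklearn.cross_validation.KFold
kf = KFold(n_splits=5, shuffle=True, random_state=8)
lr = LogisticRegression()
accuracies = cross_val_score(lr, X_train, y_train, scoring='accuracy', cv=kf)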
So if I have a dataset with training and testing data, and I use the cross_val_score function with KFold to determine the accuracy of the algorithm on my training data for each fold, is the model now fitted and ready for prediction on the testing data? So in the case above, could I just use lr.predict?
Thanks for any help.
No, the model is not fitted. Looking at the source code for cross_val_score:
scores = parallel(delayed(_fit_and_score)(clone(estimator), X, y, scorer,
                                          train, test, verbose, None,
                                          fit_params)
                  for train, test in cv)
As you can see, cross_val_score clones the estimator before fitting the fold training data to it, so the estimator you passed in is left untouched. cross_val_score returns an array of scores, one per fold, which you can analyse to see how the estimator performs on different folds of the data and to check whether it overfits. You can learn more about it here.
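To make the cloning behaviour concrete, here is a minimal sketch that repeats roughly what cross_val_score does by hand and then checks that the original estimator is still unfitted. The data comes from make_classification, which is only a stand-in for the asker's X_train and y_train (an assumption), and the manual loop is a simplification of the library's internals, not a copy of them.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the asker's training data (an assumption).
X_train, y_train = make_classification(n_samples=200, n_features=5, random_state=0)

kf = KFold(n_splits=5, shuffle=True, random_state=8)
lr = LogisticRegression()

# Roughly what cross_val_score does: fit a fresh clone on each fold.
manual_scores = []
for train_idx, test_idx in kf.split(X_train):
    fold_model = clone(lr)  # unfitted copy with the same hyperparameters
    fold_model.fit(X_train[train_idx], y_train[train_idx])
    # For a classifier, .score() is accuracy on the held-out fold.
    manual_scores.append(fold_model.score(X_train[test_idx], y_train[test_idx]))
manual_scores = np.array(manual_scores)

accuracies = cross_val_score(lr, X_train, y_train, scoring='accuracy', cv=kf)

# lr itself was never fitted, so predicting with it raises NotFittedError.
try:
    lr.predict(X_train)
    lr_is_fitted = True
except NotFittedError:
    lr_is_fitted = False

print("cross_val_score:", accuracies)
print("manual loop:    ", manual_scores)
print("lr fitted after cross_val_score?", lr_is_fitted)
```

Because KFold is given an integer random_state, both passes see the same splits, so the two score arrays should agree, and the last line should report that lr is not fitted.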
Once you are satisfied with the results of cross_val_score, you need to fit the estimator on the whole training data before you can use it to predict on test data.
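The full workflow might look like the sketch below: cross-validate on the training data, then fit once on all of it, then predict on the held-out test data. The synthetic dataset and the 75/25 split are assumptions made so the example runs on its own; substitute your own training and testing data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic data and a hypothetical 75/25 split (assumptions for illustration).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

kf = KFold(n_splits=5, shuffle=True, random_state=8)
lr = LogisticRegression()

# Step 1: estimate performance with cross-validation on the training data only.
accuracies = cross_val_score(lr, X_train, y_train, scoring='accuracy', cv=kf)
print("CV accuracy: mean=%.3f, std=%.3f" % (accuracies.mean(), accuracies.std()))

# Step 2: cross_val_score left lr unfitted, so fit it on the whole training set.
lr.fit(X_train, y_train)

# Step 3: only now can the estimator predict on the test data.
y_pred = lr.predict(X_test)
test_accuracy = lr.score(X_test, y_test)
print("Test accuracy: %.3f" % test_accuracy)
```

A large spread between the fold scores, or a test accuracy well below the cross-validated mean, would be a hint that the model is not generalising well.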