Train/Test Set Proportion in cv.glmnet from the glmnet package in R

Asked 2021-12-22 02:28:24. Tags: r, glmnet

I was wondering what proportion of the data cv.glmnet (from the glmnet package in R) uses for the training set and for the test set. I have already read the glmnet package documentation and found no information about the train/test proportion. Please tell me if I missed something in the documentation. Any help would be greatly appreciated. Thank you.

1 Answer

From the help page for ?cv.glmnet, there are two parts to look at:

Argument nfolds

number of folds - default is 10. Although nfolds can be as large as the sample size (leave-one-out CV), it is not recommended for large datasets. Smallest value allowable is nfolds=3

And from the Value section, for foldid:

if keep=TRUE, the fold assignments used

i.e. set keep=TRUE in the function call to access the fold assignments afterwards.

With the default nfolds=10, the function assigns each row to one of 10 roughly equal-sized groups (folds). It then fits the model 10 times, holding out a different fold for testing each time. So it's 90% train and 10% test, repeated 10 times. More generally, with nfolds=k each fit trains on about (k-1)/k of the rows and tests on the remaining 1/k.
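As a rough illustration, the sketch below runs cv.glmnet with keep=TRUE and tabulates the returned foldid to see how many rows are held out in each iteration. The data (x, y) and the variable names are simulated and made up for this example; they are not from the question.

```r
library(glmnet)

# Simulated data, purely for illustration (assumption: 100 rows, 5 predictors)
set.seed(1)
n <- 100
p <- 5
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- rnorm(n)

# keep = TRUE makes cv.glmnet return the fold assignment for each row
cvfit <- cv.glmnet(x, y, nfolds = 10, keep = TRUE)

# Number of rows held out (tested on) in each of the 10 iterations
fold_sizes <- table(cvfit$foldid)
print(fold_sizes)

# Share of rows used for testing and for training in each iteration
test_share <- as.numeric(fold_sizes) / n
train_share <- 1 - test_share
print(round(cbind(fold = seq_along(test_share), train_share, test_share), 2))
```

Each row of the printed table corresponds to one cross-validation iteration, so with 10 folds the test share should come out at roughly 0.1 per iteration.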

You can supply your own folds with the foldid argument if you prefer. Hope that helps :)
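For example, a minimal sketch of supplying your own folds, assuming you want an 80/20 train/test split per iteration (5 folds). The data and the name my_folds are hypothetical, chosen only for this illustration.

```r
library(glmnet)

# Simulated data, purely for illustration
set.seed(2)
n <- 100
p <- 5
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- rnorm(n)

# Assign every row to one of 5 folds at random, 20 rows per fold
my_folds <- sample(rep(1:5, length.out = n))

# Pass the assignment via foldid; nfolds does not need to be given
cvfit <- cv.glmnet(x, y, foldid = my_folds, keep = TRUE)

# Each iteration now holds out 20 rows (20%) and trains on the other 80
print(table(my_folds))
```

Fixing foldid this way also makes the cross-validation reproducible across calls, since the fold assignment no longer depends on the random draw inside cv.glmnet.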