
randomForest: how to control the proportion of in-bag / out-of-bag samples?

I'm using R's randomForest for various regression tasks. The hyperparameter tuning is still mysterious to me. I've got a handle on tuning ntree and mtry, but it makes intuitive sense that I'd also want to tune the number of samples in each bag as an additional way to balance model bias and variance.

Based on the documentation, I thought that this is what sampsize does. But reading the function arguments reveals that it's more complicated than that. If I sample with replacement (replace = TRUE), it seems I have no control over the proportion of in-bag / out-of-bag samples. In fact, with replace = TRUE, I don't think the proportion that the algorithm uses is even documented.

Documentation: sampsize: Size(s) of sample to draw.

Function arguments: sampsize = if (replace) nrow(x) else ceiling(.632*nrow(x))

Is there a way to control the proportion of in-bag samples? Is this even a worthwhile tuning parameter?

The signature sampsize = if (replace) nrow(x) else ceiling(.632*nrow(x)) only describes the default value of sampsize: nrow(x) when replace is TRUE, and ceiling(.632*nrow(x)) otherwise.
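To see what those defaults imply for the in-bag proportion, here is a minimal base-R sketch. It relies only on the standard bootstrap argument: when sampsize rows are drawn with replacement from n rows, a given row is picked at least once with probability 1 - (1 - 1/n)^sampsize, and without replacement the share is simply sampsize / n. The helper names expected_inbag_fraction and simulate_inbag_fraction are made up for illustration and are not part of randomForest.

```r
# Expected fraction of distinct rows that end up in one tree's bag.
# n: number of training rows; sampsize: rows drawn per tree.
expected_inbag_fraction <- function(n, sampsize, replace = TRUE) {
  if (replace) {
    1 - (1 - 1 / n)^sampsize  # chance a given row is drawn at least once
  } else {
    sampsize / n              # each row can be drawn at most once
  }
}

# Monte Carlo check of the same quantity using plain sampling.
simulate_inbag_fraction <- function(n, sampsize, replace = TRUE, reps = 500) {
  mean(replicate(
    reps,
    length(unique(sample.int(n, sampsize, replace = replace))) / n
  ))
}

n <- 1000
set.seed(1)

# The two defaults from the function signature.
defaults_with    <- expected_inbag_fraction(n, n, replace = TRUE)
defaults_without <- expected_inbag_fraction(n, ceiling(0.632 * n), replace = FALSE)
simulated        <- simulate_inbag_fraction(n, n, replace = TRUE)

print(c(with_replacement    = defaults_with,
        without_replacement = defaults_without,
        simulated           = simulated))
```

Under these assumptions both defaults put roughly 63% of the distinct rows in each bag, which is presumably where the .632 constant comes from. Lowering sampsize lowers that share in either mode.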

However, you can override that default by passing a value explicitly (here X holds the predictors and y the response):

randomForest(x = X, y = y, replace = TRUE, sampsize = 10)
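As a rough illustration of treating sampsize as a tuning knob, the sketch below fits a regression forest at a few in-bag fractions and reports the observed share of in-bag rows next to the final out-of-bag MSE. The data are synthetic, and the names X, y, fractions and tab are placeholders rather than anything from the question. It assumes the randomForest package is installed and uses its keep.inbag argument together with the inbag and mse components of the fitted object.

```r
library(randomForest)

set.seed(42)
n <- 200
# Synthetic regression data, for illustration only.
X <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
y <- 2 * X$x1 - X$x2 + rnorm(n, sd = 0.1)

# Candidate bag sizes, expressed as a fraction of the training rows.
fractions <- c(0.2, 0.5, 1.0)

results <- lapply(fractions, function(f) {
  fit <- randomForest(x = X, y = y, ntree = 200, replace = TRUE,
                      sampsize = ceiling(f * n), keep.inbag = TRUE)
  data.frame(
    fraction    = f,
    # fit$inbag is an n x ntree matrix; > 0 marks rows used by a tree
    inbag_share = mean(fit$inbag > 0),
    # OOB mean squared error after the last tree
    oob_mse     = fit$mse[fit$ntree]
  )
})

tab <- do.call(rbind, results)
print(tab)
```

Whether a smaller bag helps is data dependent, so comparing the OOB error across a handful of fractions, as above, is one way to check whether the parameter is worth tuning for a given task.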
