randomForest: how to control the proportion of in-bag / out-of-bag samples?
I'm using R `randomForest` for various regression tasks. Hyperparameter tuning is still mysterious to me. I've got a handle on tuning `ntree` and `mtry`, but it makes intuitive sense that I'd also want to tune the number of samples in each bag as an additional way to balance model bias and variance.
Based on the documentation, I thought this is what `sampsize` does. But reading the function arguments reveals that it's more complicated than that. If I run with replacement (`replace = TRUE`), it seems I have no control over the proportion of in-bag / out-of-bag samples. In fact, with `replace = TRUE`, I don't think the proportion the algorithm uses is even documented.
Documentation:

> sampsize: Size(s) of sample to draw.

Function arguments:

`sampsize = if (replace) nrow(x) else ceiling(.632*nrow(x))`
Is there a way to control the proportion of in-bag samples? Is this even a worthwhile tuning parameter?
The signature `sampsize = if (replace) nrow(x) else ceiling(.632*nrow(x))` only specifies the default value: `nrow(x)` when `replace` is `TRUE`, and `ceiling(.632*nrow(x))` otherwise.
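The two defaults are less arbitrary than they look. Assuming plain, unweighted and unstratified sampling of the $n = $ `nrow(x)` rows (the usual regression case), a row is left out of a tree only if every one of the $k = $ `sampsize` draws misses it, so the expected in-bag fraction per tree is:

$$
P(\text{row in-bag}) =
\begin{cases}
1 - \left(1 - \dfrac{1}{n}\right)^{k} \approx 1 - e^{-k/n}, & \text{replace = TRUE} \\[1em]
\dfrac{k}{n}, & \text{replace = FALSE}
\end{cases}
$$

With the defaults, $k = n$ gives $1 - e^{-1} \approx 0.632$ when sampling with replacement, and $k = \lceil 0.632\,n \rceil$ gives $\approx 0.632$ without replacement. Both modes therefore keep roughly 63.2% of rows in-bag by default, and in both modes that proportion is governed by `sampsize`.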
However, you can change it as you wish by passing your own value:
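```r
library(randomForest)
rf <- randomForest(mpg ~ ., data = mtcars, replace = TRUE, sampsize = 10)  # each tree is grown on 10 draws (with replacement)
```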
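To see how `sampsize` translates into an in-bag proportion, here is a minimal base-R sketch that needs no extra packages. The helpers `expected_inbag()` and `simulate_inbag()` are hypothetical names introduced for illustration. They assume plain uniform sampling of row indices, which may not match randomForest's internals for stratified or weighted (e.g. classification) sampling. For an actual fitted forest, `randomForest(..., keep.inbag = TRUE)` returns an `n` by `ntree` `inbag` matrix, so something like `mean(rf$inbag > 0)` should give the realised fraction.

```r
# Hypothetical helper: expected fraction of the n rows that are in-bag for a
# single tree when `sampsize` rows are drawn (assumes uniform, unstratified draws).
expected_inbag <- function(n, sampsize, replace = TRUE) {
  if (replace) {
    # A row is out-of-bag only if all `sampsize` independent draws miss it
    1 - (1 - 1 / n)^sampsize
  } else {
    # Without replacement every drawn row is distinct
    sampsize / n
  }
}

# Hypothetical helper: Monte Carlo check. Draw one sample per "tree" and
# average the fraction of distinct rows it contains.
simulate_inbag <- function(n, sampsize, replace = TRUE, ntree = 2000) {
  fracs <- vapply(seq_len(ntree), function(i) {
    idx <- sample.int(n, sampsize, replace = replace)
    length(unique(idx)) / n
  }, numeric(1))
  mean(fracs)
}

set.seed(1)
n <- 1000
for (s in c(250L, 500L, 1000L)) {
  cat(sprintf("sampsize = %4d: expected %.3f, simulated %.3f\n",
              s, expected_inbag(n, s), simulate_inbag(n, s)))
}
```

With `n = 1000`, `sampsize = 1000` reproduces the familiar value of about 0.632, while `sampsize = 250` keeps only about 22% of rows in-bag per tree, leaving the rest out-of-bag.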