
Turning a Random Forest into a Decision Tree - Using randomForest package in R

Is it possible to generate a decision forest whose trees are all exactly the same? Please note that this is an experimental question. As far as I understand, random forests have two parameters that introduce the 'randomness' compared to a single decision tree:

1) the number of features randomly sampled at each node of a decision tree, and

2) the number of training examples drawn to create a tree.

Intuitively, if I set these two parameters to their maximum values, I should be avoiding the 'randomness', and each created tree should be exactly the same. Because all the trees would be identical, I should get the same results regardless of the number of trees in the forest or of different runs (i.e. different seed values).

I have tested this idea using the randomForest library in R. I think the two aforementioned parameters correspond to 'mtry' and 'sampsize' respectively. I have set both to their maximum values, but unfortunately some randomness remains: the OOB error estimates still vary with the number of trees in the forest.

Would you please help me understand how to remove all the randomness in a random decision forest, preferably using the arguments of the randomForest library in R?

In addition to mtry and sampsize, there's another relevant argument in randomForest(): replace. By default, the data points used to grow each tree are sampled with replacement. If you want all data points to be used in all trees, you need not only to set sampsize to the number of data points, but also to set replace=FALSE.
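To see why replace matters even when sampsize equals the number of rows, here is a minimal base-R sketch. It only mimics the row sampling with sample(); it is not the package's internal code, and the seed and the value of n are arbitrary choices made for the illustration.

```r
# Illustration only: mimic how row indices could be drawn for one tree.
set.seed(1)   # arbitrary seed, chosen for this illustration
n <- 10       # hypothetical number of training rows

# n indices drawn WITH replacement (a bootstrap sample): some rows can
# repeat and others can be left out, so trees may see different data.
boot_idx <- sample(n, n, replace = TRUE)

# n indices drawn WITHOUT replacement: a permutation of 1..n,
# so every row is used exactly once.
full_idx <- sample(n, n, replace = FALSE)

length(unique(boot_idx))   # can be smaller than n
length(unique(full_idx))   # always n
```

With replace=FALSE and sampsize equal to the number of rows, every tree is grown on the full data set, which is the condition described above.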

Here's a toy example to show that you can get a forest of identical trees:

library(randomForest)

set.seed(17)
x <- matrix(sample(5, 50, replace=TRUE), 10, 5)  # 10 rows, 5 features
y <- factor(sample(2, 10, replace=TRUE))         # binary class labels

# Use every feature at every split and every row exactly once per tree
rf1 <- randomForest(x, y, mtry=ncol(x), sampsize=nrow(x), replace=FALSE, ntree=5)

You can then use getTree(rf1, 1), getTree(rf1, 2), etc. to check that all trees are identical.
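Rather than comparing the trees by eye, the check can be automated. The following is a sketch that rebuilds the toy forest and compares every tree with the first one using identical(); the helper names trees and same_as_first are made up for this example. The expected output is deliberately not shown, since it may depend on the package version and on the data.

```r
library(randomForest)

# Rebuild the toy forest from the example above
set.seed(17)
x <- matrix(sample(5, 50, replace = TRUE), 10, 5)
y <- factor(sample(2, 10, replace = TRUE))
rf1 <- randomForest(x, y, mtry = ncol(x), sampsize = nrow(x),
                    replace = FALSE, ntree = 5)

# Extract each tree as a matrix (one row per node)
trees <- lapply(seq_len(rf1$ntree), function(k) getTree(rf1, k))

# For each tree, test whether its matrix is identical to the first tree's
same_as_first <- vapply(trees,
                        function(tr) identical(tr, trees[[1]]),
                        logical(1))

same_as_first        # one logical value per tree
all(same_as_first)   # TRUE only if every tree matches the first
```

If all(same_as_first) is FALSE for your own data, inspecting the differing getTree() matrices side by side shows at which node the trees diverge.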
