Overcoming Multicollinearity in Random Forest Regression and still keeping all variables in the model

Asked 2016-09-16 17:35:01. Tags: r, correlation, random-forest

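I am new to random forest regression. I have 300 continuous variables in prep1 (299 predictors and 1 target variable), and some of the predictors are highly correlated. The catch is that I still need an importance value for every predictor, so dropping some of the predictors is not an option.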

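Here are my questions:

1) Is there a way to select, for each tree, only variables that are highly uncorrelated with one another, and if so, how should the code below be adjusted?

2) Assuming the answer to 1) is yes, would this solve the multicollinearity problem?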

  library(randomForest)                         # provides randomForest()
  bound <- floor(nrow(prep1) / 2)               # use half of the rows for training
  df <- prep1[sample(nrow(prep1)), ]            # shuffle the rows
  train <- df[1:bound, ]
  test <- df[(bound + 1):nrow(df), ]
  modelFit <- randomForest(continuous_target ~ ., data = train)
  prediction <- predict(modelFit, test)
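
To see which predictors are actually involved in the "highly correlated" pairs mentioned above, a check along the following lines may help. This is only a minimal sketch on simulated data: the data frame `X`, the column names `x1` to `x4`, and the 0.9 cutoff are all assumptions for illustration, not taken from prep1.

```r
# Simulated stand-in for the predictors in prep1 (hypothetical data).
set.seed(42)
n  <- 200
x1 <- rnorm(n)
X  <- data.frame(
  x1 = x1,
  x2 = x1 + rnorm(n, sd = 0.1),  # nearly a copy of x1
  x3 = rnorm(n),
  x4 = rnorm(n)
)

# Absolute pairwise Pearson correlations between predictors.
cor_mat <- abs(cor(X))

# Keep each pair once (upper triangle) and apply an assumed cutoff of 0.9.
cutoff <- 0.9
idx <- which(cor_mat > cutoff & upper.tri(cor_mat), arr.ind = TRUE)

high_pairs <- data.frame(
  var1    = rownames(cor_mat)[idx[, "row"]],
  var2    = colnames(cor_mat)[idx[, "col"]],
  abs_cor = cor_mat[idx]
)
print(high_pairs)
```

On real data the same steps would be applied to the predictor columns only, leaving the target out of the correlation matrix.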

1 Answer

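A random forest draws its samples with replacement and randomly selects subsets of features when growing trees on those samples. In your case, given that the response variable is not skewed, growing a large number of trees should give you an importance value for every variable. This does increase the computational cost, because the same importance is captured many times across the bags. Also, multicollinearity does not affect predictive power.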
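
The suggestion to grow a large number of trees and read off an importance value for every predictor might look roughly like the sketch below. It assumes the randomForest package is installed and uses simulated data in place of prep1; the data frame `sim`, its column names, and `ntree = 1000` are illustrative assumptions, not values from the question.

```r
library(randomForest)  # assumes the randomForest package is installed

# Simulated stand-in for prep1 (hypothetical data): x2 is nearly a copy of x1.
set.seed(42)
n  <- 300
x1 <- rnorm(n)
sim <- data.frame(
  x1 = x1,
  x2 = x1 + rnorm(n, sd = 0.1),
  x3 = rnorm(n),
  x4 = rnorm(n)
)
sim$continuous_target <- 2 * sim$x1 + sim$x3 + rnorm(n, sd = 0.5)

# Grow many trees and request permutation importance (importance = TRUE).
# ntree = 1000 is an illustrative value, not a recommendation.
fit <- randomForest(continuous_target ~ ., data = sim,
                    ntree = 1000, importance = TRUE)

# For regression, importance() returns one row per predictor with the
# columns %IncMSE (permutation-based) and IncNodePurity.
imp <- importance(fit)
print(imp[order(imp[, "%IncMSE"], decreasing = TRUE), ])
```

Every predictor gets a row in `imp`, so none has to be dropped. With highly correlated predictors such as `x1` and `x2` here, the reported importance may be shared between them, so the individual values are best compared with that in mind.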
