简体   繁体   English

r 中的随机森林 model

[英]Random forest model in r

Is there any way, where we can create multiple random forest models by fine-tuning hyper parameters on train data and check the test data performance against all models and store it in a csv file?有什么方法可以通过微调训练数据的超参数来创建多个随机森林模型,并针对所有模型检查测试数据的性能并将其存储在 csv 文件中?

For ex:- i have one model with mtry is 6, nodesize is 3, and another model where mtry is 10 and nodesize is 4 What i need to do is to test these two models performance on test data and store the key model metrics like confusion matrix, sensitivity, and specificity. For ex:- i have one model with mtry is 6, nodesize is 3, and another model where mtry is 10 and nodesize is 4 What i need to do is to test these two models performance on test data and store the key model metrics like混淆矩阵、敏感性和特异性。

i have tried the following code我试过下面的代码

train_performance <- data.frame('TN'=0,'FP'=0,'FN'=0,'TP'=0,'accuracy'=0,'kappa'=0,'sensitivity'=0,'specificity'=0)
modellist <- list()

for (mtry in c(6,11)){
  for (nodesize in c(2,3)){
    fit_model <- randomForest(dv~., train_final,mtry = mtry, importance=TRUE, nodesize=nodesize,
                                sampsize = ceiling(.8*nrow(train_final)), proximity=TRUE,na.action = na.omit,
                            ntree=500)
      Key_col <- paste0(mtry,"-",nodesize)
      modellist[[Key_col]] <- fit_model

      pred_train <- predict(fit_model, train_final)
      cf <- confusionMatrix(pred_train, train_final$DV, mode = 'everything', positive = '1')
      train_performance$TN <- cf$table[1]
      train_performance$FP <- cf$table[2]
      train_performance$FN <- cf$table[3]
      train_performance$TP <- cf$table[4]
      train_performance$accuracy=cf$overall[1]
      train_performance$kappa=cf$overall[2]
      train_performance$sensitivity=cf$byClass[1]
      train_performance$specificity=cf$byClass[2]
      train_performance$key=Key_col
    }
  }

Below is sample method using caret package on how to tune and train your random forest model which outputs accuracy parameters for all models:下面是使用caret package 关于如何调整和训练您的随机森林 model 的示例方法,该方法输出所有模型的精度参数:

library(randomForest)
library(mlbench)
library(caret)

# Load Dataset
data(Sonar)
dataset <- Sonar
x <- dataset[,1:60]
y <- dataset[,61]
# Create model with default paramters
control <- trainControl(method="repeatedcv", number=10, repeats=3)
seed <- 7
metric <- "Accuracy"
set.seed(seed)
mtry <- sqrt(ncol(x))
tunegrid <- expand.grid(.mtry=mtry)
rf_default <- train(Class~., data=dataset, method="rf", metric=metric, tuneGrid=tunegrid, trControl=control)
print(rf_default)

output: output:

Resampling results

  Accuracy   Kappa      Accuracy SD  Kappa SD 
  0.8138384  0.6209924  0.0747572    0.1569159

Tune Using Caret :使用Caret调谐:

Random Search : One search strategy that we can use is to try random values within a range.随机搜索:我们可以使用的一种搜索策略是尝试一个范围内的随机值。

# Random Search
control <- trainControl(method="repeatedcv", number=10, repeats=3, search="random")
set.seed(seed)
mtry <- sqrt(ncol(x))
rf_random <- train(Class~., data=dataset, method="rf", metric=metric, tuneLength=15, trControl=control)
print(rf_random)
plot(rf_random)

output: output:

Resampling results across tuning parameters:

  mtry  Accuracy   Kappa      Accuracy SD  Kappa SD 
  11    0.8218470  0.6365181  0.09124610   0.1906693
  14    0.8140620  0.6215867  0.08475785   0.1750848
  17    0.8030231  0.5990734  0.09595988   0.1986971
  24    0.8042929  0.6002362  0.09847815   0.2053314
  30    0.7933333  0.5798250  0.09110171   0.1879681
  34    0.8015873  0.5970248  0.07931664   0.1621170
  45    0.7932612  0.5796828  0.09195386   0.1887363
  47    0.7903896  0.5738230  0.10325010   0.2123314
  49    0.7867532  0.5673879  0.09256912   0.1899197
  50    0.7775397  0.5483207  0.10118502   0.2063198
  60    0.7790476  0.5513705  0.09810647   0.2005012

在此处输入图像描述

Grid Search : Another search is to define a grid of algorithm parameters to try.网格搜索:另一种搜索是定义一个算法参数网格来尝试。

control <- trainControl(method="repeatedcv", number=10, repeats=3, search="grid")
set.seed(seed)
tunegrid <- expand.grid(.mtry=c(1:15))
rf_gridsearch <- train(Class~., data=dataset, method="rf", metric=metric, tuneGrid=tunegrid, trControl=control)
print(rf_gridsearch)
plot(rf_gridsearch)

output: output:

Resampling results across tuning parameters:

  mtry  Accuracy   Kappa      Accuracy SD  Kappa SD 
   1    0.8377273  0.6688712  0.07154794   0.1507990
   2    0.8378932  0.6693593  0.07185686   0.1513988
   3    0.8314502  0.6564856  0.08191277   0.1700197
   4    0.8249567  0.6435956  0.07653933   0.1590840
   5    0.8268470  0.6472114  0.06787878   0.1418983
   6    0.8298701  0.6537667  0.07968069   0.1654484
   7    0.8282035  0.6493708  0.07492042   0.1584772
   8    0.8232828  0.6396484  0.07468091   0.1571185
   9    0.8268398  0.6476575  0.07355522   0.1529670
  10    0.8204906  0.6346991  0.08499469   0.1756645
  11    0.8073304  0.6071477  0.09882638   0.2055589
  12    0.8184488  0.6299098  0.09038264   0.1884499
  13    0.8093795  0.6119327  0.08788302   0.1821910
  14    0.8186797  0.6304113  0.08178957   0.1715189
  15    0.8168615  0.6265481  0.10074984   0.2091663

在此处输入图像描述

There are many other methods to tune your random forest model and store the results of these models, above two are the most widely used methods.还有许多其他方法可以调整您的随机森林 model 并存储这些模型的结果,以上两种是最广泛使用的方法。

Moreover, you can also manually set these parameters up and train and tune the model.此外,您还可以手动设置这些参数并训练和调整 model。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM