簡體   English   中英

r 中的隨機森林 model

[英]Random forest model in r

有什么方法可以通過微調訓練數據的超參數來創建多個隨機森林模型,並針對所有模型檢查測試數據的性能並將其存儲在 csv 文件中?

For ex:- i have one model with mtry is 6, nodesize is 3, and another model where mtry is 10 and nodesize is 4 What i need to do is to test these two models performance on test data and store the key model metrics like混淆矩陣、敏感性和特異性。

我試過下面的代碼

train_performance <- data.frame('TN'=0,'FP'=0,'FN'=0,'TP'=0,'accuracy'=0,'kappa'=0,'sensitivity'=0,'specificity'=0)
modellist <- list()

for (mtry in c(6,11)){
  for (nodesize in c(2,3)){
    fit_model <- randomForest(dv~., train_final,mtry = mtry, importance=TRUE, nodesize=nodesize,
                                sampsize = ceiling(.8*nrow(train_final)), proximity=TRUE,na.action = na.omit,
                            ntree=500)
      Key_col <- paste0(mtry,"-",nodesize)
      modellist[[Key_col]] <- fit_model

      pred_train <- predict(fit_model, train_final)
      cf <- confusionMatrix(pred_train, train_final$DV, mode = 'everything', positive = '1')
      train_performance$TN <- cf$table[1]
      train_performance$FP <- cf$table[2]
      train_performance$FN <- cf$table[3]
      train_performance$TP <- cf$table[4]
      train_performance$accuracy=cf$overall[1]
      train_performance$kappa=cf$overall[2]
      train_performance$sensitivity=cf$byClass[1]
      train_performance$specificity=cf$byClass[2]
      train_performance$key=Key_col
    }
  }

下面是使用caret package 關於如何調整和訓練您的隨機森林 model 的示例方法,該方法輸出所有模型的精度參數:

library(randomForest)
library(mlbench)
library(caret)

# Load Dataset
data(Sonar)
dataset <- Sonar
x <- dataset[,1:60]
y <- dataset[,61]
# Create model with default paramters
control <- trainControl(method="repeatedcv", number=10, repeats=3)
seed <- 7
metric <- "Accuracy"
set.seed(seed)
mtry <- sqrt(ncol(x))
tunegrid <- expand.grid(.mtry=mtry)
rf_default <- train(Class~., data=dataset, method="rf", metric=metric, tuneGrid=tunegrid, trControl=control)
print(rf_default)

output:

Resampling results

  Accuracy   Kappa      Accuracy SD  Kappa SD 
  0.8138384  0.6209924  0.0747572    0.1569159

使用Caret調諧:

隨機搜索:我們可以使用的一種搜索策略是嘗試一個范圍內的隨機值。

# Random Search
control <- trainControl(method="repeatedcv", number=10, repeats=3, search="random")
set.seed(seed)
mtry <- sqrt(ncol(x))
rf_random <- train(Class~., data=dataset, method="rf", metric=metric, tuneLength=15, trControl=control)
print(rf_random)
plot(rf_random)

output:

Resampling results across tuning parameters:

  mtry  Accuracy   Kappa      Accuracy SD  Kappa SD 
  11    0.8218470  0.6365181  0.09124610   0.1906693
  14    0.8140620  0.6215867  0.08475785   0.1750848
  17    0.8030231  0.5990734  0.09595988   0.1986971
  24    0.8042929  0.6002362  0.09847815   0.2053314
  30    0.7933333  0.5798250  0.09110171   0.1879681
  34    0.8015873  0.5970248  0.07931664   0.1621170
  45    0.7932612  0.5796828  0.09195386   0.1887363
  47    0.7903896  0.5738230  0.10325010   0.2123314
  49    0.7867532  0.5673879  0.09256912   0.1899197
  50    0.7775397  0.5483207  0.10118502   0.2063198
  60    0.7790476  0.5513705  0.09810647   0.2005012

在此處輸入圖像描述

網格搜索:另一種搜索是定義一個算法參數網格來嘗試。

control <- trainControl(method="repeatedcv", number=10, repeats=3, search="grid")
set.seed(seed)
tunegrid <- expand.grid(.mtry=c(1:15))
rf_gridsearch <- train(Class~., data=dataset, method="rf", metric=metric, tuneGrid=tunegrid, trControl=control)
print(rf_gridsearch)
plot(rf_gridsearch)

output:

Resampling results across tuning parameters:

  mtry  Accuracy   Kappa      Accuracy SD  Kappa SD 
   1    0.8377273  0.6688712  0.07154794   0.1507990
   2    0.8378932  0.6693593  0.07185686   0.1513988
   3    0.8314502  0.6564856  0.08191277   0.1700197
   4    0.8249567  0.6435956  0.07653933   0.1590840
   5    0.8268470  0.6472114  0.06787878   0.1418983
   6    0.8298701  0.6537667  0.07968069   0.1654484
   7    0.8282035  0.6493708  0.07492042   0.1584772
   8    0.8232828  0.6396484  0.07468091   0.1571185
   9    0.8268398  0.6476575  0.07355522   0.1529670
  10    0.8204906  0.6346991  0.08499469   0.1756645
  11    0.8073304  0.6071477  0.09882638   0.2055589
  12    0.8184488  0.6299098  0.09038264   0.1884499
  13    0.8093795  0.6119327  0.08788302   0.1821910
  14    0.8186797  0.6304113  0.08178957   0.1715189
  15    0.8168615  0.6265481  0.10074984   0.2091663

在此處輸入圖像描述

還有許多其他方法可以調整您的隨機森林 model 並存儲這些模型的結果,以上兩種是最廣泛使用的方法。

此外,您還可以手動設置這些參數並訓練和調整 model。

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM