[英]Random forest model in r
有什么方法可以通過微調訓練數據的超參數來創建多個隨機森林模型,並針對所有模型檢查測試數據的性能並將其存儲在 csv 文件中?
For ex:- i have one model with mtry
is 6, nodesize
is 3, and another model where mtry
is 10 and nodesize
is 4 What i need to do is to test these two models performance on test data and store the key model metrics like混淆矩陣、敏感性和特異性。
我試過下面的代碼
train_performance <- data.frame('TN'=0,'FP'=0,'FN'=0,'TP'=0,'accuracy'=0,'kappa'=0,'sensitivity'=0,'specificity'=0)
modellist <- list()
for (mtry in c(6,11)){
for (nodesize in c(2,3)){
fit_model <- randomForest(dv~., train_final,mtry = mtry, importance=TRUE, nodesize=nodesize,
sampsize = ceiling(.8*nrow(train_final)), proximity=TRUE,na.action = na.omit,
ntree=500)
Key_col <- paste0(mtry,"-",nodesize)
modellist[[Key_col]] <- fit_model
pred_train <- predict(fit_model, train_final)
cf <- confusionMatrix(pred_train, train_final$DV, mode = 'everything', positive = '1')
train_performance$TN <- cf$table[1]
train_performance$FP <- cf$table[2]
train_performance$FN <- cf$table[3]
train_performance$TP <- cf$table[4]
train_performance$accuracy=cf$overall[1]
train_performance$kappa=cf$overall[2]
train_performance$sensitivity=cf$byClass[1]
train_performance$specificity=cf$byClass[2]
train_performance$key=Key_col
}
}
下面是使用caret
package 關於如何調整和訓練您的隨機森林 model 的示例方法,該方法輸出所有模型的精度參數:
library(randomForest)
library(mlbench)
library(caret)
# Load Dataset
data(Sonar)
dataset <- Sonar
x <- dataset[,1:60]
y <- dataset[,61]
# Create model with default paramters
control <- trainControl(method="repeatedcv", number=10, repeats=3)
seed <- 7
metric <- "Accuracy"
set.seed(seed)
mtry <- sqrt(ncol(x))
tunegrid <- expand.grid(.mtry=mtry)
rf_default <- train(Class~., data=dataset, method="rf", metric=metric, tuneGrid=tunegrid, trControl=control)
print(rf_default)
output:
Resampling results
Accuracy Kappa Accuracy SD Kappa SD
0.8138384 0.6209924 0.0747572 0.1569159
使用Caret
調諧:
隨機搜索:我們可以使用的一種搜索策略是嘗試一個范圍內的隨機值。
# Random Search
control <- trainControl(method="repeatedcv", number=10, repeats=3, search="random")
set.seed(seed)
mtry <- sqrt(ncol(x))
rf_random <- train(Class~., data=dataset, method="rf", metric=metric, tuneLength=15, trControl=control)
print(rf_random)
plot(rf_random)
output:
Resampling results across tuning parameters:
mtry Accuracy Kappa Accuracy SD Kappa SD
11 0.8218470 0.6365181 0.09124610 0.1906693
14 0.8140620 0.6215867 0.08475785 0.1750848
17 0.8030231 0.5990734 0.09595988 0.1986971
24 0.8042929 0.6002362 0.09847815 0.2053314
30 0.7933333 0.5798250 0.09110171 0.1879681
34 0.8015873 0.5970248 0.07931664 0.1621170
45 0.7932612 0.5796828 0.09195386 0.1887363
47 0.7903896 0.5738230 0.10325010 0.2123314
49 0.7867532 0.5673879 0.09256912 0.1899197
50 0.7775397 0.5483207 0.10118502 0.2063198
60 0.7790476 0.5513705 0.09810647 0.2005012
網格搜索:另一種搜索是定義一個算法參數網格來嘗試。
control <- trainControl(method="repeatedcv", number=10, repeats=3, search="grid")
set.seed(seed)
tunegrid <- expand.grid(.mtry=c(1:15))
rf_gridsearch <- train(Class~., data=dataset, method="rf", metric=metric, tuneGrid=tunegrid, trControl=control)
print(rf_gridsearch)
plot(rf_gridsearch)
output:
Resampling results across tuning parameters:
mtry Accuracy Kappa Accuracy SD Kappa SD
1 0.8377273 0.6688712 0.07154794 0.1507990
2 0.8378932 0.6693593 0.07185686 0.1513988
3 0.8314502 0.6564856 0.08191277 0.1700197
4 0.8249567 0.6435956 0.07653933 0.1590840
5 0.8268470 0.6472114 0.06787878 0.1418983
6 0.8298701 0.6537667 0.07968069 0.1654484
7 0.8282035 0.6493708 0.07492042 0.1584772
8 0.8232828 0.6396484 0.07468091 0.1571185
9 0.8268398 0.6476575 0.07355522 0.1529670
10 0.8204906 0.6346991 0.08499469 0.1756645
11 0.8073304 0.6071477 0.09882638 0.2055589
12 0.8184488 0.6299098 0.09038264 0.1884499
13 0.8093795 0.6119327 0.08788302 0.1821910
14 0.8186797 0.6304113 0.08178957 0.1715189
15 0.8168615 0.6265481 0.10074984 0.2091663
還有許多其他方法可以調整您的隨機森林 model 並存儲這些模型的結果,以上兩種是最廣泛使用的方法。
此外,您還可以手動設置這些參數並訓練和調整 model。
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.