
How to compare different models using caret, tuning different parameters?

I am trying to implement some functionality to compare five different machine learning models for predicting values in a regression problem.

My intention is to develop a set of functions that train the different models and gather the results in one place. The models I chose as examples are: lasso, random forest, support vector machine, linear model and neural network. To tune some of the models I plan to use Max Kuhn's reference: https://topepo.github.io/caret/available-models.html . However, since each model requires different tuning parameters, I am not sure how to set them:

First, I set up the grid for tuning the 'nnet' model. Here I chose different numbers of nodes in the hidden layer and different decay coefficients:

my.grid <- expand.grid(size=seq(from = 1, to = 10, by = 1), decay = seq(from = 0.1, to = 0.5, by = 0.1))

Then I built the function that runs the five models, repeated 5 times in a 6-fold cross-validation configuration:

my_list_model <- function(model) {
  set.seed(1)
  train.control <- trainControl(method = "repeatedcv",
                                number = 6,
                                repeats = 5,
                                returnResamp = "all",
                                savePredictions = "all")

  # The tuning configurations of the machine learning models:
  set.seed(1)
  fit_m <- train(ST1 ~ .,
                 data = train,          # my original dataframe, not shown in this code
                 method = model,
                 metric = "RMSE",
                 preProcess = "scale",
                 trControl = train.control,
                 linout = 1,            # linear activation function output
                 trace = FALSE,
                 maxit = 1000,
                 tuneGrid = my.grid)    # here is how I pass the 'nnet' tuning grid

  return(fit_m)
}

Finally, I run the five models:

model_list <- lapply(list(Lass = "lasso",
                          RF   = "rf",
                          SVM  = "svmLinear",
                          OLS  = "lm",
                          NN   = "nnet"),
                     my_list_model)

However, when I run it, it shows:

Error: The tuning parameter grid should have columns fraction

As far as I understand, I do not know how to specify the tuning parameters properly. If I discard the 'nnet' model and change it, in the penultimate line, to an XGBoost model for example, everything looks fine and the results are computed. That is, the problem seems to lie in the 'nnet' tuning parameters.
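The mismatch can be confirmed with caret's `modelLookup()`: each method expects a `tuneGrid` whose columns match its own tuning parameters, and `"lasso"`'s single parameter is `fraction`, not `size`/`decay`. A minimal sketch (assuming the caret package is installed):

```r
library(caret)

# Each method expects a grid whose columns match its own tuning parameters:
modelLookup("nnet")   # parameters: size, decay
modelLookup("lasso")  # parameter:  fraction

# Passing the nnet grid (columns size and decay) to method = "lasso"
# therefore fails: caret looks for a "fraction" column and finds
# columns it does not recognize.
```

This is why the same `my.grid` cannot be shared across all five methods.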

So I think my real question is: how do I configure these different model parameters, in particular those of the 'nnet' model? Also, since I did not need to set parameters for lasso, random forest, svmLinear and the linear model, how are they tuned by the caret package?
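On the last point: when no `tuneGrid` is supplied, caret builds a default grid for each method, and `tuneLength` controls how many candidate values per parameter it tries. A hedged sketch, using the built-in `mtcars` data as a stand-in for the real dataframe:

```r
library(caret)

set.seed(1)
ctrl <- trainControl(method = "repeatedcv", number = 6, repeats = 5)

# With no tuneGrid, caret generates a default grid for the method;
# tuneLength = 3 asks it to try 3 candidate values of mtry.
fit_rf <- train(mpg ~ ., data = mtcars,
                method = "rf",
                metric = "RMSE",
                trControl = ctrl,
                tuneLength = 3)
```

Methods with no tuning parameters, such as `"lm"`, are simply fit once per resample, so they work without any grid at all.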

my_list_model <- function(model,grd=NULL){
  train.control <- trainControl(method = "repeatedcv", 
                            number = 6,
                            returnResamp = "all",
                            savePredictions = "all")

 # The tuning configurations of machine learning models:
 set.seed(1)
 fit_m <- train(Y ~., 
             data = df, # my original dataframe, not showed in this code
             method = model, 
             metric = "RMSE", 
             preProcess = "scale", 
             trControl = train.control,
             linout = 1,        #  linear activation function output
             trace = FALSE,
             maxit = 1000,
             tuneGrid = grd) # Here is how I call the tune of 'nnet' parameters
 return(fit_m)
 }

First run the code below and look up all the relevant tuning parameters:

modelLookup('rf')

Now build a grid for each model based on the lookup above:

svmGrid <-  expand.grid(C=c(3,2,1))
rfGrid <-  expand.grid(mtry=c(5,10,15))

Create a list of all the model grids, and make sure the model names are the same as the names in the list:

grd_all<-list(svmLinear=svmGrid
          ,rf=rfGrid)
model_list<-lapply(c("rf","svmLinear")
               ,function(x){my_list_model(x,grd_all[[x]])})
model_list

[[1]]
Random Forest 

17 samples
3 predictor

Pre-processing: scaled (3) 
Resampling: Cross-Validated (6 fold, repeated 1 times) 
Summary of sample sizes: 14, 14, 15, 14, 14, 14, ... 
Resampling results across tuning parameters:

mtry  RMSE      Rsquared   MAE     
 5    63.54864  0.5247415  55.72074
10    63.70247  0.5255311  55.35263
15    62.13805  0.5765130  54.53411

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was mtry = 15.

[[2]]
Support Vector Machines with Linear Kernel 

17 samples
3 predictor

Pre-processing: scaled (3) 
Resampling: Cross-Validated (6 fold, repeated 1 times) 
Summary of sample sizes: 14, 14, 15, 14, 14, 14, ... 
Resampling results across tuning parameters:

C  RMSE      Rsquared   MAE     
1  59.83309  0.5879396  52.26890
2  66.45247  0.5621379  58.74603
3  67.28742  0.5576000  59.55334

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was C = 1.

