Inconsistent "best tune" and "resampling results across tuning parameters" in the caret R package

Question

I am trying to build a model with caret using a tuning grid:

svmGrid <- expand.grid(C = c(0.0001, 0.001, 0.01, 0.1, 1, 10, 20, 30, 40, 50, 100))

and then again with a subset of that grid:

svmGrid <- expand.grid(C = c(0.0001, 0.001, 0.01, 0.1, 1, 10, 20, 30, 40, 50))

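The problem is that I get a different "best tune" and different "resampling results across tuning parameters", even though the C value selected for the first tuning grid also appears in the second tuning grid.

I also see these discrepancies when I use different options for the sampling argument and different summaryFunction options in trainControl().

Needless to say, because a different best model is selected each time, this affects the predictions on the test set.

Does anyone know why this happens?

Reproducible data set: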

library(caret)
library(doMC)
registerDoMC(cores = 8)

set.seed(2969)
imbal_train <- twoClassSim(100, intercept = -20, linearVars = 20)
imbal_test  <- twoClassSim(100, intercept = -20, linearVars = 20)
table(imbal_train$Class)

Run with the first tuning grid:

svmGrid <-  expand.grid(C = c(0.0001,0.001,0.01,0.1,1,10,20,30,40,50,100))

up_fitControl = trainControl(method = "cv", number = 10 , savePredictions = TRUE, allowParallel = TRUE, sampling = "up", seeds = NA)


set.seed(5627)
up_inside <- train(Class ~ ., data = imbal_train,
                   method = "svmLinear",
                   trControl = up_fitControl,
                   tuneGrid = svmGrid,
                   scale = FALSE)

up_inside

Output of the first run:

> up_inside
Support Vector Machines with Linear Kernel 

100 samples
 25 predictors
  2 classes: 'Class1', 'Class2' 

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 90, 91, 90, 90, 89, 90, ... 
Addtional sampling using up-sampling

Resampling results across tuning parameters:

  C      Accuracy   Kappa         Accuracy SD  Kappa SD 
  1e-04  0.7734343   0.252201364  0.1227632    0.3198165
  1e-03  0.8225253   0.396439198  0.1245455    0.3626456
  1e-02  0.7595960   0.116150973  0.1431780    0.3046825
  1e-01  0.7686869   0.051430454  0.1167093    0.2712062
  1e+00  0.7695960  -0.004261294  0.1162279    0.2190151
  1e+01  0.7093939   0.111852756  0.2030250    0.3810059
  2e+01  0.7195960   0.040458804  0.1932690    0.2580560
  3e+01  0.7195960   0.040458804  0.1932690    0.2580560
  4e+01  0.7195960   0.040458804  0.1932690    0.2580560
  5e+01  0.7195960   0.040458804  0.1932690    0.2580560
  1e+02  0.7195960   0.040458804  0.1932690    0.2580560

Accuracy was used to select the optimal model using  the largest value.
The final value used for the model was C = 0.001.

Run with the second tuning grid:

svmGrid <-  expand.grid(C = c(0.0001,0.001,0.01,0.1,1,10,20,30,40,50))

up_fitControl = trainControl(method = "cv", number = 10 , savePredictions = TRUE, allowParallel = TRUE, sampling = "up", seeds = NA)


set.seed(5627)
up_inside <- train(Class ~ ., data = imbal_train,
                   method = "svmLinear",
                   trControl = up_fitControl,
                   tuneGrid = svmGrid,
                   scale = FALSE)

up_inside

Output of the second run:

> up_inside
Support Vector Machines with Linear Kernel 

100 samples
 25 predictors
  2 classes: 'Class1', 'Class2' 

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 90, 91, 90, 90, 89, 90, ... 
Addtional sampling using up-sampling

Resampling results across tuning parameters:

  C      Accuracy   Kappa         Accuracy SD  Kappa SD 
  1e-04  0.8125253   0.392165694  0.13043060   0.3694786
  1e-03  0.8114141   0.375569633  0.12291273   0.3549978
  1e-02  0.7995960   0.205413345  0.06734882   0.2662161
  1e-01  0.7495960   0.017139266  0.09742161   0.2270128
  1e+00  0.7695960  -0.004261294  0.11622791   0.2190151
  1e+01  0.7093939   0.111852756  0.20302503   0.3810059
  2e+01  0.7195960   0.040458804  0.19326904   0.2580560
  3e+01  0.7195960   0.040458804  0.19326904   0.2580560
  4e+01  0.7195960   0.040458804  0.19326904   0.2580560
  5e+01  0.7195960   0.040458804  0.19326904   0.2580560

Accuracy was used to select the optimal model using  the largest value.
The final value used for the model was C = 1e-04.

Answer 1

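If you do not supply seeds to caret, it chooses them for you. Because your two grids have different lengths, the seeds assigned to each fold differ slightly.

Below I have pasted an example. I only renamed the second model (up_inside2) so that the outputs are easier to compare: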

> up_inside$control$seeds[[1]]
 [1] 825016 802597 128276 935565 324036 188187 284067  58853 923008 995461  60759
> up_inside2$control$seeds[[1]]
 [1] 825016 802597 128276 935565 324036 188187 284067  58853 923008 995461
> up_inside$control$seeds[[2]]
 [1] 966837 256990 592077 291736 615683 390075 967327 349693  73789 155739 916233
# See how the first seed here is the same as the last seed of the first model
> up_inside2$control$seeds[[2]]
 [1]  60759 966837 256990 592077 291736 615683 390075 967327 349693  73789
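
The one-position shift seen above can be reproduced in base R. The following is only a sketch of the effect, assuming the seeds come from a single random draw that is then cut into per-fold chunks whose length equals the grid size; the names `pool`, `by_11` and `by_10` are made up for illustration and this is not caret's actual code.

```r
# Illustration only: one pool of random seeds, cut into chunks of
# 11 (first grid) or 10 (second grid) seeds per fold.
set.seed(1)
pool <- sample.int(1e6, 22)

by_11 <- split(pool[1:22], rep(1:2, each = 11))  # grid with 11 values of C
by_10 <- split(pool[1:20], rep(1:2, each = 10))  # grid with 10 values of C

# Fold 1: the first 10 seeds agree between the two grids.
same_first_fold <- identical(by_11[[1]][1:10], by_10[[1]])

# Fold 2: the smaller grid starts with the seed that closed fold 1 of the
# larger grid, so every later seed is shifted relative to the larger grid.
shifted <- by_10[[2]][1] == by_11[[1]][11]
```

From the second fold onwards the same value of C is therefore fitted under a different seed in the two runs, which matters here because up-sampling inside each fold is random.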

If you now go ahead and set your own seeds, you get the same resampling results for the C values that the two grids share:

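# Each list has one element per CV fold (10) plus a final single seed for the
# last model fit. Within a fold there is one seed per value of C.
# Pass them via trainControl(seeds = myseeds) and trainControl(seeds = myseeds2).

# Seeds for your first train (11 values of C; the 11th seed is for C = 100)
myseeds <- list(c(1:10,1000), c(11:20,2000), c(21:30, 3000),c(31:40, 4000),c(41:50, 5000),
                c(51:60, 6000),c(61:70, 7000),c(71:80, 8000),c(81:90, 9000),c(91:100, 10000), c(343))
# Seeds for your second train (10 values of C, same seeds for the shared values)
myseeds2 <- list(c(1:10), c(11:20), c(21:30),c(31:40),c(41:50),c(51:60),
                 c(61:70),c(71:80),c(81:90),c(91:100), c(343))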

> up_inside
Support Vector Machines with Linear Kernel 

100 samples
 25 predictor
  2 classes: 'Class1', 'Class2' 

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 90, 91, 90, 90, 89, 90, ... 
Addtional sampling using up-sampling

Resampling results across tuning parameters:

  C      Accuracy   Kappa      
  1e-04  0.7714141  0.239823027
  1e-03  0.7914141  0.332834590
  1e-02  0.7695960  0.207000745
  1e-01  0.7786869  0.103957926
  1e+00  0.7795960  0.006849817
  1e+01  0.7093939  0.111852756
  2e+01  0.7195960  0.040458804
  3e+01  0.7195960  0.040458804
  4e+01  0.7195960  0.040458804
  5e+01  0.7195960  0.040458804
  1e+02  0.7195960  0.040458804

Accuracy was used to select the optimal model using  the largest value.
The final value used for the model was C = 0.001. 
> up_inside2
Support Vector Machines with Linear Kernel 

100 samples
 25 predictor
  2 classes: 'Class1', 'Class2' 

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 90, 91, 90, 90, 89, 90, ... 
Addtional sampling using up-sampling

Resampling results across tuning parameters:

  C      Accuracy   Kappa      
  1e-04  0.7714141  0.239823027
  1e-03  0.7914141  0.332834590
  1e-02  0.7695960  0.207000745
  1e-01  0.7786869  0.103957926
  1e+00  0.7795960  0.006849817
  1e+01  0.7093939  0.111852756
  2e+01  0.7195960  0.040458804
  3e+01  0.7195960  0.040458804
  4e+01  0.7195960  0.040458804
  5e+01  0.7195960  0.040458804

Accuracy was used to select the optimal model using  the largest value.
The final value used for the model was C = 0.001.
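
Instead of typing the seed lists by hand, they can be generated. The sketch below assumes the layout that the `seeds` argument of `trainControl()` documents: a list with one integer vector per resample, each as long as the number of models evaluated, plus a single integer for the final model. `make_seeds` and `subset_seeds` are hypothetical helpers written for this example, not caret functions.

```r
# Hypothetical helper: build a seeds list for n_resamples folds and
# n_models tuning-parameter values.
make_seeds <- function(n_resamples, n_models, seed = 1) {
  set.seed(seed)
  seeds <- lapply(seq_len(n_resamples), function(i) sample.int(1e6, n_models))
  seeds[[n_resamples + 1]] <- sample.int(1e6, 1)  # seed for the final model
  seeds
}

# Hypothetical helper: keep only the first n_models seeds in every fold, so
# a grid that is a leading subset of the larger one reuses the same seeds.
subset_seeds <- function(seeds, n_models) {
  n_resamples <- length(seeds) - 1
  out <- lapply(seeds[seq_len(n_resamples)], function(s) s[seq_len(n_models)])
  out[[n_resamples + 1]] <- seeds[[n_resamples + 1]]
  out
}

seeds_full <- make_seeds(n_resamples = 10, n_models = 11, seed = 5627)
seeds_sub  <- subset_seeds(seeds_full, n_models = 10)
```

`seeds_full` would then replace `seeds = NA` in the `trainControl()` call for the 11-value grid and `seeds_sub` in the call for the 10-value grid. This only lines up when the smaller grid consists of the first values of the larger grid, in the same order, as is the case in the question.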

Solution 1
4 · accepted · 2016-07-05 13:29:43
