R caret: inconsistent results in model tuning

Question

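Today I came across a strange issue while using the caret package for model tuning: given a specific combination of tuning parameters T*, the value of the metric associated with T* (Cohen's Kappa, K) changes depending on whether T* is evaluated alone or as part of a grid of possible combinations. In the practical example below, caret is used as an interface to the gbm package.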

# Load libraries and data
library (caret)
data<-read.csv("mydata.csv")
data$target<-as.factor(data$target)
# data are available at https://www.dropbox.com/s/1bglmqd14g840j1/mydata.csv?dl=0

Procedure 1: evaluating T* alone

#Define 5-fold cv as validation settings
fitControl <- trainControl(method = "cv",number = 5)

# Define the combination of tuning parameter for this example T*
gbmGrid <- expand.grid(.interaction.depth = 1,
                   .n.trees = 1000,
                   .shrinkage = 0.1, .n.minobsinnode=1)

# Fit a gbm with T* as model parameters and K as scoring metric.
set.seed(825)
gbmFit1 <- train(target ~ ., data = data,
             method = "gbm",
             distribution="adaboost",
             trControl = fitControl,
             tuneGrid=gbmGrid,
             verbose=F,
             metric="Kappa")

# The results show that T* is associated with Kappa = 0.47. Remember this result and the confusion matrix.
testPred<-predict(gbmFit1, newdata = data)
confusionMatrix(testPred, data$target) 
# output selection
Confusion Matrix and Statistics
           Reference
Prediction   0   1
         0 832  34
         1   0  16

Kappa : 0.4703

Procedure 2: evaluating T* along with other tuning profiles

Everything here is the same as in Procedure 1, except that several combinations of tuning parameters {T} are considered:

# Notice that the original T* is included in {T}!!
gbmGrid2 <- expand.grid(.interaction.depth = 1,
                   .n.trees = seq(100,1000,by=100),
                   .shrinkage = 0.1, .n.minobsinnode=1)
# Fit the gbm
set.seed(825)
gbmFit2 <- train(target ~ ., data = data,
             method = "gbm",
             distribution="adaboost",
             trControl = fitControl,
             tuneGrid=gbmGrid2,
             verbose=F,
             metric="Kappa")

# Caret should pick the model with the highest Kappa. 
# Since T* is in {T} I would expect the best model to have K >= 0.47
testPred<-predict(gbmFit2, newdata = data)
confusionMatrix(testPred, data$target) 
# output selection
          Reference
Prediction   0   1
         0 831  47
         1   1   3

Kappa : 0.1036

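The results are inconsistent with my expectations: the best model in {T} scores K = 0.10. How is that possible, given that T* has K = 0.47 and is included in {T}? Additionally, according to the tuning plot, the K for T* as evaluated in Procedure 2 is now around 0.01. Any idea what is going on? Am I missing something?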

Answer 1

I get consistent resampling results from your data and code.

The first model has a resampled Kappa of 0.00943:

gbmFit1$results
  interaction.depth n.trees shrinkage n.minobsinnode  Accuracy       Kappa  AccuracySD
1                 1    1000       0.1              1 0.9331022 0.009430576     0.004819004
    KappaSD
1 0.0589132

The second model gives the same result for n.trees = 1000:

gbmFit2$results
   shrinkage interaction.depth n.minobsinnode n.trees  Accuracy        Kappa  AccuracySD
1        0.1                 1              1     100 0.9421803 -0.002075765 0.002422952
2        0.1                 1              1     200 0.9387776 -0.008326896 0.002468351
3        0.1                 1              1     300 0.9365049 -0.012187900 0.002625886
4        0.1                 1              1     400 0.9353749 -0.013950906 0.003077431
5        0.1                 1              1     500 0.9353685 -0.013961221 0.003244201
6        0.1                 1              1     600 0.9342322 -0.015486214 0.005202656
7        0.1                 1              1     700 0.9319658 -0.018574633 0.007033402
8        0.1                 1              1     800 0.9319658 -0.018574633 0.007033402
9        0.1                 1              1     900 0.9342386  0.010955568 0.003144850
10       0.1                 1              1    1000 0.9331022  0.009430576 0.004819004
       KappaSD
1  0.004641553
2  0.004654972
3  0.003978702
4  0.004837097
5  0.004878259
6  0.007469843
7  0.009470466
8  0.009470466
9  0.057825336
10 0.058913202

Note that the best model in your second run has n.trees = 900:

gbmFit2$bestTune
     n.trees interaction.depth shrinkage n.minobsinnode
9     900                 1       0.1              1
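
The selection can be reproduced by hand from the resampled Kappa values printed by gbmFit2$results. The sketch below is base R only and assumes the default behaviour, in which train keeps the grid row with the highest value of the chosen metric (no custom selection function was set in trainControl). The names `resampled`, `best` and `t_star` are made up for this illustration.

```r
# Resampled Kappa per n.trees, copied from the gbmFit2$results output
resampled <- data.frame(
  n.trees = seq(100, 1000, by = 100),
  Kappa = c(-0.002075765, -0.008326896, -0.012187900, -0.013950906,
            -0.013961221, -0.015486214, -0.018574633, -0.018574633,
             0.010955568,  0.009430576)
)

# Keep the row with the highest resampled Kappa (first row wins on ties)
best <- resampled[which.max(resampled$Kappa), ]
print(best)

# Resampled Kappa of T* (n.trees = 1000): the same value as in gbmFit1$results
t_star <- resampled[resampled$n.trees == 1000, ]
print(t_star)
```

The row with n.trees = 900 wins by a small margin, and T* keeps the resampled Kappa of about 0.0094 that it had when evaluated alone.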

Since train picks the "best" model according to your metric, your second prediction uses a different model (n.trees = 900 instead of 1000).
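
It may also help to see where 0.47 and 0.10 come from. Both are computed by confusionMatrix on predictions for the same rows the models were trained on (newdata = data), so they are training-set figures and are not directly comparable with the cross-validated Kappa column in $results. A minimal base R sketch that recomputes Cohen's Kappa from the two confusion matrices in the question (`cohen_kappa`, `cm_fit1` and `cm_fit2` are names invented here, not caret functions):

```r
# Cohen's Kappa from a square confusion matrix
# (rows = predictions, columns = reference)
cohen_kappa <- function(cm) {
  n <- sum(cm)
  observed <- sum(diag(cm)) / n                      # observed agreement
  expected <- sum(rowSums(cm) * colSums(cm)) / n^2   # agreement expected by chance
  (observed - expected) / (1 - expected)
}

# Confusion matrices as printed in the question
cm_fit1 <- matrix(c(832, 34,
                      0, 16), nrow = 2, byrow = TRUE)
cm_fit2 <- matrix(c(831, 47,
                      1,  3), nrow = 2, byrow = TRUE)

kappa_fit1 <- cohen_kappa(cm_fit1)
kappa_fit2 <- cohen_kappa(cm_fit2)
print(round(c(fit1 = kappa_fit1, fit2 = kappa_fit2), 4))
# fit1 = 0.4703, fit2 = 0.1036
```

These match the values reported in the question. They describe how the final 1000-tree and 900-tree models fit the training data, whereas the tuning step compares models on held-out folds.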

Solution 1
2 Accepted 2015-09-20 21:42:55
