R caret: Tuning GLM boost prune parameter

I'm trying to tune the parameters of a GLM boost model. According to the caret package documentation for this model, there are two parameters that can be adjusted: mstop and prune.

    library(caret)
    library(mlbench)

    data(Sonar)

    set.seed(25)
    trainIndex = createDataPartition(Sonar$Class, p = 0.9, list = FALSE)
    training = Sonar[ trainIndex,]
    testing  = Sonar[-trainIndex,]

    ### set training parameters
    fitControl = trainControl(method = "repeatedcv",
                              number = 10,
                              repeats = 10,
                              ## Estimate class probabilities
                              classProbs = TRUE,
                              ## Evaluate two-class performance
                              ## (ROC, sensitivity, specificity) using the following function
                              summaryFunction = twoClassSummary)

    ### train the models
    set.seed(69)
    # Use the expand.grid to specify the search space   
    glmBoostGrid = expand.grid(mstop = c(50, 100, 150, 200, 250, 300),
                               prune = c('yes', 'no'))

    glmBoostFit = train(Class ~ ., 
                        data = training,
                        method = "glmboost",
                        trControl = fitControl,
                        tuneGrid = glmBoostGrid,
                        metric = 'ROC')
    glmBoostFit

The output is the following:

    Boosted Generalized Linear Model 

    188 samples
     60 predictors
      2 classes: 'M', 'R' 

    No pre-processing
    Resampling: Cross-Validated (10 fold, repeated 10 times) 
    Summary of sample sizes: 169, 169, 169, 169, 170, 169, ... 
    Resampling results across tuning parameters:

      mstop  ROC        Sens   Spec       ROC SD      Sens SD    Spec SD  
       50    0.8261806  0.764  0.7598611  0.10208114  0.1311104  0.1539477
      100    0.8265972  0.729  0.7625000  0.09459835  0.1391250  0.1385465
      150    0.8282083  0.717  0.7726389  0.09570417  0.1418152  0.1382405
      200    0.8307917  0.714  0.7769444  0.09484042  0.1439011  0.1452857
      250    0.8306667  0.719  0.7756944  0.09452604  0.1436740  0.1535578
      300    0.8278403  0.728  0.7722222  0.09794868  0.1425398  0.1576030

    Tuning parameter 'prune' was held constant at a value of yes
    ROC was used to select the optimal model using  the largest value.
    The final values used for the model were mstop = 200 and prune = yes. 

The prune parameter is held constant ("Tuning parameter 'prune' was held constant at a value of yes") even though glmBoostGrid also contains prune = 'no'. I looked at the mboost package documentation for boost_control, and only the mstop parameter is exposed there. So how can the prune parameter be tuned through the tuneGrid argument of train?
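Before looking inside caret, it may help to rule out the grid itself. The following is only a sanity-check sketch: it rebuilds the grid from the question in base R and confirms that both prune values are present. The modelLookup call is guarded because it assumes caret is installed; it simply lists the tuning parameters caret registers for glmboost.

```r
# Rebuild the tuning grid exactly as in the question (base R only).
glmBoostGrid <- expand.grid(mstop = c(50, 100, 150, 200, 250, 300),
                            prune = c("yes", "no"))

# 6 mstop values x 2 prune values, so 12 candidate combinations.
nrow(glmBoostGrid)

# Both prune settings should appear once per mstop value.
table(glmBoostGrid$prune)

# Optional: list the tuning parameters caret registers for this model.
# Skipped silently when caret is not installed.
if (requireNamespace("caret", quietly = TRUE)) {
  print(caret::modelLookup("glmboost"))
}
```

If the grid contains both values, the "held constant" message has to come from how train handles the grid, not from the grid passed in.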

The difference is located in this part of caret's fitting code for glmboost:

    if (param$prune == "yes") {
        out <- if (is.factor(y)) 
            out[mstop(AIC(out, "classical"))]
        else out[mstop(AIC(out))]
    }

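So with prune = "yes" the fitted model is cut back to the number of boosting iterations that minimizes the AIC, and the only thing that varies is how the AIC is calculated: the "classical" AIC for a factor outcome, and mboost's default method otherwise. However, after running various tests with glmboost in caret, I have my doubts that it is behaving as expected. I have opened an issue on GitHub to check whether my suspicions are correct, and I'll edit my answer if there is more information from the developers.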
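To see what that branch does outside of caret, here is a minimal sketch that repeats the pruning step directly with mboost. It assumes mboost and mlbench are installed. The choices of fitting on the full Sonar data, using Binomial() as the family, and starting from mstop = 300 (the top of the grid in the question) are mine for illustration; this is not caret's actual training path.

```r
library(mboost)
library(mlbench)

data(Sonar)

# Fit a boosted GLM with a deliberately generous number of iterations.
fit <- glmboost(Class ~ ., data = Sonar,
                family = Binomial(),
                control = boost_control(mstop = 300))

# AIC along the boosting path, using the "classical" method that the
# caret snippet requests when the outcome is a factor.
aic <- AIC(fit, method = "classical")

# Boosting iteration at which that AIC is smallest.
best <- mstop(aic)

# Subsetting sets the model to `best` iterations.
# Note that mboost changes `fit` itself here, not just the return value.
pruned <- fit[best]
mstop(pruned)
```

Comparing mstop(pruned) with the mstop values in the grid gives a rough idea of whether prune = "yes" would change the model at all for a given grid row.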
