How to repeat the hyperparameter tuning (alpha and/or lambda) of glmnet in mlr3

Question

I would like to repeat the hyperparameter tuning (alpha and/or lambda) of glmnet in mlr3 to avoid variability in smaller data sets.

In caret, I could do this with "repeatedcv".

Since I really like the mlr3 family of packages, I would like to use them for my analysis. However, I am not sure how to do this step correctly in mlr3.
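As a point of reference, here is a minimal sketch of the mlr3 counterpart of caret's trainControl("repeatedcv", number = 5, repeats = 20), assuming the mlr3 and mlbench packages are installed. In mlr3 the resampling strategy is its own object created with rsmp(), and calling set.seed() before instantiate() makes the random splits reproducible. The task id "pima60" is an arbitrary name chosen for illustration.

```r
library(mlr3)

data(PimaIndiansDiabetes, package = "mlbench")
train.data <- PimaIndiansDiabetes[1:60, ]
task <- TaskClassif$new("pima60", train.data, target = "diabetes")

# caret: trainControl("repeatedcv", number = 5, repeats = 20)
# mlr3:  the same scheme expressed as a resampling object
rcv <- rsmp("repeated_cv", folds = 5, repeats = 20)

# The assignment of rows to folds is random; fixing the seed makes it reproducible
set.seed(2323)
rcv$instantiate(task)

# 5 folds x 20 repeats = 100 train/test splits
rcv$iters
```

Such a resampling object is what would be passed as the resampling argument of an AutoTuner to make the tuning itself repeated.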

Example data

#library
library(caret)
library(mlr3verse)
library(mlbench)

# get example data
data(PimaIndiansDiabetes, package="mlbench")
data <- PimaIndiansDiabetes

# get small training data
train.data <- data[1:60,]

Created on 2021-03-18 by the reprex package (v1.0.0)

caret approach (tuning alpha and lambda) using "cv" and "repeatedcv"


trControlCv <- trainControl("cv",
             number = 5,
             classProbs = TRUE,
             savePredictions = TRUE,
             summaryFunction = twoClassSummary)

# use "repeatedcv" to avoid variability in smaller data sets
trControlRCv <- trainControl("repeatedcv",
             number = 5,
             repeats= 20,
             classProbs = TRUE,
             savePredictions = TRUE,
             summaryFunction = twoClassSummary)

# train and extract coefficients with "cv" and different set.seed
set.seed(2323)
model <- train(
  diabetes ~., data = train.data, method = "glmnet",
  trControl = trControlCv,
  tuneLength = 10,
  metric="ROC"
)

coef(model$finalModel, model$finalModel$lambdaOpt) -> coef1

set.seed(23)
model <- train(
  diabetes ~., data = train.data, method = "glmnet",
  trControl = trControlCv,
  tuneLength = 10,
  metric="ROC"
)

coef(model$finalModel, model$finalModel$lambdaOpt) -> coef2


# train and extract coefficients with "repeatedcv" and different set.seed
set.seed(13)

model <- train(
  diabetes ~., data = train.data, method = "glmnet",
  trControl = trControlRCv,
  tuneLength = 10,
  metric="ROC"
)

coef(model$finalModel, model$finalModel$lambdaOpt) -> coef3


set.seed(55)
model <- train(
  diabetes ~., data = train.data, method = "glmnet",
  trControl = trControlRCv,
  tuneLength = 10,
  metric="ROC"
)

coef(model$finalModel, model$finalModel$lambdaOpt) -> coef4


Demonstrate different coefficients with cross-validation and identical coefficients with repeated cross-validation

# with "cv" I get different coefficients
identical(coef1, coef2)
#> [1] FALSE

# with "repeatedcv" I get the same coefficients
identical(coef3,coef4)
#> [1] TRUE

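identical() on the coefficient matrices only shows whether the final fits coincide. A more direct check is to look at the hyperparameters caret actually selected (model$bestTune) for several seeds. The sketch below assumes caret, glmnet, pROC and mlbench are installed and reuses the seeds 13 and 55 from above.

```r
library(caret)

data(PimaIndiansDiabetes, package = "mlbench")
train.data <- PimaIndiansDiabetes[1:60, ]

trControlRCv <- trainControl("repeatedcv",
                             number = 5,
                             repeats = 20,
                             classProbs = TRUE,
                             summaryFunction = twoClassSummary)

# Train once per seed and keep only the selected alpha and lambda
best <- do.call(rbind, lapply(c(13, 55), function(seed) {
  set.seed(seed)
  model <- train(diabetes ~ ., data = train.data, method = "glmnet",
                 trControl = trControlRCv, tuneLength = 10, metric = "ROC")
  cbind(seed = seed, model$bestTune)
}))

best
```

With repeated CV the rows would be expected to agree or differ only slightly; running the same loop with trControlCv is a quick way to see how much more the selection moves under plain "cv".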

FIRST mlr3 approach using cv.glmnet (tunes lambda internally)

# create elastic net regression
glmnet_lrn = lrn("classif.cv_glmnet", predict_type = "prob")

# define train task
train.task <- TaskClassif$new("train.data", train.data, target = "diabetes")

# create learner 
learner = as_learner(glmnet_lrn)

# train the learner with different set.seed
set.seed(2323)
learner$train(train.task)
coef(learner$model, s = "lambda.min") -> coef1

set.seed(23)
learner$train(train.task)
coef(learner$model, s = "lambda.min") -> coef2


Demonstrate different coefficients with cross-validation

# compare coefficients
coef1
#> 9 x 1 sparse Matrix of class "dgCMatrix"
#>                        1
#> (Intercept) -3.323460895
#> age          0.005065928
#> glucose      0.019727881
#> insulin      .          
#> mass         .          
#> pedigree     .          
#> pregnant     0.001290570
#> pressure     .          
#> triceps      0.020529162
coef2
#> 9 x 1 sparse Matrix of class "dgCMatrix"
#>                        1
#> (Intercept) -3.146190752
#> age          0.003840963
#> glucose      0.019015433
#> insulin      .          
#> mass         .          
#> pedigree     .          
#> pregnant     .          
#> pressure     .          
#> triceps      0.018841557

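The differences come from cv.glmnet assigning observations to folds at random on every call. A minimal sketch, assuming mlr3, mlr3learners, glmnet and mlbench are installed, that retrains the classif.cv_glmnet learner from the question with ten arbitrary seeds and records lambda.min each time, to see how much the selected lambda moves:

```r
library(mlr3)
library(mlr3learners)

data(PimaIndiansDiabetes, package = "mlbench")
train.data <- PimaIndiansDiabetes[1:60, ]
train.task <- TaskClassif$new("train.data", train.data, target = "diabetes")

learner <- lrn("classif.cv_glmnet", predict_type = "prob")

# Each retraining draws new random folds, so lambda.min (the lambda with
# the minimum mean cross-validated error) can change from seed to seed
lambda_min <- vapply(1:10, function(seed) {
  set.seed(seed)
  learner$train(train.task)
  learner$model$lambda.min
}, numeric(1))

lambda_min
summary(lambda_min)
```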

Update 1: the progress I made

According to the comment below and this comment, I could use rsmp and AutoTuner.

This answer suggests tuning glmnet rather than cv.glmnet (glmnet was not available in mlr3 at that time).

SECOND mlr3 approach using glmnet (repeats the tuning of alpha and lambda)

# define train task
train.task <- TaskClassif$new("train.data", train.data, target = "diabetes")

# create elastic net regression
glmnet_lrn = lrn("classif.glmnet", predict_type = "prob")

# turn to learner
learner = as_learner(glmnet_lrn)

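# make search space
# NOTE: lower == upper fixes s (the lambda used for prediction) at 1,
# so only alpha actually varies in this grid
search_space = ps(
  alpha = p_dbl(lower = 0, upper = 1),
  s = p_dbl(lower = 1, upper = 1)
)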

# set terminator
terminator = trm("evals", n_evals = 20)

#set tuner
tuner = tnr("grid_search", resolution = 3)

# tune the learner; rsmp("repeated_cv") defaults to 10 folds x 10 repeats
at = AutoTuner$new(
  learner = learner,
  resampling = rsmp("repeated_cv"),
  measure = msr("classif.ce"),
  search_space = search_space,
  terminator = terminator,
  tuner = tuner)

at
#> <AutoTuner:classif.glmnet.tuned>
#> * Model: -
#> * Parameters: list()
#> * Packages: glmnet
#> * Predict Type: prob
#> * Feature types: logical, integer, numeric
#> * Properties: multiclass, twoclass, weights
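For comparison, a minimal sketch of the SECOND approach set up to mirror the caret call more closely, assuming mlr3, mlr3learners, mlr3tuning, paradox, glmnet and mlbench are installed. Here s is searched on a log scale instead of being fixed at 1, resampling is 5-fold CV repeated 20 times (stratified by the target so every test fold contains both classes), and classif.auc stands in for caret's "ROC" metric. The bounds for s and the grid resolution are illustrative assumptions, not recommendations.

```r
library(mlr3)
library(mlr3learners)
library(mlr3tuning)
library(paradox)

data(PimaIndiansDiabetes, package = "mlbench")
train.data <- PimaIndiansDiabetes[1:60, ]
train.task <- TaskClassif$new("train.data", train.data, target = "diabetes")
# Stratify the folds by the target so AUC is defined in every test fold
train.task$col_roles$stratum <- "diabetes"

# alpha on [0, 1]; s on [1e-4, 1] searched on the log scale (assumed bounds)
search_space <- ps(
  alpha = p_dbl(lower = 0, upper = 1),
  s     = p_dbl(lower = 1e-4, upper = 1, logscale = TRUE)
)

at <- AutoTuner$new(
  learner      = lrn("classif.glmnet", predict_type = "prob"),
  resampling   = rsmp("repeated_cv", folds = 5, repeats = 20),  # like "repeatedcv"
  measure      = msr("classif.auc"),                            # like metric = "ROC"
  search_space = search_space,
  terminator   = trm("none"),                # grid search stops by itself
  tuner        = tnr("grid_search", resolution = 3)
)

set.seed(23)
at$train(train.task)

# Selected values on the original scale, as used by the final model
pv <- at$tuning_result$learner_param_vals[[1]]
pv[c("alpha", "s")]
coef(at$model$learner$model, s = pv$s)
```

Because s is tuned on the log scale, at$tuning_result$s holds log(s); learner_param_vals[[1]] holds the back-transformed value that the final model is configured with.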

Open Question

How can I demonstrate that my second approach is valid and that I get the same or similar coefficients with different seeds? That is, how can I extract the coefficients of the AutoTuner's final model?

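# $train() returns at itself (R6 reference semantics), so train a deep
# clone each time; otherwise tune1 and tune2 would be the same object
set.seed(23)
at$clone(deep = TRUE)$train(train.task) -> tune1

set.seed(2323)
at$clone(deep = TRUE)$train(train.task) -> tune2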


Answer 1

Repeated hyperparameter tuning (alpha and lambda) of glmnet can be done with the SECOND mlr3 approach described above. The coefficients can be extracted with stats::coef using the values stored in the AutoTuner:

# alpha is already fixed in the fitted glmnet object, so coef() only needs s
coef(tune1$model$learner$model, s = tune1$tuning_result$s)
# 9 x 1 sparse Matrix of class "dgCMatrix"
# 1
# (Intercept) -1.6359082102
# age          0.0075541841
# glucose      0.0044351365
# insulin      0.0005821515
# mass         0.0077104934
# pedigree     0.0911233031
# pregnant     0.0164721202
# pressure     0.0007055435
# triceps      0.0056942014
coef(tune2$model$learner$model, s = tune2$tuning_result$s)
# 9 x 1 sparse Matrix of class "dgCMatrix"
# 1
# (Intercept) -1.6359082102
# age          0.0075541841
# glucose      0.0044351365
# insulin      0.0005821515
# mass         0.0077104934
# pedigree     0.0911233031
# pregnant     0.0164721202
# pressure     0.0007055435
# triceps      0.0056942014
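Note that $train() returns the AutoTuner itself (R6 reference semantics): writing at$train(task) -> tune1 does not make a copy, so two trainings of the same at leave both names pointing at the last fit. A minimal sketch that keeps the fits apart by training a deep clone per seed and extracting the coefficients right away. It assumes mlr3, mlr3learners, mlr3tuning, paradox, glmnet and mlbench are installed, rebuilds a compact version of the question's AutoTuner (s fixed at 1 on the learner rather than through a degenerate search range), and coef_for_seed is a hypothetical helper name.

```r
library(mlr3)
library(mlr3learners)
library(mlr3tuning)
library(paradox)

data(PimaIndiansDiabetes, package = "mlbench")
train.data <- PimaIndiansDiabetes[1:60, ]
train.task <- TaskClassif$new("train.data", train.data, target = "diabetes")

# Compact version of the question's SECOND approach:
# s fixed at 1 on the learner, alpha searched on a 3-point grid,
# rsmp("repeated_cv") with its defaults (10 folds x 10 repeats)
at <- AutoTuner$new(
  learner      = lrn("classif.glmnet", predict_type = "prob", s = 1),
  resampling   = rsmp("repeated_cv"),
  measure      = msr("classif.ce"),
  search_space = ps(alpha = p_dbl(lower = 0, upper = 1)),
  terminator   = trm("evals", n_evals = 20),
  tuner        = tnr("grid_search", resolution = 3)
)

# Hypothetical helper: train an independent deep clone of at with the given
# seed and return the coefficients of its final glmnet model as a dense matrix
coef_for_seed <- function(seed) {
  set.seed(seed)
  tuned <- at$clone(deep = TRUE)$train(train.task)
  final <- tuned$model$learner
  as.matrix(coef(final$model, s = final$param_set$values$s))
}

coef1 <- coef_for_seed(23)
coef2 <- coef_for_seed(2323)
all.equal(coef1, coef2)
```

all.equal() returns TRUE when both seeds select the same alpha (the final refit on the full data is then identical); otherwise it describes the differences.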

Answer 1 accepted on 2021-03-21 22:22:45