How to repeat hyperparameter tuning (alpha and/or lambda) of glmnet in mlr3

I would like to repeat the hyperparameter tuning (alpha and/or lambda) of glmnet in mlr3 to reduce variability on small data sets.

In caret, I can do this with "repeatedcv".

Since I really like the mlr3 family of packages, I would like to use them for my analysis. However, I am not sure how to do this step correctly in mlr3.

Example data

#library
library(caret)
library(mlr3verse)
library(mlbench)

# get example data
data(PimaIndiansDiabetes, package="mlbench")
data <- PimaIndiansDiabetes

# get small training data
train.data <- data[1:60,]

Created on 2021-03-18 by the reprex package (v1.0.0)

caret approach (tuning alpha and lambda) using "cv" and "repeatedcv"


trControlCv <- trainControl("cv",
             number = 5,
             classProbs = TRUE,
             savePredictions = TRUE,
             summaryFunction = twoClassSummary)

# use "repeatedcv" to avoid variability in smaller data sets
trControlRCv <- trainControl("repeatedcv",
             number = 5,
             repeats= 20,
             classProbs = TRUE,
             savePredictions = TRUE,
             summaryFunction = twoClassSummary)

# train and extract coefficients with "cv" and different set.seed
set.seed(2323)
model <- train(
  diabetes ~., data = train.data, method = "glmnet",
  trControl = trControlCv,
  tuneLength = 10,
  metric="ROC"
)

coef(model$finalModel, model$finalModel$lambdaOpt) -> coef1

set.seed(23)
model <- train(
  diabetes ~., data = train.data, method = "glmnet",
  trControl = trControlCv,
  tuneLength = 10,
  metric="ROC"
)

coef(model$finalModel, model$finalModel$lambdaOpt) -> coef2


# train and extract coefficients with "repeatedcv" and different set.seed
set.seed(13)

model <- train(
  diabetes ~., data = train.data, method = "glmnet",
  trControl = trControlRCv,
  tuneLength = 10,
  metric="ROC"
)

coef(model$finalModel, model$finalModel$lambdaOpt) -> coef3


set.seed(55)
model <- train(
  diabetes ~., data = train.data, method = "glmnet",
  trControl = trControlRCv,
  tuneLength = 10,
  metric="ROC"
)

coef(model$finalModel, model$finalModel$lambdaOpt) -> coef4

Demonstrate that plain cross-validation gives different coefficients across seeds, while repeated cross-validation gives the same coefficients

# with "cv" I get different coefficients
identical(coef1, coef2)
#> [1] FALSE

# with "repeatedcv" I get the same coefficients
identical(coef3,coef4)
#> [1] TRUE

FIRST mlr3 approach using cv.glmnet (tunes lambda internally)

# create elastic net regression
glmnet_lrn = lrn("classif.cv_glmnet", predict_type = "prob")

# define train task
train.task <- TaskClassif$new("train.data", train.data, target = "diabetes")

# create learner 
learner = as_learner(glmnet_lrn)

# train the learner with different set.seed
set.seed(2323)
learner$train(train.task)
coef(learner$model, s = "lambda.min") -> coef1

set.seed(23)
learner$train(train.task)
coef(learner$model, s = "lambda.min") -> coef2

Demonstrate different coefficients with cross-validation

# compare coefficients
coef1
#> 9 x 1 sparse Matrix of class "dgCMatrix"
#>                        1
#> (Intercept) -3.323460895
#> age          0.005065928
#> glucose      0.019727881
#> insulin      .          
#> mass         .          
#> pedigree     .          
#> pregnant     0.001290570
#> pressure     .          
#> triceps      0.020529162
coef2
#> 9 x 1 sparse Matrix of class "dgCMatrix"
#>                        1
#> (Intercept) -3.146190752
#> age          0.003840963
#> glucose      0.019015433
#> insulin      .          
#> mass         .          
#> pedigree     .          
#> pregnant     .          
#> pressure     .          
#> triceps      0.018841557

Update 1: the progress I made

According to the comment below and this comment, I could use rsmp and AutoTuner.

This answer suggests tuning glmnet rather than cv.glmnet (glmnet was not available in mlr3 at that time).
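For orientation, here is a minimal sketch of how the caret setting trainControl("repeatedcv", number = 5, repeats = 20) might be expressed as an mlr3 resampling. The object name rcv is only illustrative. Note that calling rsmp("repeated_cv") without arguments uses mlr3's own defaults, which can be inspected as shown rather than assumed.

```r
library(mlr3)

# Rough counterpart of caret's trainControl("repeatedcv", number = 5, repeats = 20)
rcv = rsmp("repeated_cv", folds = 5, repeats = 20)

# Total number of train/test splits is folds * repeats
rcv$iters
#> [1] 100

# Inspect the defaults used when no arguments are given
rsmp("repeated_cv")$param_set$values
```

Passing rcv (instead of a bare rsmp("repeated_cv")) as the resampling of an AutoTuner would make the number of folds and repeats explicit.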

SECOND mlr3 approach using glmnet (repeats the tuning of alpha and lambda)

# define train task
train.task <- TaskClassif$new("train.data", train.data, target = "diabetes")

# create elastic net regression
glmnet_lrn = lrn("classif.glmnet", predict_type = "prob")

# turn to learner
learner = as_learner(glmnet_lrn)

# make search space
# note: lower = upper = 1 fixes s (the lambda used for prediction) at 1,
# so this search space only varies alpha
search_space = ps(
  alpha = p_dbl(lower = 0, upper = 1),
  s = p_dbl(lower = 1, upper = 1)
)

# set terminator
terminator = trm("evals", n_evals = 20)

# set tuner
tuner = tnr("grid_search", resolution = 3)

# tune the learner with repeated cross-validation (mlr3 default folds/repeats)
at = AutoTuner$new(
  learner = learner,
  resampling = rsmp("repeated_cv"),
  measure = msr("classif.ce"),
  search_space = search_space,
  terminator = terminator,
  tuner = tuner)

at
#> <AutoTuner:classif.glmnet.tuned>
#> * Model: -
#> * Parameters: list()
#> * Packages: glmnet
#> * Predict Type: prob
#> * Feature types: logical, integer, numeric
#> * Properties: multiclass, twoclass, weights
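If lambda should be tuned as well, the search space needs a real range for s. The following is only a sketch under assumptions: the bounds 1e-4 to 1 and the log10 scale are illustrative choices, not values from the original post. It builds such a search space with paradox and prints the grid that a grid search with resolution 4 would evaluate.

```r
library(paradox)

# alpha on [0, 1]; s searched on a log10 scale from 1e-4 to 1
# (bounds are assumptions chosen for illustration)
search_space = ps(
  alpha = p_dbl(lower = 0, upper = 1),
  s     = p_dbl(lower = -4, upper = 0, trafo = function(x) 10^x)
)

# Grid that a grid search with resolution = 4 would cover: 4 x 4 points
design = generate_design_grid(search_space, resolution = 4)
nrow(design$data)
#> [1] 16

# Values after the transformation, i.e. what the learner would actually receive
grid = data.table::rbindlist(design$transpose())
print(grid)
```

Such a search space could be passed to AutoTuner$new() via search_space in place of the one above; a log scale is a common choice for lambda because useful values usually span several orders of magnitude.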

Open Question

How can I demonstrate that my second approach is valid and that I get the same or similar coefficients with different seeds? I.e., how can I extract the coefficients of the AutoTuner's final model?

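# $train() modifies the AutoTuner in place and returns that same object,
# so clone before each run to keep two independent fits
set.seed(23)
tune1 <- at$clone(deep = TRUE)$train(train.task)

set.seed(2323)
tune2 <- at$clone(deep = TRUE)$train(train.task)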

Repeated hyperparameter tuning (alpha and lambda) of glmnet can be done using the SECOND mlr3 approach described above. The coefficients can be extracted with stats::coef and the values stored in the AutoTuner.

coef(tune1$model$learner$model, alpha = tune1$tuning_result$alpha, s = tune1$tuning_result$s)
# 9 x 1 sparse Matrix of class "dgCMatrix"
#                         1
# (Intercept) -1.6359082102
# age          0.0075541841
# glucose      0.0044351365
# insulin      0.0005821515
# mass         0.0077104934
# pedigree     0.0911233031
# pregnant     0.0164721202
# pressure     0.0007055435
# triceps      0.0056942014
coef(tune2$model$learner$model, alpha = tune2$tuning_result$alpha, s = tune2$tuning_result$s)
# 9 x 1 sparse Matrix of class "dgCMatrix"
#                         1
# (Intercept) -1.6359082102
# age          0.0075541841
# glucose      0.0044351365
# insulin      0.0005821515
# mass         0.0077104934
# pedigree     0.0911233031
# pregnant     0.0164721202
# pressure     0.0007055435
# triceps      0.0056942014
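One point to be careful about when comparing seeds: mlr3 learners are R6 objects, and $train() modifies the object in place and returns it, so two results obtained from the same AutoTuner refer to one object. Below is a self-contained sketch of a comparison on independent clones. The helper names fit_with_seed and extract_coef, the reduced number of repeats, and the log-scaled range for s are assumptions made for illustration and are not part of the original post.

```r
library(mlr3verse)
library(mlbench)

data(PimaIndiansDiabetes, package = "mlbench")
train.data = PimaIndiansDiabetes[1:60, ]
task = as_task_classif(train.data, target = "diabetes", id = "train.data")

at = AutoTuner$new(
  learner      = lrn("classif.glmnet", predict_type = "prob"),
  resampling   = rsmp("repeated_cv", folds = 5, repeats = 5),
  measure      = msr("classif.ce"),
  search_space = ps(
    alpha = p_dbl(lower = 0, upper = 1),
    s     = p_dbl(lower = -4, upper = 0, trafo = function(x) 10^x)
  ),
  terminator   = trm("none"),  # grid search stops once the grid is exhausted
  tuner        = tnr("grid_search", resolution = 3)
)

# Hypothetical helper: train an independent deep clone under a given seed
fit_with_seed = function(seed) {
  set.seed(seed)
  fit = at$clone(deep = TRUE)
  fit$train(task)
  fit
}

# Hypothetical helper: coefficients of the final glmnet fit at the tuned s
extract_coef = function(fit) {
  coef(fit$learner$model, s = fit$learner$param_set$values$s)
}

tune1 = fit_with_seed(23)
tune2 = fit_with_seed(2323)

# Tuned hyperparameters (after transformation) for each seed
tune1$learner$param_set$values[c("alpha", "s")]
tune2$learner$param_set$values[c("alpha", "s")]

# TRUE if both seeds lead to numerically equal coefficients
all.equal(as.matrix(extract_coef(tune1)), as.matrix(extract_coef(tune2)))
```

Since a glmnet fit is deterministic for fixed data and alpha, the coefficients should agree whenever both seeds select the same alpha and s; any disagreement therefore points to the tuning step rather than the final fit.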
