How to repeat hyperparameter tuning (alpha and/or lambda) of glmnet in mlr3
I would like to repeat the hyperparameter tuning (alpha and/or lambda) of glmnet in mlr3 to reduce variability in smaller data sets.
In caret, I could do this with "repeatedcv".
Since I really like the mlr3 family of packages, I would like to use them for my analysis. However, I am not sure of the correct way to do this step in mlr3.
Example data
#library
library(caret)
library(mlr3verse)
library(mlbench)
# get example data
data(PimaIndiansDiabetes, package="mlbench")
data <- PimaIndiansDiabetes
# get small training data
train.data <- data[1:60,]
Created on 2021-03-18 by the reprex package (v1.0.0)
caret approach (tuning alpha and lambda) using "cv" and "repeatedcv"
trControlCv <- trainControl("cv",
                            number = 5,
                            classProbs = TRUE,
                            savePredictions = TRUE,
                            summaryFunction = twoClassSummary)
# use "repeatedcv" to avoid variability in smaller data sets
trControlRCv <- trainControl("repeatedcv",
                             number = 5,
                             repeats = 20,
                             classProbs = TRUE,
                             savePredictions = TRUE,
                             summaryFunction = twoClassSummary)
# train and extract coefficients with "cv" and different set.seed
set.seed(2323)
model <- train(
  diabetes ~ ., data = train.data, method = "glmnet",
  trControl = trControlCv,
  tuneLength = 10,
  metric = "ROC"
)
coef(model$finalModel, model$finalModel$lambdaOpt) -> coef1
set.seed(23)
model <- train(
  diabetes ~ ., data = train.data, method = "glmnet",
  trControl = trControlCv,
  tuneLength = 10,
  metric = "ROC"
)
coef(model$finalModel, model$finalModel$lambdaOpt) -> coef2
# train and extract coefficients with "repeatedcv" and different set.seed
set.seed(13)
model <- train(
  diabetes ~ ., data = train.data, method = "glmnet",
  trControl = trControlRCv,
  tuneLength = 10,
  metric = "ROC"
)
coef(model$finalModel, model$finalModel$lambdaOpt) -> coef3
set.seed(55)
model <- train(
  diabetes ~ ., data = train.data, method = "glmnet",
  trControl = trControlRCv,
  tuneLength = 10,
  metric = "ROC"
)
coef(model$finalModel, model$finalModel$lambdaOpt) -> coef4
Demonstrate different coefficients with cross-validation and the same coefficients with repeated cross-validation
# with "cv" I get different coefficients
identical(coef1, coef2)
#> [1] FALSE
# with "repeatedcv" I get the same coefficients
identical(coef3,coef4)
#> [1] TRUE
FIRST mlr3 approach using cv.glmnet (tunes lambda internally)
# create elastic net regression
glmnet_lrn = lrn("classif.cv_glmnet", predict_type = "prob")
# define train task
train.task <- TaskClassif$new("train.data", train.data, target = "diabetes")
# create learner
learner = as_learner(glmnet_lrn)
# train the learner with different set.seed
set.seed(2323)
learner$train(train.task)
coef(learner$model, s = "lambda.min") -> coef1
set.seed(23)
learner$train(train.task)
coef(learner$model, s = "lambda.min") -> coef2
Demonstrate different coefficients with cross-validation
# compare coefficients
coef1
#> 9 x 1 sparse Matrix of class "dgCMatrix"
#> 1
#> (Intercept) -3.323460895
#> age 0.005065928
#> glucose 0.019727881
#> insulin .
#> mass .
#> pedigree .
#> pregnant 0.001290570
#> pressure .
#> triceps 0.020529162
coef2
#> 9 x 1 sparse Matrix of class "dgCMatrix"
#> 1
#> (Intercept) -3.146190752
#> age 0.003840963
#> glucose 0.019015433
#> insulin .
#> mass .
#> pedigree .
#> pregnant .
#> pressure .
#> triceps 0.018841557
According to the comment below and this comment, I could use rsmp and AutoTuner.
This answer suggests tuning glmnet rather than cv.glmnet (glmnet was not available in mlr3 at that time).
SECOND mlr3 approach using glmnet (repeats the tuning of alpha and lambda)
# define train task
train.task <- TaskClassif$new("train.data", train.data, target = "diabetes")
# create elastic net regression
glmnet_lrn = lrn("classif.glmnet", predict_type = "prob")
# turn to learner
learner = as_learner(glmnet_lrn)
# make search space
# note: lower = upper = 1 pins s (lambda) at 1, so only alpha actually varies here
search_space = ps(
  alpha = p_dbl(lower = 0, upper = 1),
  s = p_dbl(lower = 1, upper = 1)
)
# set terminator
terminator = trm("evals", n_evals = 20)
# set tuner
tuner = tnr("grid_search", resolution = 3)
# tune the learner (repeated_cv defaults to 10 folds, 10 repeats)
at = AutoTuner$new(
  learner = learner,
  resampling = rsmp("repeated_cv"),
  measure = msr("classif.ce"),
  search_space = search_space,
  terminator = terminator,
  tuner = tuner)
at
#> <AutoTuner:classif.glmnet.tuned>
#> * Model: -
#> * Parameters: list()
#> * Packages: glmnet
#> * Predict Type: prob
#> * Feature types: logical, integer, numeric
#> * Properties: multiclass, twoclass, weights
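For comparison with the caret setup (5 folds, 20 repeats), the folds and repeats of the resampling can be set explicitly instead of relying on the defaults. The following is only a minimal sketch: the range for s (0.001 to 1) and the grid resolution are illustrative assumptions, not values from the code above.

```r
library(mlr3verse)
library(mlbench)

data(PimaIndiansDiabetes, package = "mlbench")
train.data <- PimaIndiansDiabetes[1:60, ]
train.task <- TaskClassif$new("train.data", train.data, target = "diabetes")

# 5 folds repeated 20 times, mirroring trainControl("repeatedcv", number = 5, repeats = 20)
resampling <- rsmp("repeated_cv", folds = 5, repeats = 20)

# assumed search space: alpha in [0, 1], s (lambda) in [0.001, 1]
search_space <- ps(
  alpha = p_dbl(lower = 0, upper = 1),
  s = p_dbl(lower = 0.001, upper = 1)
)

at <- AutoTuner$new(
  learner = lrn("classif.glmnet", predict_type = "prob"),
  resampling = resampling,
  measure = msr("classif.ce"),
  search_space = search_space,
  terminator = trm("evals", n_evals = 20),
  tuner = tnr("grid_search", resolution = 3)  # 3 x 3 = 9 grid points
)

set.seed(23)
at$train(train.task)

# best alpha and s found by the repeated cross-validation
at$tuning_result
```

With 100 resampling iterations per grid point this takes noticeably longer than a single 5-fold run, which is the price of the lower variance.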
How can I demonstrate that my second approach is valid and that I get the same or similar coefficients with different seeds? That is, how can I extract the coefficients for the final model of the AutoTuner?
set.seed(23)
at$train(train.task) -> tune1
set.seed(2323)
at$train(train.task) -> tune2
Repeated hyperparameter tuning (alpha and lambda) of glmnet can be done using the SECOND mlr3 approach as stated above. The coefficients can be extracted with stats::coef and the values stored in the AutoTuner:
coef(tune1$model$learner$model, alpha=tune1$tuning_result$alpha,s=tune1$tuning_result$s)
# 9 x 1 sparse Matrix of class "dgCMatrix"
# 1
# (Intercept) -1.6359082102
# age 0.0075541841
# glucose 0.0044351365
# insulin 0.0005821515
# mass 0.0077104934
# pedigree 0.0911233031
# pregnant 0.0164721202
# pressure 0.0007055435
# triceps 0.0056942014
coef(tune2$model$learner$model, alpha=tune2$tuning_result$alpha,s=tune2$tuning_result$s)
# 9 x 1 sparse Matrix of class "dgCMatrix"
# 1
# (Intercept) -1.6359082102
# age 0.0075541841
# glucose 0.0044351365
# insulin 0.0005821515
# mass 0.0077104934
# pedigree 0.0911233031
# pregnant 0.0164721202
# pressure 0.0007055435
# triceps 0.0056942014
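One caveat when comparing seeds this way: mlr3 learners are R6 objects, and $train() modifies the learner in place and returns that same object. Assigning the result of at$train() to tune1 and then to tune2 therefore leaves both names pointing at one AutoTuner, so identical output does not by itself show that two seeds agree. A minimal sketch of an independent comparison, assuming a deep clone per seed and an illustrative range for s (0.001 to 1) that is not taken from the question:

```r
library(mlr3verse)
library(mlbench)

data(PimaIndiansDiabetes, package = "mlbench")
train.task <- TaskClassif$new("train.data", PimaIndiansDiabetes[1:60, ],
                              target = "diabetes")

at <- AutoTuner$new(
  learner = lrn("classif.glmnet", predict_type = "prob"),
  resampling = rsmp("repeated_cv"),
  measure = msr("classif.ce"),
  # assumed search space; the s range is illustrative
  search_space = ps(
    alpha = p_dbl(lower = 0, upper = 1),
    s = p_dbl(lower = 0.001, upper = 1)
  ),
  terminator = trm("evals", n_evals = 20),
  tuner = tnr("grid_search", resolution = 3)
)

# train an independent deep clone per seed so the two fits do not share state
set.seed(23)
tune1 <- at$clone(deep = TRUE)$train(train.task)
set.seed(2323)
tune2 <- at$clone(deep = TRUE)$train(train.task)

# coefficients of each final model at its tuned lambda
coef1 <- coef(tune1$model$learner$model, s = tune1$tuning_result$s)
coef2 <- coef(tune2$model$learner$model, s = tune2$tuning_result$s)

# compare the selected hyperparameters and the resulting coefficients
tune1$tuning_result[, c("alpha", "s")]
tune2$tuning_result[, c("alpha", "s")]
all.equal(as.matrix(coef1), as.matrix(coef2))
```

If both seeds select the same alpha and s, the final glmnet fits should coincide, because the final model is refit on the full training task with those values.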