Combining rpart hyperparameter tuning with down-sampling in mlr3

I am walking through the great examples from the mlr3 package (mlr3gallery: imbalanced data examples), and I was hoping to see an example that combines hyperparameter tuning AND an imbalance correction.

From the link above, here is a description of what I am trying to achieve:

To keep runtime low, we define the search space only for the imbalance correction method. However, one can also jointly tune the hyperparameter of the learner along with the imbalance correction method by extending the search space with the learner's hyperparameters.

Here is an example that comes close: "mlr3 PipeOps: Create branches with different data transformations and benchmark different learners within and between branches".

So we can (mis)use missuse's great example from there as a walkthrough:


library(mlr3)
library(mlr3pipelines)
library(mlr3tuning)
library(paradox)

#set up an rpart learner
learner <- lrn("classif.rpart", predict_type = "prob")
learner$param_set$values <- list(
  cp = 0,
  maxdepth = 21,
  minbucket = 12,
  minsplit = 24
)

#Create the tree graphs:

# graph 1, just imputehist
graph_nop <- po("imputehist") %>>%
  learner

# graph 2 : imputehist and undersample majority class (ratio relative to majority class)

graph_down <- po("imputehist") %>>%
  po("classbalancing", id = "undersample", adjust = "major", 
     reference = "major", shuffle = FALSE, ratio = 1/2) %>>%
  learner

# graph 3: impute hist and oversample minority class (ratio relative to minority class)

graph_up <- po("imputehist") %>>%
  po("classbalancing", id = "oversample", adjust = "minor", 
     reference = "minor", shuffle = FALSE, ratio = 2) %>>%
  learner

#Convert graphs to learners and set predict_type

graph_nop <- GraphLearner$new(graph_nop)
graph_nop$predict_type <- "prob"

graph_down <- GraphLearner$new(graph_down)
graph_down$predict_type <- "prob"

graph_up <- GraphLearner$new(graph_up)
graph_up$predict_type <- "prob"

#define re-sampling and instantiate it so always the same split will be used:

hld <- rsmp("holdout")
hld$instantiate(tsk("sonar"))

bmr <- benchmark(design = benchmark_grid(tasks = tsk("sonar"),
                                         learners = list(graph_nop, graph_down, graph_up),
                                         resamplings = hld),
                 store_models = TRUE) #only needed if you want to inspect the models

#check result using different measures:
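The comment above leaves the actual comparison open. A minimal sketch of such a check, assuming a recent mlr3/mlr3pipelines release (it uses `as_learner()` instead of `GraphLearner$new()`). AUC and classification error are an illustrative choice of measures, and `as_prob_learner()` is a hypothetical helper introduced only for this sketch:

```r
library(mlr3)
library(mlr3pipelines)

task <- tsk("sonar")
learner <- lrn("classif.rpart", predict_type = "prob")

# Hypothetical helper: wrap a graph as a learner that predicts probabilities (needed for AUC)
as_prob_learner <- function(graph) {
  gl <- as_learner(graph)
  gl$predict_type <- "prob"
  gl
}

# 1: imputation only
graph_nop <- as_prob_learner(po("imputehist") %>>% learner)

# 2: undersample the majority class to half its size
graph_down <- as_prob_learner(
  po("imputehist") %>>%
    po("classbalancing", id = "undersample", adjust = "major",
       reference = "major", shuffle = FALSE, ratio = 1 / 2) %>>%
    learner
)

# 3: oversample the minority class to twice its size
graph_up <- as_prob_learner(
  po("imputehist") %>>%
    po("classbalancing", id = "oversample", adjust = "minor",
       reference = "minor", shuffle = FALSE, ratio = 2) %>>%
    learner
)

# benchmark_grid() instantiates the resampling once per task,
# so all three learners are evaluated on the same holdout split
set.seed(42)
design <- benchmark_grid(
  tasks = task,
  learners = list(graph_nop, graph_down, graph_up),
  resamplings = rsmp("holdout")
)
bmr <- benchmark(design)

# One row per learner, one column per measure
res <- bmr$aggregate(msrs(c("classif.auc", "classif.ce")))
print(res)
```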


#This can be also performed within one pipeline with branching but one would need to define the paramset and use a tuner:

graph2 <-
  po("imputehist") %>>%
  po("branch", c("nop", "classbalancing_up", "classbalancing_down")) %>>%
  gunion(list(
    po("nop", id = "nop"),
    po("classbalancing", id = "classbalancing_up", ratio = 2, reference = 'major'),
    po("classbalancing", id = "classbalancing_down", ratio = 2, reference = 'minor')
  )) %>>%
  po("unbranch") %>>%
  learner


#Note that the unbranch happens before the learner since one (always the same) learner is being used. Convert graph to learner and set predict_type

graph2 <- GraphLearner$new(graph2)
graph2$predict_type <- "prob"

#Define the param set. In this case just the different branch options.

ps <- ParamSet$new(list(
  ParamFct$new("branch.selection", levels = c("nop", "classbalancing_up", "classbalancing_down"))
))

#In general you would want to add also learner hyper parameters like cp and minsplit for rpart as well as the ratio of over/undersampling.

So how do we add the learner hyperparameters like cp and minsplit at this point?

#perhaps by adding them to the param list?
ps = ParamSet$new(list(
  ParamFct$new("branch.selection", levels = c("nop", "classbalancing_up", "classbalancing_down")),
  ParamDbl$new("cp", lower = 0.001, upper = 0.1),
  ParamInt$new("minsplit", lower = 1, upper = 10)
))

#Create a tuning instance and grid search with resolution 1 since no other parameters are tuned. The tuner will iterate through different pipeline branches as defined in the paramset.

instance <- TuningInstance$new(
  task = tsk("sonar"),
  learner = graph2,
  resampling = hld,
  measures = msr("classif.auc"),
  param_set = ps,
  terminator = term("none")
)

tuner <- tnr("grid_search", resolution = 1)

But this results in:

Error in (function (xs)  : 
  Assertion on 'xs' failed: Parameter 'cp' not available..

I feel I may be missing a branch layer on how to combine these two things (the rpart hyperparameters minsplit and cp, and the down/up sampling)? Thank you for any assistance.

As soon as you construct a piped learner, the IDs of the underlying parameters change because they get a prefix: the ID of the PipeOp they belong to (here classif.rpart). You can always check the param_set of the learner; in your example it is graph2$param_set. There you will see that the parameters you are looking for are the following:

ps = ParamSet$new(list(
  ParamFct$new("branch.selection", levels = c("nop", "classbalancing_up", "classbalancing_down")),
  ParamDbl$new("classif.rpart.cp", lower = 0.001, upper = 0.1),
  ParamInt$new("classif.rpart.minsplit", lower = 1, upper = 10)
))

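On a current mlr3 stack the same fix can be sketched with the newer shorthand APIs: `as_learner()`, `ps()` with `p_fct()`/`p_dbl()`/`p_int()`, `ti()`, `trm()` and `$optimize()` take the place of `GraphLearner$new()`, `ParamSet$new()` with `ParamFct$new()`, `TuningInstance$new()` and `term()`. The sketch first lists the prefixed IDs, then tunes the branch jointly with `classif.rpart.cp` and `classif.rpart.minsplit`. It assumes a recent mlr3tuning (for `ti()`). Random search with a budget of 6 evaluations is an illustrative choice to keep runtime low, since a grid search with `resolution = 1` would try only a single value for each numeric parameter:

```r
library(mlr3)
library(mlr3pipelines)
library(mlr3tuning)
library(paradox)

learner <- lrn("classif.rpart", predict_type = "prob")

# Same branching graph as in the question; the learner sits after the unbranch
graph2 <- po("imputehist") %>>%
  po("branch", c("nop", "classbalancing_up", "classbalancing_down")) %>>%
  gunion(list(
    po("nop", id = "nop"),
    po("classbalancing", id = "classbalancing_up", ratio = 2, reference = "major"),
    po("classbalancing", id = "classbalancing_down", ratio = 2, reference = "minor")
  )) %>>%
  po("unbranch", c("nop", "classbalancing_up", "classbalancing_down")) %>>%
  learner

glrn <- as_learner(graph2)
glrn$predict_type <- "prob"

# Every parameter ID now carries the ID of its PipeOp as a prefix
print(grep("^classif\\.rpart\\.", glrn$param_set$ids(), value = TRUE))

# Joint search space: branch choice plus prefixed rpart hyperparameters
search_space <- ps(
  branch.selection = p_fct(c("nop", "classbalancing_up", "classbalancing_down")),
  classif.rpart.cp = p_dbl(lower = 0.001, upper = 0.1),
  classif.rpart.minsplit = p_int(lower = 1, upper = 10)
)

set.seed(1)
instance <- ti(
  task = tsk("sonar"),
  learner = glrn,
  resampling = rsmp("holdout"),
  measures = msr("classif.auc"),
  terminator = trm("evals", n_evals = 6),
  search_space = search_space
)
tnr("random_search")$optimize(instance)

# Best configuration found within the evaluation budget
print(instance$result)
```

Because the learner comes after the unbranch, the same classif.rpart.* parameters apply whichever branch is selected. The balancing ratios could be added in the same way under their prefixed IDs, e.g. `classbalancing_up.ratio`.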
