How to impute data with mlr3 and predict with NA values?
I followed the mlr3 documentation on imputing data with pipelines. However, the model that I have trained does not allow predictions if a column is NA.
Do you have any idea why it doesn't work?
Train step
library(mlr3)
library(mlr3learners)
library(mlr3pipelines)
data("mtcars", package = "datasets")
data = mtcars[, 1:3]
str(data)
task_mtcars = TaskRegr$new(id="cars", backend = data, target = "mpg")
imp_missind = po("missind")
imp_num = po("imputehist", param_vals =list(affect_columns = selector_type("numeric")))
scale = po("scale")
learner = lrn('regr.ranger')
graph = po("copy", 2) %>>%
  gunion(list(imp_num %>>% scale, imp_missind)) %>>%
  po("featureunion") %>>%
  po(learner)
graph$plot()
graphlearner = GraphLearner$new(graph)
# Train the pipeline so that it can be used for prediction
graphlearner$train(task_mtcars)
Predict step
data = task_mtcars$data()[12:12,]
data[1:1, cyl:=NA]
predict(graphlearner, data)
The error is:
Error: Missing data in columns: cyl.
The example in the mlr3gallery seems to work for your case, so you basically have to switch the order of imputehist and missind.
Another approach would be to set missind's which hyperparameter to "all" in order to enforce the creation of an indicator for every column.
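A minimal sketch of this second approach, assuming the same mtcars subset as in the question. The object names (task, gl, newdata) are only illustrative, and predict_newdata() is used here in place of predict(); the exact behaviour may differ between mlr3pipelines versions.

```r
library(mlr3)
library(mlr3learners)
library(mlr3pipelines)

data("mtcars", package = "datasets")
task = TaskRegr$new(id = "cars", backend = mtcars[, 1:3], target = "mpg")

# which = "all" creates an indicator for every column,
# not only for columns that had missing values during training
imp_missind = po("missind", which = "all")
imp_num = po("imputehist", affect_columns = selector_type("numeric"))

graph = po("copy", 2) %>>%
  gunion(list(imp_num %>>% po("scale"), imp_missind)) %>>%
  po("featureunion") %>>%
  po("learner", lrn("regr.ranger"))

gl = GraphLearner$new(graph)
gl$train(task)

# One row with a missing value in cyl (illustrative test case)
newdata = task$data(rows = 12)
newdata$cyl = NA_real_
pred = gl$predict_newdata(newdata)
print(pred)
```

Because the training data contains no missing values, every indicator is constant during training, so the indicators cannot help the learner here; the setting only ensures that the pipeline produces the same columns at training and prediction time.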
This is actually a bug: missind returns the full task if it is trained on data with no missing values, and those unimputed columns then overwrite the imputed values. Thanks a lot for spotting it. I am trying to fix it in a PR.