
mlr3 - how to remove incomplete observations using `mlr3` interface

Is it possible to remove incomplete observations within a task, e.g. `task <- TaskRegr$new("data", data, "y")`, using mlr3 filters or pipeops?

I don't think there is a preprocessing operator for removing observations.

What I would do is use the `filter` method of the Task.

Example:

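library(mlr3)

t = tsk("pima")
# logical vector: TRUE for rows that contain no NA
ids = complete.cases(t$data())

# number of incomplete observations
sum(!ids)

# keep only the complete rows; indexing t$row_ids also works
# when the row ids are not simply 1..n
t$filter(t$row_ids[ids])

# number of incomplete observations
# should be zero now
ids = complete.cases(t$data())
sum(!ids)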

`complete.cases` returns a logical vector indicating which rows are complete observations (no NAs). `filter` subsets the task's data to the row ids passed as its argument. Rows whose ids are not passed are removed in place, so the task itself is modified.
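For a task built from your own data, as in the question, the same approach might look like the sketch below. The data frame is a hypothetical toy example (it is not from the original post), and the task is constructed with the `TaskRegr$new("data", data, "y")` call from the question.

```r
library(mlr3)

# Hypothetical toy data with missing values, for illustration only
data = data.frame(
  y  = c(1.2, 3.4, 2.2, 5.1, 4.0),
  x1 = c(1, NA, 3, 4, 5),
  x2 = c(0.5, 0.1, NA, 0.9, 0.3)
)

task = TaskRegr$new("data", data, "y")

# TRUE for every row of the task without any NA
complete = complete.cases(task$data())

# translate the logical vector into row ids and filter in place
task$filter(task$row_ids[complete])

task$nrow  # 3: the rows with an NA in x1 or x2 were dropped
```

Because `filter` works in place, clone the task first (`task$clone()`) if you still need the unfiltered version.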

If you want to impute incomplete observations instead, there are several imputation operators, such as PipeOpImputeConstant, which imputes features with a constant.
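A minimal sketch of constant imputation, assuming the mlr3pipelines package is installed. The toy data and the fill value `-1` are hypothetical choices for illustration; pick a constant that makes sense for your features.

```r
library(mlr3)
library(mlr3pipelines)

# Hypothetical toy data with missing values, for illustration only
data = data.frame(
  y  = c(1.2, 3.4, 2.2, 5.1),
  x1 = c(1, NA, 3, NA),
  x2 = c(0.5, 0.1, NA, 0.9)
)
task = TaskRegr$new("data", data, "y")

# PipeOpImputeConstant via its dictionary key; -1 is an arbitrary fill value
po_impute = po("imputeconstant", constant = -1)

# train() takes a list of inputs and returns a list of outputs
imputed = po_impute$train(list(task))[[1]]

imputed$data()
```

Unlike `filter`, this keeps every row and returns a new task, so the number of observations does not change.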
