
R: Random forest predict fails with NAs in predictors

The documentation (if I'm reading it correctly) says that the random forest predict function produces NA predictions when it encounters NA predictors for certain observations.

NOTE: If the object inherits from randomForest.formula, then any data with NA are silently omitted from the prediction. The returned value will contain NA correspondingly in the aggregated and individual tree predictions (if requested), but not in the proximity or node matrices.

However, if I try to use the predict function on a dataset with some NAs in the predictors (7 of the 2688 observations contain NAs), I get the following error and the prediction fails.

Error in predict.randomForest(model, new.ds) : missing values in newdata

There is a slightly messy work-around that I would like to avoid if possible.

Am I doing or reading something wrong? Does it have something to do with the "inherits from randomForest.formula" clause?

Using some examples from the documentation:

library(randomForest)

set.seed(1)
x <- data.frame(x1 = gl(32, 5), x2 = runif(160), y = rnorm(160))

# Fitted through the default (x, y) interface
rf1 <- randomForest(x[-3], x[[3]], ntree = 10)
inherits(rf1, "randomForest.formula")
# [1] FALSE

# Fitted through the formula interface
iris.rf <- randomForest(Species ~ ., data = iris, importance = TRUE,
                        proximity = TRUE)
inherits(iris.rf, "randomForest.formula")
# [1] TRUE

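So you probably fitted your model by calling randomForest without the formula interface. In that case the object does not inherit from randomForest.formula, the NOTE does not apply, and predict stops with an error on missing values.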
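To make the difference concrete, here is a minimal sketch that reuses the simulated data from the documentation example. The objects `new.ds`, `rf.xy`, `rf.fm`, `pred.fm` and `pred.xy` are made up for illustration and are not your actual model or data. It shows the error with the default interface, the NA prediction with the formula interface, and, if refitting is not an option, one possible manual fallback based on `complete.cases` (this is only an assumed alternative, not necessarily the work-around you had in mind).

```r
library(randomForest)

set.seed(1)
x <- data.frame(x1 = gl(32, 5), x2 = runif(160), y = rnorm(160))

# Hypothetical new data: copy ten rows of predictors and blank out one value.
new.ds <- x[1:10, c("x1", "x2")]
new.ds$x2[3] <- NA

# Default (x, y) interface: predict() stops on missing values.
rf.xy <- randomForest(x[c("x1", "x2")], x$y, ntree = 10)
err <- tryCatch(predict(rf.xy, new.ds),
                error = function(e) conditionMessage(e))
print(err)

# Formula interface: the row with the NA predictor gets an NA prediction.
rf.fm <- randomForest(y ~ x1 + x2, data = x, ntree = 10)
pred.fm <- predict(rf.fm, new.ds)
print(is.na(pred.fm))

# Fallback without refitting: predict on the complete rows only and
# leave NA in the positions of the incomplete rows.
ok <- complete.cases(new.ds)
pred.xy <- rep(NA_real_, nrow(new.ds))
pred.xy[ok] <- predict(rf.xy, new.ds[ok, , drop = FALSE])
```

Refitting with the formula interface is the less intrusive route when you control the training step, since the output keeps one entry per input row. The `complete.cases` fallback gives the same shape of result for a model that was already fitted through the default interface.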
