R - Random forest prediction fails because of NA in predictors

Question

The documentation (if I'm reading it correctly) says that the random forest predict function produces NA predictions when it encounters NA predictors for certain observations.

NOTE: If the object inherits from randomForest.formula, then any data with NA are silently omitted from the prediction. The returned value will contain NA correspondingly in the aggregated and individual tree predictions (if requested), but not in the proximity or node matrices.

However, if I try to use the predict function on a dataset with NAs in some predictors (NAs in 7 observations out of 2688), I get the following error and prediction fails.

Error in predict.randomForest(model, new.ds) : missing values in newdata

There is a slightly messy workaround that I would like to avoid if possible.

Am I doing or reading something wrong? Does it have something to do with the "inherits from randomForest.formula" clause?

Answer 1

Using some examples from the documentation:

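library(randomForest)

set.seed(1)
x <- data.frame(x1=gl(32, 5), x2=runif(160), y=rnorm(160))

# Default (x, y) interface: the result is not a randomForest.formula object
rf1 <- randomForest(x[-3], x[[3]], ntree=10)
inherits(rf1, "randomForest.formula")
# [1] FALSE

# Formula interface: the result inherits from randomForest.formula
iris.rf <- randomForest(Species ~ ., data=iris, importance=TRUE,
                        proximity=TRUE)
inherits(iris.rf, "randomForest.formula")
# [1] TRUE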

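So you probably called randomForest without using the formula interface to fit your model. In that case the fitted object does not inherit from randomForest.formula, so the NOTE about NA does not apply and predict stops with the "missing values in newdata" error.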
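To make the difference concrete, here is a minimal sketch on the iris data. The object names (new.ds, rf.formula, rf.xy, pred.formula, pred.xy) and the choice of ntree are made up for illustration, and the second half is only one possible workaround for a model fitted without a formula, not necessarily the one the question has in mind.

```r
library(randomForest)

set.seed(1)
# Hypothetical new data: three iris rows, one with a missing predictor value
new.ds <- iris[c(1, 51, 101), ]
new.ds$Sepal.Length[2] <- NA

# Formula interface: rows with NA are skipped and come back as NA
rf.formula <- randomForest(Species ~ ., data = iris, ntree = 50)
pred.formula <- predict(rf.formula, new.ds)

# Default (x, y) interface: predict() stops on NA, so predict only the
# complete rows and leave NA in the remaining positions
rf.xy <- randomForest(iris[-5], iris[[5]], ntree = 50)
ok <- complete.cases(new.ds[-5])
pred.xy <- factor(rep(NA, nrow(new.ds)), levels = levels(iris$Species))
pred.xy[ok] <- predict(rf.xy, new.ds[ok, -5])
```

In both cases the result should have one entry per row of new.ds, with NA in the position of the incomplete row. Refitting with the formula interface is the simpler route when that is an option, since predict then handles the bookkeeping itself.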

Answer 1 (0), posted 2014-02-04 17:59:45
