RandomForest in R reports missing values in object, but vector has zero NAs in it
I'm trying to use the randomForest package in R, but I've run into a problem where R tells me that there is missing data in the response vector.
> rf_blackcomb_earlyGame <- randomForest(max_cohort ~ ., data=blackcomb_earlyGame[-c(1,2), ])
Error in na.fail.default(list(max_cohort = c(47, 25, 20, 37, 1, 0, 23, :
missing values in object
The error message is clear enough. I've encountered it before, and in the past there actually were missing data, but this time there aren't any.
> class(blackcomb_earlyGame$max_cohort)
[1] "numeric"
> which(is.na(blackcomb_earlyGame$max_cohort))
integer(0)
I've tried using na.roughfix to see if that would help, but I get the following error.
Error in na.roughfix.data.frame(list(max_cohort = c(47, 25, 20, 37, 1, :
na.roughfix only works for numeric or factor
I've checked every vector to make sure that none of them contain any NAs, and none of them do.
Does anyone have any suggestions?
randomForest can fail because of several different kinds of problems with the data. Missing values (NA), values of NaN, Inf or -Inf, and character columns that have not been cast into factors will all cause failures, with a variety of error messages.
Below are some examples of the error messages generated by each of these issues:
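library(randomForest)
my.df <- data.frame(a = 1:26, b = letters, c = (1:26) + rnorm(26), stringsAsFactors = TRUE)
rf <- randomForest(a ~ ., data = my.df)
# This works without issues because b = letters is stored as a factor
# (stringsAsFactors = TRUE was the data.frame() default before R 4.0.0)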
my.df$d <- LETTERS # Now we add a character column
rf <- randomForest(a ~ ., data=my.df)
# Error in randomForest.default(m, y, ...) :
# NA/NaN/Inf in foreign function call (arg 1)
# In addition: Warning message:
# In data.matrix(x) : NAs introduced by coercion
rf <- randomForest(d ~ ., data=my.df)
# Error in y - ymean : non-numeric argument to binary operator
# In addition: Warning message:
# In mean.default(y) : argument is not numeric or logical: returning NA
my.df$d <- c(NA, rnorm(25))
rf <- randomForest(a ~ ., data=my.df)
rf <- randomForest(d ~ ., data=my.df)
# Error in na.fail.default(list(a = 1:26, b = 1:26, c = c(3.14586293058335, :
# missing values in object
my.df$d <- c(Inf, rnorm(25))
rf <- randomForest(a ~ ., data=my.df)
rf <- randomForest(d ~ ., data=my.df)
# Error in randomForest.default(m, y, ...) :
# NA/NaN/Inf in foreign function call (arg 1)
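The character-column failure shown above can usually be avoided by converting character columns to factors before fitting. The following is only a minimal base-R sketch of that conversion; the data frame is a cut-down copy of the example, and the name is_chr is made up for illustration. It does not call randomForest itself.

```r
# Cut-down version of the example data, with two character columns
my.df <- data.frame(a = 1:26, b = letters, stringsAsFactors = FALSE)
my.df$d <- LETTERS

# Find the character columns and convert each of them to a factor
is_chr <- vapply(my.df, is.character, logical(1))
my.df[is_chr] <- lapply(my.df[is_chr], factor)

# b and d are now factors; a is left untouched
str(my.df)
```

Keep in mind that randomForest refuses categorical predictors with more than 53 categories, so this is not appropriate for ID-like columns with many distinct values.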
Interestingly, the error message you received, which was caused by having a character type in the data frame (see comments), is the error that I see when there is a numeric column with NA. This suggests either (1) that the errors differ between versions of randomForest or (2) that the error message depends in more complex ways on the structure of the data. Either way, the advice for anyone receiving errors such as these is to check the data for all of the possible issues listed above, in order to track down the cause.
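One way to run through that checklist in a single pass is a small per-column report. This is only a sketch in base R: diagnose_columns is a hypothetical helper written for this answer, not part of randomForest, and example_df is made-up data.

```r
# Hypothetical helper: for each column, report its class, how many NA/NaN
# values it holds, how many Inf/-Inf values, and whether it is character.
diagnose_columns <- function(df) {
  data.frame(
    column       = names(df),
    class        = vapply(df, function(x) class(x)[1], character(1)),
    n_na         = vapply(df, function(x) sum(is.na(x)), integer(1)),  # is.na() is TRUE for NA and NaN
    n_inf        = vapply(df, function(x) sum(is.infinite(x)), integer(1)),
    is_character = vapply(df, is.character, logical(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Made-up data containing one of each problem
example_df <- data.frame(
  a = 1:4,
  b = c(1.5, NA, Inf, NaN),
  c = c("w", "x", "y", "z"),
  stringsAsFactors = FALSE
)

report <- diagnose_columns(example_df)
print(report)
```

Any column with a non-zero n_na or n_inf, or with is_character set to TRUE, is a candidate for the cause of the error.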
Perhaps there are Inf or -Inf values? Note that is.na() does not flag them, whereas is.finite() does:
is.na(c(1, NA, Inf, NaN, -Inf))
#[1] FALSE TRUE FALSE TRUE FALSE
is.finite(c(1, NA, Inf, NaN, -Inf))
#[1] TRUE FALSE FALSE FALSE FALSE
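Building on is.finite, here is a rough sketch of how the same check could be applied to a whole data frame to locate the offending columns and rows. The data frame and the variable names (df, numeric_cols, bad_counts, bad_rows) are invented for illustration.

```r
# Made-up data: x contains Inf, y contains NA and -Inf, z is clean
df <- data.frame(x = c(1, Inf, 3, 4), y = c(NA, 2, -Inf, 5), z = 1:4)

# Only numeric columns can hold Inf/-Inf/NaN
numeric_cols <- vapply(df, is.numeric, logical(1))

# Number of non-finite values (NA, NaN, Inf, -Inf) in each numeric column
bad_counts <- vapply(df[numeric_cols], function(col) sum(!is.finite(col)), integer(1))
print(bad_counts)

# Rows that contain at least one non-finite value
not_finite <- !sapply(df[numeric_cols], is.finite)
bad_rows <- unname(which(rowSums(not_finite) > 0))
print(bad_rows)
```

Rows listed in bad_rows can then be inspected, dropped, or imputed before calling randomForest.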