How to eliminate "NA/NaN/Inf in foreign function call (arg 7)" running predict with randomForest
I have researched this extensively without finding a solution. I have cleaned my data set as follows:
library("raster")
impute.mean <- function(x) replace(x, is.na(x) | is.nan(x) | is.infinite(x) ,
mean(x, na.rm = TRUE))
losses <- apply(losses, 2, impute.mean)
colSums(is.na(losses))
# Count infinite values remaining in each column
infout <- apply(losses, 2, is.infinite)
colSums(infout)
# Count NaN values remaining in each column
nanout <- apply(losses, 2, is.nan)
colSums(nanout)
The problem arises when running predict:
options(warn=2)
p <- predict(default.rf, losses, type="prob", inf.rm = TRUE, na.rm=TRUE, nan.rm=TRUE)
Everything I have read says the cause should be NAs, Infs, or NaNs in the data, but I can't find any. I am making the data and the randomForest summary available for sleuthing at [deleted]. The traceback doesn't reveal much (to me anyway):
4: .C("classForest", mdim = as.integer(mdim), ntest = as.integer(ntest),
nclass = as.integer(object$forest$nclass), maxcat = as.integer(maxcat),
nrnodes = as.integer(nrnodes), jbt = as.integer(ntree), xts = as.double(x),
xbestsplit = as.double(object$forest$xbestsplit), pid = object$forest$pid,
cutoff = as.double(cutoff), countts = as.double(countts),
treemap = as.integer(aperm(object$forest$treemap, c(2, 1,
3))), nodestatus = as.integer(object$forest$nodestatus),
cat = as.integer(object$forest$ncat), nodepred = as.integer(object$forest$nodepred),
treepred = as.integer(treepred), jet = as.integer(numeric(ntest)),
bestvar = as.integer(object$forest$bestvar), nodexts = as.integer(nodexts),
ndbigtree = as.integer(object$forest$ndbigtree), predict.all = as.integer(predict.all),
prox = as.integer(proximity), proxmatrix = as.double(proxmatrix),
nodes = as.integer(nodes), DUP = FALSE, PACKAGE = "randomForest")
3: predict.randomForest(default.rf, losses, type = "prob", inf.rm = TRUE,
na.rm = TRUE, nan.rm = TRUE)
2: predict(default.rf, losses, type = "prob", inf.rm = TRUE, na.rm = TRUE,
nan.rm = TRUE)
1: predict(default.rf, losses, type = "prob", inf.rm = TRUE, na.rm = TRUE,
nan.rm = TRUE)
Your code is not entirely reproducible (the actual randomForest call is missing), but you are not replacing Inf values with the means of the column vectors. This is because the na.rm = TRUE argument in the call to mean() within your impute.mean function does exactly what it says: it removes NA values (and not Inf ones). The mean of a column that still contains Inf is itself Inf (or NaN if both Inf and -Inf are present), so that is the value written back into the column.
You can see this, for example, by:
impute.mean <- function(x) replace(x, is.na(x) | is.nan(x) | is.infinite(x), mean(x, na.rm = TRUE))
losses <- apply(losses, 2, impute.mean)
sum( apply( losses, 2, function(.) sum(is.infinite(.))) )
# [1] 696
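Why the count stays nonzero can be seen on a toy vector. This is a minimal sketch; the vectors x and y are made up for illustration and are not taken from the asker's data:

```r
# Same imputation function as in the question
impute.mean <- function(x) replace(x, is.na(x) | is.nan(x) | is.infinite(x), mean(x, na.rm = TRUE))

# Hypothetical toy vector containing one NA and one Inf
x <- c(1, 2, NA, Inf)

# na.rm = TRUE drops the NA but keeps Inf, so the mean is Inf
m <- mean(x, na.rm = TRUE)
m
# [1] Inf

# The "imputed" vector therefore still contains Inf
filled <- impute.mean(x)
filled
# [1]   1   2 Inf Inf

# With both Inf and -Inf present, the mean is NaN instead
y <- c(1, NA, Inf, -Inf)
m2 <- mean(y, na.rm = TRUE)
is.nan(m2)
# [1] TRUE
```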
To get rid of infinite values, use:
impute.mean <- function(x) replace(x, is.na(x) | is.nan(x) | is.infinite(x), mean(x[!is.na(x) & !is.nan(x) & !is.infinite(x)]))
losses <- apply(losses, 2, impute.mean)
sum(apply( losses, 2, function(.) sum(is.infinite(.)) ))
# [1] 0
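The same idea can be written more compactly with is.finite(), which is FALSE for NA, NaN, Inf and -Inf alike. The sketch below uses a small made-up matrix as a stand-in for losses, and the name impute.finite.mean is hypothetical. In the traceback, arg 7 appears to correspond to xts = as.double(x), the data handed to predict, so a final all-finite check on that object just before calling predict is a reasonable guard:

```r
# Hypothetical stand-in for the asker's `losses` matrix
losses <- cbind(
  x1 = c(1, 2, NA, Inf),
  x2 = c(-Inf, 4, NaN, 6)
)

# Replace every non-finite entry with the mean of the finite entries
impute.finite.mean <- function(x) {
  ok <- is.finite(x)            # FALSE for NA, NaN, Inf, -Inf
  replace(x, !ok, mean(x[ok]))
}

losses <- apply(losses, 2, impute.finite.mean)

# Guard to run before predict(): stops if anything non-finite remains
stopifnot(all(is.finite(losses)))
losses
```

Note that a column with no finite values at all would be filled with NaN, because the mean of an empty vector is NaN; such columns would have to be dropped or handled separately.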
One cause of the error message
NA/NaN/Inf in foreign function call (arg X)
when training a randomForest is having character-class variables in your data.frame. If it comes with the warning
NAs introduced by coercion
check to make sure that all of your character variables have been converted to factors.
Example
set.seed(1)
dat <- data.frame(
  a = runif(100),
  b = rpois(100, 10),
  c = rep(c("a", "b"), 50),  # 50 repeats of two values, so all columns have 100 rows
  stringsAsFactors = FALSE
)
library(randomForest)
randomForest(a ~ ., data = dat)
Yields:
Error in randomForest.default(m, y, ...) :
  NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In data.matrix(x) : NAs introduced by coercion
But switch it to stringsAsFactors = TRUE and it runs.
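When the data.frame already exists (for example, after reading a file), the character columns can be converted after the fact instead. Since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE, so this situation is easy to run into. The following is a minimal sketch using base R only; the helper variable is_chr is hypothetical, and the randomForest call is left commented out on the assumption that the package may not be installed:

```r
set.seed(1)
dat <- data.frame(
  a = runif(100),
  b = rpois(100, 10),
  c = rep(c("a", "b"), 50),
  stringsAsFactors = FALSE
)

# Flag the character columns, then convert only those to factors
is_chr <- vapply(dat, is.character, logical(1))
dat[is_chr] <- lapply(dat[is_chr], factor)

# Inspect the result: column c should now be a factor with 2 levels
str(dat)

# With factors in place, the model from the example can be fit:
# library(randomForest)
# randomForest(a ~ ., data = dat)
```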