Error using random forest (MICE package) during imputation

I would like to use the random forest method to impute missing values. I have read some papers claiming that MICE with random forests performs better than parametric MICE.

In my case, I have already run a model with the default mice method (pmm), got the results, and worked with them. However, when I switched the method to random forest, I got an error and I'm not sure why. I've seen other questions about errors with random forest in mice, but they don't match my case: my variables have more than a single NA.

imp <- mice(data1, m=70, pred=quickpred(data1), method="pmm", seed=71152, printFlag=TRUE)
impRF <- mice(data1, m=70, pred=quickpred(data1), method="rf", seed=71152, printFlag=TRUE)

iter imp variable
 1   1  Vac
Error in if (n == 0) stop("data (x) has 0 rows") : argument is of length zero

Does anyone have an idea why I'm getting this error?
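If the cause is an incomplete variable whose imputation model ends up with a single predictor, you can check for that up front. The sketch below lists incomplete variables with exactly one predictor selected in a mice-style predictor matrix (rows are the variables to impute). The helper name `single_predictor_vars` and the toy data are hypothetical. mice may also expand a factor predictor into several dummy columns, so a row sum of 1 is only a rough proxy for "the design matrix has one column".

```r
# Hypothetical helper: incomplete variables with exactly one predictor selected
# in a predictor matrix whose rows/columns are named after the data's columns.
single_predictor_vars <- function(pred, data) {
  n_pred <- rowSums(pred)                               # predictors per target variable
  n_miss <- colSums(is.na(data))[rownames(pred)]        # missing values per variable
  rownames(pred)[n_miss > 0 & n_pred == 1]
}

# Toy example with three variables (made-up values)
toy <- data.frame(Vac = c(1, NA, 0, 1), Age = c(30, 41, NA, 55), Sex = c(0, 1, 1, 0))
pred <- matrix(c(0, 1, 0,
                 1, 0, 1,
                 0, 0, 0),
               nrow = 3, byrow = TRUE,
               dimnames = list(names(toy), names(toy)))
single_predictor_vars(pred, toy)   # "Vac"
```

On the real data this would be called as `single_predictor_vars(quickpred(data1), data1)`, since `quickpred()` returns a square 0/1 matrix named after the columns of `data1`.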

EDIT

I tried converting all variables to numeric instead of using dummy variables, and it returned the same error plus some warnings:

impRF <- mice(data, m=70, pred=quickpred(data), method="rf", seed=71152, printFlag=TRUE)

 iter imp variable
   1   1  Vac  CliForm
 Error in if (n == 0) stop("data (x) has 0 rows") : argument is of length zero
 In addition: There were 50 or more warnings (use warnings() to see the first 50)

 50: In randomForest.default(x = xobs, y = yobs, ntree = 1,  ... :
   The response has five or fewer unique values.  Are you sure you want to do regression?
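That warning comes from `randomForest()`: when the response is numeric and has five or fewer distinct values, it fits a regression tree and asks whether that is intended. Keeping such variables as factors makes it fit a classification tree instead. The sketch below converts numeric columns with few distinct values back to factors. The helper name `as_factor_if_few` is hypothetical, and the default threshold of 5 simply mirrors the warning text. A genuinely numeric variable with few values would also be converted, so inspect the result before imputing.

```r
# Hypothetical helper: turn numeric columns with at most `max_levels` distinct
# observed values into factors, so randomForest() classifies rather than regresses.
as_factor_if_few <- function(data, max_levels = 5) {
  for (v in names(data)) {
    col <- data[[v]]
    if (is.numeric(col) && length(unique(na.omit(col))) <= max_levels) {
      data[[v]] <- factor(col)   # NA stays NA, so mice still sees it as missing
    }
  }
  data
}

# Toy example: a 0/1 indicator and a continuous measurement (made-up values)
df <- data.frame(Vac = c(0, 1, 0, 1, NA, 1), Dose = c(1.5, 2.3, 3.1, 4.8, 5.2, 6.7))
out <- as_factor_if_few(df)
str(out)
```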

EDIT1

I've tried with only 5 imputations and a smaller subset of the data (only 2000 rows) and got a few different errors:

> imp <- mice(data2, m=5, pred=quickpred(data2), method="rf", seed=71152, printFlag=TRUE)

iter imp variable
 1   1  Vac  Radio  Origin  Job  Alc  Smk  Drugs  Prison  Commu  Hmless  Symp
Error in randomForest.default(x = xobs, y = yobs, ntree = 1, ...) : NAs in foreign function call (arg 11)
 In addition: Warning messages:
 1: In randomForest.default(x = xobs, y = yobs, ntree = 1, ...) : invalid mtry: reset to within valid range
 2: In max(ncat) : no non-missing arguments to max; returning -Inf
 3: In randomForest.default(x = xobs, y = yobs, ntree = 1, ...) : NAs introduced by coercion
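One common way to get "NAs introduced by coercion" is passing a data frame with mixed column types to code that needs a numeric matrix. `as.matrix()` turns the whole frame into a character matrix, and converting it back to numbers yields NA. Whether this is what happens inside mice for this data set is an assumption; the snippet only demonstrates the base-R behavior on made-up data, and shows `data.matrix()` as the numeric alternative.

```r
# Toy data frame mixing a numeric and a factor column (made-up values)
df <- data.frame(Age = c(34, 51, 27), Smk = factor(c("yes", "no", "yes")))

m_chr <- as.matrix(df)   # any non-numeric column makes the whole matrix character
typeof(m_chr)            # "character"

# Coercing those strings back to numbers gives NA ("NAs introduced by coercion")
smk_num <- suppressWarnings(as.numeric(m_chr[, "Smk"]))
smk_num                  # NA NA NA

# data.matrix() instead replaces factor levels by their integer codes
m_num <- data.matrix(df)
typeof(m_num)            # "double"
m_num[, "Smk"]           # 2 1 2  (levels are "no" = 1, "yes" = 2)
```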

ANSWER

I also encountered this error when I had only one fully observed variable, which I'm guessing is the cause in your case too. My colleague Anoop Shah provided me with a fix (below), and Prof. van Buuren (the author of mice) has said he will include it in the next update of the package.
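A base-R reproduction of the suspected mechanism, using a made-up one-column predictor matrix: row-subsetting with the default `drop = TRUE` silently turns it into a plain vector, `nrow()` of a vector is `NULL`, and a test like the `if (n == 0)` in the error message then fails with "argument is of length zero". Keeping `drop = FALSE` avoids this. The values and names here are illustrative only.

```r
# One-column predictor matrix (made-up values), as when y has a single predictor
x <- matrix(c(1.2, 3.4, 5.6, 7.8), ncol = 1, dimnames = list(NULL, "Age"))
ry <- c(TRUE, FALSE, TRUE, TRUE)        # TRUE where y is observed

xobs_dropped <- x[ry, ]                 # default drop = TRUE: plain vector
xobs_kept    <- x[ry, , drop = FALSE]   # still a 3 x 1 matrix

nrow(xobs_dropped)   # NULL
nrow(xobs_kept)      # 3

# randomForest.default() sets n <- nrow(x) before checking if (n == 0);
# with n = NULL the condition has length zero:
msg <- tryCatch(
  if (nrow(xobs_dropped) == 0) "no rows" else "has rows",
  error = function(e) conditionMessage(e)
)
msg                  # "argument is of length zero"
```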

In R, type the following to open the rf imputation function for editing, so you can redefine it:

fixInNamespace("mice.impute.rf", "mice")

The corrected function to paste in is then:

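mice.impute.rf <- function(y, ry, x, ntree = 100, ...) {
  ntree <- max(1, ntree)
  # drop = FALSE keeps a one-column (or one-row) subset as a matrix;
  # otherwise nrow() in randomForest() returns NULL and if (n == 0) fails
  xobs <- x[ry, , drop = FALSE]
  xmis <- x[!ry, , drop = FALSE]
  yobs <- y[ry]
  nmis <- sum(!ry)
  onetree <- function(xobs, xmis, yobs, ...) {
    fit <- randomForest::randomForest(x = xobs, y = yobs, ntree = 1, ...)
    # terminal node (leaf) ids are returned in the "nodes" attribute
    leafnr <- as.vector(attr(predict(fit, newdata = xobs, nodes = TRUE), "nodes"))
    nodes <- as.vector(attr(predict(fit, newdata = xmis, nodes = TRUE), "nodes"))
    # donors: observed y values that share a leaf with each missing case
    lapply(nodes, function(s) yobs[leafnr == s])
  }
  forest <- sapply(seq_len(ntree), FUN = function(s) onetree(xobs, xmis, yobs, ...))
  # with a single missing value sapply() returns a plain list, not a matrix
  if (nmis == 1) forest <- array(forest, dim = c(1, ntree))
  # draw one donor per missing case from the leaves pooled over all trees;
  # indexing via sample.int() avoids sample(v, 1) meaning sample(1:v, 1)
  impute <- apply(forest, MARGIN = 1, FUN = function(s) {
    donors <- unlist(s)
    donors[sample.int(length(donors), 1)]
  })
  return(impute)
}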
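The corrected function imputes each missing value by collecting, tree by tree, the observed `y` values that fall in the same leaf as the missing case, pooling them over all trees, and drawing one at random. The sketch below isolates that donor step with made-up leaf ids so it runs without mice or randomForest. In the real function the ids come from `predict(..., nodes = TRUE)`. The helper names `pool_donors` and `draw_one` are hypothetical, and the sketch assumes a numeric `y`.

```r
# Pool donors for each missing case across trees.
# leaf_obs: n_obs x ntree matrix of leaf ids for the observed cases
# leaf_mis: n_mis x ntree matrix of leaf ids for the missing cases
pool_donors <- function(leaf_obs, leaf_mis, yobs) {
  lapply(seq_len(nrow(leaf_mis)), function(i) {
    unlist(lapply(seq_len(ncol(leaf_mis)), function(t) {
      yobs[leaf_obs[, t] == leaf_mis[i, t]]   # leaf mates of case i in tree t
    }))
  })
}

# Draw one donor per missing case (numeric y assumed)
draw_one <- function(donors) {
  vapply(donors, function(d) d[sample.int(length(d), 1)], numeric(1))
}

# Made-up example: 4 observed cases, 2 missing cases, 2 trees
yobs     <- c(10, 11, 20, 21)
leaf_obs <- cbind(tree1 = c(1, 1, 2, 2), tree2 = c(3, 4, 3, 4))
leaf_mis <- cbind(tree1 = c(2, 1),       tree2 = c(4, 3))

donors <- pool_donors(leaf_obs, leaf_mis, yobs)
donors[[1]]   # 20 21 11 21
donors[[2]]   # 10 11 10 20

set.seed(1)
draw_one(donors)
```

Pooling over trees is what makes this more than single-tree hot-deck imputation: a donor that shares a leaf with the missing case in many trees appears many times in the pool and is therefore drawn more often.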

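Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.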

 
© 2020-2024 STACKOOM.COM