Error in implementation of random forest in mice R package
Here is some example data:
# generation of correlated data
matrixCR <- matrix(NA, nrow = 100, ncol = 100)
diag(matrixCR) <- 1
matrixCR[upper.tri (matrixCR, diag = FALSE)] <- 0.5
matrixCR[lower.tri (matrixCR, diag = FALSE)] <- 0.5
matrixCR[1:10,1:10]
L = chol(matrixCR)# Cholesky decomposition
nvars = dim(L)[1]
nobs = 200
set.seed(123)
rM = t(L) %*% matrix(rnorm(nvars*nobs), nrow=nvars, ncol=nobs)
rM1 <- t(rM)
rownames(rM1) <- paste("S", 1:200, sep = "")
colnames(rM1) <- paste("M", 1:100, sep = "")
# introducing missing value to the dataset
N <- 2000*0.05 # 100 random draws, i.e. about 0.5% of the 20,000 cells
inds <- round ( runif(N, 1, length(rM1)) )
rM1[inds] <- NA
# using random forest implemented in mice package
library(mice)
out.imp <- mice(rM1, m = 5, method ="rf")
imp.data <- complete(out.imp)
I am getting the following error:
iter imp variable
1 1 M1 M2Error in apply(forest, MARGIN = 1, FUN = function(s) sample(unlist(s), :
dim(X) must have a positive length
I am not sure what is causing this problem.
As I mentioned in my comment, when the method is set to random forest (rf), the mice function throws an error whenever it reaches a column with exactly one NA value, but runs fine with any other number of NA values.
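To check whether your own data contains such columns, you can count the NA values per column. The following is a minimal base R sketch; the matrix toy and its column names are made up for illustration and stand in for rM1.

```r
# Toy matrix standing in for rM1 (hypothetical data, not from the question)
set.seed(1)
toy <- matrix(rnorm(40), nrow = 10, ncol = 4,
              dimnames = list(NULL, paste0("M", 1:4)))
toy[3, "M1"] <- NA           # M1: exactly one NA
toy[c(2, 5, 7), "M2"] <- NA  # M2: three NAs
toy[8, "M4"] <- NA           # M4: exactly one NA; M3 stays complete

# Number of missing values in each column
na_count <- colSums(is.na(toy))

# Names of the columns with exactly one NA, the case that triggers the error
single_na_cols <- names(na_count)[na_count == 1]

print(na_count)
print(single_na_cols)
```

Running the same two lines (colSums and the == 1 filter) on rM1 should list the columns where the rf method fails.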
I checked with the package author and this appears to be a bug. Until it's fixed, you can choose a different imputation method for those columns with a single NA value. For example:
# Count number of NA in each column
NAcount = apply(rM1, 2, function(x) sum(is.na(x)))
# Create a vector giving the imputation method to use for each column.
# Set it to "rf" unless that column has exactly one NA value.
method = rep("rf", ncol(rM1))
method[which(NAcount==1)] = "norm"
# Run the imputation with the new "method" selections
out.imp <- mice(rM1, m = 5, method = method)
I realize that for consistency you may want to use the same imputation method for all the columns, but the above gives you an option if you're set on using the random forest method.
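If you need this workaround for several datasets, the method selection can be wrapped in a small function. This is only a sketch: choose_methods and its arguments are hypothetical names, not part of mice. It also assigns the empty method "" to complete columns, which the mice documentation describes as the setting for variables that need no imputation. The names on the returned vector are there for readability only.

```r
# Hypothetical helper (not part of mice): build a per-column method vector.
choose_methods <- function(x, default = "rf", fallback = "norm") {
  na_count <- colSums(is.na(x))
  method <- rep(default, ncol(x))
  method[na_count == 1] <- fallback  # sidestep the single-NA case
  method[na_count == 0] <- ""        # complete columns need no imputation
  names(method) <- colnames(x)
  method
}

# Small demonstration on made-up data
demo <- cbind(A = c(1, NA, 3, 4),   # one NA
              B = c(NA, NA, 3, 4),  # two NAs
              C = c(1, 2, 3, 4))    # complete
demo_methods <- choose_methods(demo)
print(demo_methods)
```

With the data from the question, the call would then be mice(rM1, m = 5, method = choose_methods(rM1)).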