
Error in implementation of random forest in mice R package

Here is some example data:

# generation of correlated data   
matrixCR <- matrix(NA, nrow = 100, ncol = 100)
diag(matrixCR) <- 1
matrixCR[upper.tri (matrixCR, diag = FALSE)] <- 0.5
matrixCR[lower.tri (matrixCR, diag = FALSE)] <- 0.5
matrixCR[1:10,1:10]
L = chol(matrixCR)# Cholesky decomposition
nvars = dim(L)[1]
nobs = 200
set.seed(123)
rM = t(L) %*% matrix(rnorm(nvars*nobs), nrow=nvars, ncol=nobs)
rM1 <- t(rM)
rownames(rM1) <- paste("S", 1:200, sep = "") 
colnames(rM1) <- paste("M", 1:100, sep = "")
# introducing missing values into the dataset
N <- 2000*0.05 # up to 100 random missing values (about 0.5% of the 200 x 100 = 20000 cells)
inds <- round ( runif(N, 1, length(rM1)) )
rM1[inds] <- NA


# using the random forest method implemented in the mice package
library(mice)
out.imp <- mice(rM1, m = 5, method ="rf")
imp.data <- complete(out.imp)

I am getting the following error:

 iter imp variable
  1   1  M1  M2Error in apply(forest, MARGIN = 1, FUN = function(s) sample(unlist(s),  : 
  dim(X) must have a positive length

I am not sure what is causing this problem.

As I mentioned in my comment, when the method is set to random forest ("rf"), the mice function throws an error whenever it reaches a column with exactly one NA value, but it runs fine with any other number of NA values.
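One plausible explanation, offered only as a sketch: the error message comes from apply(), which needs an object with a dim attribute, and sapply() does not return a matrix when each call yields a single element. The functions donors_for_tree and collect_forest below are hypothetical stand-ins written for illustration, not the actual mice internals.

```r
# Hypothetical stand-in for a per-tree donor lookup; NOT the actual mice code.
# For each missing row it returns a vector of candidate donor values.
donors_for_tree <- function(n_missing) {
  lapply(seq_len(n_missing), function(i) c(1.5, 2.5, 3.5))
}

# Collect the donors from several trees and let sapply() simplify the result.
collect_forest <- function(n_missing, n_tree = 3) {
  sapply(seq_len(n_tree), function(tree) donors_for_tree(n_missing))
}

forest_two <- collect_forest(n_missing = 2)  # list-matrix: one row per missing value
forest_one <- collect_forest(n_missing = 1)  # plain list without a dim attribute

print(dim(forest_two))
print(dim(forest_one))

# apply() requires a dim attribute, so the single-row case raises an error.
result_one <- tryCatch(
  apply(forest_one, MARGIN = 1, FUN = function(s) sample(unlist(s), 1)),
  error = function(e) conditionMessage(e)
)
print(result_one)

# Restoring the dimensions by hand lets the same call succeed.
forest_fixed <- array(forest_one, dim = c(1, 3))
imputed <- apply(forest_fixed, MARGIN = 1, FUN = function(s) sample(unlist(s), 1))
print(imputed)
```

If the real code follows a similar pattern, a column with exactly one NA would be the only case in which the dimensions are dropped, which matches the behavior described above.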

I checked with the package author, and this appears to be a bug. Until it is fixed, you can choose a different imputation method for the columns with a single NA value. For example:

# Count number of NA in each column
NAcount = apply(rM1, 2, function(x) sum(is.na(x)))

# Create a vector giving the imputation method to use for each column. 
# Set it to "rf" unless that column has exactly one NA value.
method = rep("rf", ncol(rM1))
method[which(NAcount==1)] = "norm"

# Run the imputation with the new "method" selections
out.imp <- mice(rM1, m = 5, method = method)

I realize that for consistency you may want to use the same imputation method for all columns, but the above gives you an option if you are set on using the random forest method.
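The selection step can also be wrapped in a small function so it is easy to check on a toy matrix before running the full imputation. This is a minimal sketch: choose_methods, its default and fallback arguments, and the toy data are all hypothetical names introduced here, not part of mice.

```r
# Hypothetical helper (not part of mice): pick an imputation method per column.
# Columns with exactly one NA get the fallback method, all others the default.
choose_methods <- function(data, default = "rf", fallback = "norm") {
  na_count <- colSums(is.na(data))
  ifelse(na_count == 1, fallback, default)
}

# Toy matrix: column A has one NA, column B has two, column C has none.
toy <- matrix(c(1, NA, 3, 4,
                NA, NA, 3, 4,
                1, 2, 3, 4),
              nrow = 4, dimnames = list(NULL, c("A", "B", "C")))

methods <- choose_methods(toy)
print(methods)
```

The result can be passed as the method argument in the same way as the vector built earlier, for example after calling unname() on it if a plain character vector is preferred.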
