Error when using random forest during imputation (mice package)

Question

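I want to use the random forest method to impute missing values. I have read some papers claiming that mice with random forest performs better than parametric mice.

In my case, I have already run a model with the default mice method, obtained results, and worked with them. However, when I set the method option to random forest, I get an error and I am not sure why. I have seen some questions about errors with random forest and mice, but they do not match my case. My variables have more than one NA.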

imp <- mice(data1, m=70, pred=quickpred(data1), method="pmm", seed=71152, printFlag=TRUE)
impRF <- mice(data1, m=70, pred=quickpred(data1), method="rf", seed=71152, printFlag=TRUE)

iter imp variable
 1   1  Vac
Error in if (n == 0) stop("data (x) has 0 rows") : argument is of length zero

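Does anyone know why I am getting this error?

EDIT

I tried changing all variables to numeric instead of using dummy variables. It returned the same error, plus some warnings: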

impRF <- mice(data, m=70, pred=quickpred(data), method="rf", seed=71152, printFlag=TRUE)

 iter imp variable
   1   1  Vac  CliForm
 Error in if (n == 0) stop("data (x) has 0 rows") : argument is of length zero
 In addition: There were 50 or more warnings (use warnings() to see the first 50)

 50: In randomForest.default(x = xobs, y = yobs, ntree = 1,  ... :
   The response has five or fewer unique values.  Are you sure you want to do regression?

EDIT1

I tried with only 5 imputations and a smaller subset of the data (just 2000 rows), and got a different error:

> imp <- mice(data2, m=5, pred=quickpred(data2), method="rf", seed=71152, printFlag=TRUE)

iter imp variable
 1   1  Vac  Radio  Origin  Job  Alc  Smk  Drugs  Prison  Commu  Hmless  Symp
Error in randomForest.default(x = xobs, y = yobs, ntree = 1, ...) : NAs in foreign function call (arg 11)
 In addition: Warning messages:
 1: In randomForest.default(x = xobs, y = yobs, ntree = 1, ...) : invalid mtry: reset to within valid range
 2: In max(ncat) : no non-missing arguments to max; returning -Inf
 3: In randomForest.default(x = xobs, y = yobs, ntree = 1, ...) : NAs introduced by coercion

Answer 1

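I also got this error when I had only one fully observed variable, and I suspect that is the cause in your case too. My colleague Anoop Shah provided me with a fix (below), and Prof. van Buuren (the author of mice) has said he will include it in the next update of the package.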
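
One way to check whether this applies to your data is to count how many predictors are selected for each variable. The sketch below uses a small made-up matrix in the layout that quickpred() returns (one row per variable to impute, with 1 marking a column used as a predictor). The variable names are borrowed from the question, but the 0/1 values are purely hypothetical. With real data you would start from pred <- quickpred(data1) instead.

```r
# Hypothetical predictor matrix in the layout quickpred() returns:
# one row per variable to impute, 1 marks a column used as a predictor.
vars <- c("Vac", "Radio", "Origin")
pred <- matrix(c(0, 1, 0,
                 1, 0, 0,
                 1, 1, 0),
               nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

# Number of predictors selected for each target variable.
n_pred <- rowSums(pred)

# Variables imputed from exactly one predictor are the likely trouble spots.
single_pred <- names(n_pred)[n_pred == 1]
print(single_pred)
```

Note that this count is only approximate, because a factor predictor may expand into several columns before it reaches the imputation function.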

In R, type the following, which lets you redefine the rf imputation function: fixInNamespace("mice.impute.rf", "mice")

Then paste in the corrected function:

mice.impute.rf <- function (y, ry, x, ntree = 100, ...){
    ntree <- max(1, ntree)
    # as.matrix() turns a single predictor back into a one-column matrix
    # after x[ry, ] has collapsed it to a vector.
    xobs <- as.matrix(x[ry, ])
    xmis <- as.matrix(x[!ry, ])
    yobs <- y[ry]
    # Fit a single tree and, for each missing case, collect candidate
    # donor values from the observed cases that the tree matches to it.
    onetree <- function(xobs, xmis, yobs, ...) {
        fit <- randomForest(x = xobs, y = yobs, ntree = 1, ...)
        leafnr <- predict(object = fit, newdata = xobs, nodes = TRUE)
        nodes <- predict(object = fit, newdata = xmis, nodes = TRUE)
        donor <- lapply(nodes, function(s) yobs[leafnr == s])
        return(donor)
    }
    forest <- sapply(1:ntree, FUN = function(s) onetree(xobs, xmis, yobs, ...))
    # Draw one donor value at random for each missing case.
    impute <- apply(forest, MARGIN = 1, FUN = function(s) sample(unlist(s), 1))
    return(impute)
}
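
The substance of the fix appears to be the two as.matrix() calls. A minimal base-R sketch of why they matter, assuming the predictors reach the function as a one-column matrix (the values below are made up for illustration): subsetting with x[ry, ] drops the dimensions, nrow() then returns NULL, and a check of the form if (n == 0) fails with "argument is of length zero", which matches the error in the question.

```r
# A predictor matrix with a single column (hypothetical values).
x  <- matrix(c(1.2, 3.4, 5.6, 7.8), ncol = 1)
ry <- c(TRUE, FALSE, TRUE, TRUE)   # TRUE where the target is observed

dropped <- x[ry, ]                 # dimensions dropped: a plain numeric vector
kept    <- as.matrix(x[ry, ])      # coerced back to a 3 x 1 matrix

print(is.null(nrow(dropped)))      # nrow() of a plain vector is NULL
print(dim(kept))

# Mimic a row-count check like the one in the error message: comparing
# NULL with 0 gives logical(0), which `if` cannot evaluate.
msg <- tryCatch(
  if (nrow(dropped) == 0) stop("data (x) has 0 rows"),
  error = function(e) conditionMessage(e)
)
print(msg)
```

An alternative that is not part of the fix above would be x[ry, , drop = FALSE], which keeps the matrix shape in the first place and also covers the case where only a single row is selected.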

Answer 1 (accepted, score 2), 2014-06-18 13:43:29