Imputation based on factor level

I am trying to perform imputation with MICE on the following dataframe.

marketValue <- c(NA, 234234, NA, 243243, NA, NA, 234523, NA, 232427, 112214)
bathrooms <- c(3,3,2,3,5,4,1,5,6,3)
garageSqFt <- c(400, 385, 454, 534, 210, NA, 342, 423, 535, NA)
totalSqFT <- c(NA, NA, 1231, 2232, 4564, 2122, 4324, 4342, 1299, 4355)
units <- c(1, 1, 1, 1, 1, 1, 1.5, NA, 2, 5)
subDivId <- c("112", "111", "111", "111", "112", "111", "112", "112", "111", 
"112")
data <- data.frame(marketValue, bathrooms, garageSqFt, totalSqFT, units, 
subDivId)

In the actual data frame there are about 1300 subDivId factor levels. I would like to split the data into separate data frames, one per subDivId, and then impute within each one. My attempt:

splitSubDiv <- split(data, data$subDivId) 
for (neighborhood in splitSubDiv){
    testData <- mice(neighborhood, m=5, maxit = 5) 
    impData <- complete(testData, 5) 
    str(impData) }

This doesn't seem to work: no imputation is done and the values are still NA. What am I doing wrong?

In your attempt you are not assigning your imputed data frames to objects, so the results are not preserved.
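If you want to keep the for loop, a minimal sketch of the fix is to pre-allocate a list and assign into it on each iteration. To keep the example runnable in base R, the mice() and complete() step is replaced by a hypothetical stand-in, fill_with_mean(), which is simple mean imputation and not what mice does; the toy data and all names below are illustrative assumptions.

```r
# Toy data: two groups, one missing value in each
toy <- data.frame(
  value = c(1, NA, 3, 10, NA, 30),
  group = c("a", "a", "a", "b", "b", "b")
)
groups <- split(toy, toy$group)

# Hypothetical stand-in for the mice() + complete() step:
# fills NAs in `value` with the group mean so the sketch runs in base R
fill_with_mean <- function(df) {
  df$value[is.na(df$value)] <- mean(df$value, na.rm = TRUE)
  df
}

# Pre-allocate a named list, then store each result inside the loop
imputed <- vector("list", length(groups))
names(imputed) <- names(groups)
for (name in names(groups)) {
  imputed[[name]] <- fill_with_mean(groups[[name]])
}

str(imputed)
```

In your own code, the body of the loop would call mice() and complete() on each piece of splitSubDiv instead of the stand-in.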

You can try this approach using lapply instead of a for loop. You'll end up with a list of imputed data frames.

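library(mice)

splitSubDivImp <- lapply(splitSubDiv, function(neighborhood) {
  # Impute within one subdivision: 5 imputed data sets, 5 iterations each
  testData <- mice(neighborhood, m = 5, maxit = 5)
  # Return the fifth completed data set so lapply() collects it
  complete(testData, 5)
})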

splitSubDivImp
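If you need a single data frame again afterwards, one option is to stack the list elements with do.call(rbind, ...). The sketch below uses a small hand-made list, imputedList, as a stand-in for splitSubDivImp so that it runs without mice; the values in it are made up for illustration.

```r
# Stand-in for splitSubDivImp: a named list of per-group data frames
imputedList <- list(
  "111" = data.frame(units = c(1, 1), subDivId = c("111", "111")),
  "112" = data.frame(units = c(1.5, 5), subDivId = c("112", "112"))
)

# Stack the per-group data frames back into a single data frame
combined <- do.call(rbind, imputedList)
rownames(combined) <- NULL  # drop the "111.1"-style row names rbind creates

combined
```

The rows come back grouped by subDivId rather than in their original order. If the original order matters, unsplit(splitSubDivImp, data$subDivId) may be a better fit.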
