Replace or impute NA values in R without a for loop

Question

Is there a better way to go through the observations in a data frame and impute NA values? I've put together a for loop that seems to do the job, replacing each NA with its row's mean, but I'm wondering whether there is a better approach that avoids the loop, perhaps a built-in R function?

# 1. Create data frame with some NA values. 

rdata <- rbinom(30,5,prob=0.5)
rdata[rdata == 0] <- NA
mtx <- matrix(rdata, 3, 10)
df <- as.data.frame(mtx)  
df2 <- df

# 2. Run for loop to replace NAs with that row's mean.

for (i in seq_len(nrow(df))) {  # for every row
  x <- as.numeric(df[i, ])      # extract that row into a numeric vector
  y <- is.na(x)                 # logical vector marking the NAs
  z <- !is.na(x)                # logical vector marking the non-NAs
  result <- mean(x[z])          # mean of the row's non-NA values
  df2[i, y] <- result           # replace the NAs in that row
}

# 3. Show output with imputed row mean values.

print(df)  # before
print(df2) # after

Answer 1

Here's a possible vectorized approach (without any loop):

indx <- which(is.na(df), arr.ind = TRUE)
df[indx] <- rowMeans(df, na.rm = TRUE)[indx[,"row"]]

Some explanation

We can identify the locations of the NAs using the arr.ind parameter of which(). Then we can simply index df (by the row and column indexes) and the row means (by the row indexes only), and replace the values accordingly.
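
To make the indexing concrete, here is a minimal sketch on a small made-up data frame. The name `toy` and its values are assumptions for illustration and do not come from the question's data.

```r
# Hypothetical 2 x 3 data frame with two missing cells
toy <- data.frame(a = c(1, NA), b = c(3, 4), c = c(NA, 8))

# arr.ind = TRUE returns one (row, col) pair per NA instead of a flat index
indx <- which(is.na(toy), arr.ind = TRUE)
print(indx)

# Row means ignoring NAs: row 1 -> mean(1, 3) = 2, row 2 -> mean(4, 8) = 6
rmeans <- rowMeans(toy, na.rm = TRUE)

# Pick the mean of the matching row for each NA, then assign through the
# two-column index matrix
toy[indx] <- rmeans[indx[, "row"]]
print(toy)
```

Indexing `rmeans` by the "row" column is what lines each NA up with the mean of its own row, regardless of which column it sits in.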

Answer 2

One possibility is impute from Hmisc, which allows choosing any function to do the imputation:

library(Hmisc)
t(sapply(split(df2, row(df2)), impute, fun=mean))

Alternatively, you can hide the loop in an apply:

t(apply(df2, 1, function(x) {
    mu <- mean(x, na.rm=TRUE)
    x[is.na(x)] <- mu
    x
}))
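
Note that apply() over rows returns a matrix, not a data frame. If the data frame structure should be preserved, one option is to assign the result back into the existing object. The sketch below uses a made-up data frame `toy` (an assumption for illustration only).

```r
# Hypothetical example data; values are illustrative only
toy <- data.frame(V1 = c(1, NA, 5), V2 = c(3, 4, NA), V3 = c(NA, 8, 7))

# apply() over rows returns one column per row, so t() restores the shape
imputed <- t(apply(toy, 1, function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}))

# The result is a matrix; assigning into toy[] keeps the data frame
# structure and its column names
toy[] <- imputed
print(toy)
```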

Answer 3

Data:

set.seed(102)
rdata <- matrix(rbinom(30,5,prob=0.5),nrow=3)
rdata[cbind(1:3,2:4)] <- NA
df <- as.data.frame(rdata)

This is a little trickier than I'd like: it relies on the column-major ordering of matrices in R, as well as on the recycling of the row-means vector to the full length of the matrix. I tried to come up with a sweep() solution but haven't managed so far.

rmeans <- rowMeans(df,na.rm=TRUE)
df[] <- ifelse(is.na(df),rmeans,as.matrix(df))
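
The alignment that this relies on can be checked directly. The following is a small sketch on a made-up matrix `m` (hypothetical values, not the data above) showing that a recycled row-means vector, filled column by column, places the mean of row i in every cell of row i.

```r
# Hypothetical 3 x 4 example; the values are illustrative only
m <- matrix(c(1, NA, 3,
              4, 5, NA,
              NA, 8, 9,
              10, 11, 12), nrow = 3)
rmeans <- rowMeans(m, na.rm = TRUE)

# Recycling rmeans to length(m) and filling column by column puts the
# mean of row i in every cell of row i
recycled <- matrix(rep(rmeans, length.out = length(m)), nrow = nrow(m))
stopifnot(all(recycled == rmeans[row(m)]))

# ifelse() relies on exactly this alignment
filled <- ifelse(is.na(m), rmeans, m)
print(filled)
```

The same trick would not carry over to column means, because a vector of length ncol(m) recycled down the columns would not line up with the column positions.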

Answer 1: 5, accepted, 2015-08-12 21:04:16

Answer 2: 3, 2015-08-12 20:59:53

Answer 3: 3, 2015-08-12 21:04:02
