R: replace NA that is preceded and followed by non-NA values
I have a huge database that sometimes has missing values. These need to be replaced by the average of the preceding and following values. I don't want to just carry the last value forward when it is NA. I'd rather do a simple interpolation using the average.
I have succeeded using two for loops and an if statement:
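t2 <- c(0, 0, 0.02, 0.04, NA, NA)
t3 <- c(0, 0, NA, 0, -0.01, 0.03)
t4 <- c(0, -0.02, 0.01, 0, 0, -0.02)
df <- data.frame(t1, t2, t3, t4)   # t1 is not shown in the post
df.save <- df
# Each row is treated as a series: an NA in column j gets the mean of
# columns j - 1 and j + 1 of the same row
for (i in 2:nrow(df)) {
  for (j in 2:(ncol(df) - 1)) {   # stop one early so that column j + 1 exists
    # fill only if both neighbours are present
    if (is.na(df[i, j]) && !is.na(df[i, j - 1]) && !is.na(df[i, j + 1])) {
      df[i, j] <- mean(c(df[i, j - 1], df[i, j + 1]))  # mean() takes one vector
    }
  }
}
df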
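A detail that matters for this kind of loop: in R, `mean(a, b)` does not average `a` and `b`. `mean()` takes a single vector, and its second positional argument is `trim`. So `b` is taken as the trim fraction and, for an ordinary numeric `b`, the result is just `a`. Combining the values with `c()` first gives the intended average:

```r
a <- 0.02
b <- 0.04
mean(a, b)     # b is taken as `trim`; the result is just a, i.e. 0.02
mean(c(a, b))  # averages both values: 0.03
```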
I am sure this is not efficient at all, and not even general. The way I wrote the code, I have to start searching for NAs from the second row and column on. I think lapply could help here, but I couldn't achieve anything with it.
Any ideas?
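Here is a minimal base-R sketch of the vectorised route. It assumes an NA should only be filled when both of its immediate neighbours are present, as the title says. `fill_isolated_na()` is a hypothetical helper, not from the post, and the data uses only `t2` to `t4` because `t1` is not shown. `lapply()` walks over the columns of a data frame. The loop in the question instead averages `df[i, j-1]` and `df[i, j+1]`, which are neighbours within a row. The sketch therefore shows both directions: `lapply()` for column-wise series, and `apply(..., 1, ...)` plus a transpose for row-wise series.

```r
# Hypothetical helper: replace an NA by the mean of its immediate
# neighbours, but only when both neighbours are non-NA.
fill_isolated_na <- function(x) {
  n <- length(x)
  before <- c(NA, x[-n])   # value preceding each position
  after  <- c(x[-1], NA)   # value following each position
  fill <- is.na(x) & !is.na(before) & !is.na(after)
  x[fill] <- (before[fill] + after[fill]) / 2
  x
}

t2 <- c(0, 0, 0.02, 0.04, NA, NA)
t3 <- c(0, 0, NA, 0, -0.01, 0.03)
t4 <- c(0, -0.02, 0.01, 0, 0, -0.02)
df <- data.frame(t2, t3, t4)

# Column-wise: each column is one series; `df[] <-` keeps the data.frame class
by_col <- df
by_col[] <- lapply(by_col, fill_isolated_na)

# Row-wise: each row is one series (as in the loop); apply() returns one
# column per row, so the result is transposed back
by_row <- as.data.frame(t(apply(as.matrix(df), 1, fill_isolated_na)))
```

With this data, `by_col` fills the single NA in `t3` (row 3) with 0 and leaves the trailing pair of NAs in `t2` alone. `by_row` fills the NA in row 3 with the mean of 0.02 and 0.01, i.e. 0.015.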
EDIT 1: Rui's answer was perfect, but when formulating my example I forgot to consider the case in which two NAs follow each other:
t2 <- c(0, 0, 0.02, 0.04, NA, NA)
t3 <- c(0, 0, NA, 0, -0.01, 0.03)
t4 <- c(0, -0.02, 0.01, 0, 0, -0.02)
df <- data.frame(t1,t2,t3,t4)
In this case we get an error:
Error in rowMeans(cbind(x[prev], x[nxt]), na.rm = TRUE) :
'x' must be numeric
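The message is raised by `rowMeans()`, which only accepts numeric or logical input. On the numeric vectors shown, `meanNA()` runs without error on a pair of consecutive NAs: the answer's own output shows `t2` filled as 0.04 and NA. One plausible cause is that `t1` or another column is not numeric. This is only an assumption, because the post never shows `t1`. The sketch below reproduces the message with a character vector. It then applies `meanNA()` only to the numeric columns of a hypothetical data frame.

```r
# meanNA() as given in the answer
meanNA <- function(x) {
  na <- is.na(x)
  prev <- c(na[-1], FALSE)
  nxt <- c(FALSE, na[-length(x)])
  x[na] <- rowMeans(cbind(x[prev], x[nxt]), na.rm = TRUE)
  is.na(x) <- is.nan(x)
  x
}

# A character vector reproduces the message from the post
msg <- tryCatch(meanNA(c("a", NA, "c")), error = conditionMessage)
msg  # "'x' must be numeric"

# Hypothetical data frame with one non-numeric column
df <- data.frame(id = c("a", "b", "c"), v = c(1, NA, 3),
                 stringsAsFactors = FALSE)

# Apply meanNA() to the numeric columns only
num <- vapply(df, is.numeric, logical(1))
df[num] <- lapply(df[num], meanNA)
df$v  # 1 2 3
```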
The following function does what the question asks for.
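meanNA <- function(x){
  na <- is.na(x)
  prev <- c(na[-1], FALSE)          # TRUE at the element just before each NA
  nxt <- c(FALSE, na[-length(x)])   # TRUE at the element just after each NA
  x[na] <- rowMeans(cbind(x[prev], x[nxt]), na.rm = TRUE)
  is.na(x) <- is.nan(x)             # two NA neighbours give NaN; turn it back into NA
  x
}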
df[] <- lapply(df, meanNA)
df
# t2 t3 t4
#1 0.00 0.00 0.00
#2 0.00 0.00 -0.02
#3 0.02 0.00 0.01
#4 0.04 0.00 0.00
#5 0.04 -0.01 0.00
#6 NA 0.03 -0.02
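To see how the two masks in `meanNA()` line up, here they are on `t3` from the question, which has a single NA at position 3. `prev` marks the element just before each NA and `nxt` the element just after it. So `x[prev]` and `x[nxt]` hold the two neighbours of each NA. When both neighbours are NA, `rowMeans(..., na.rm = TRUE)` has nothing left to average and returns `NaN`. That is why the function ends with `is.na(x) <- is.nan(x)`. The pairing relies on each NA having a neighbour on both sides. With a leading or trailing NA, the two vectors can differ in length, and `cbind()` recycles the shorter one.

```r
x <- c(0, 0, NA, 0, -0.01, 0.03)   # t3 from the question
na   <- is.na(x)
prev <- c(na[-1], FALSE)           # TRUE where the next element is NA
nxt  <- c(FALSE, na[-length(x)])   # TRUE where the previous element is NA
which(prev)                        # 2
which(nxt)                         # 4
rowMeans(cbind(x[prev], x[nxt]), na.rm = TRUE)   # mean of 0 and 0, i.e. 0

# Two NA neighbours: na.rm = TRUE leaves nothing to average, giving NaN
rowMeans(cbind(NA_real_, NA_real_), na.rm = TRUE)  # NaN
```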
Using this answer as an example:
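df <- t(df.save)   # transpose: each original row becomes a column
for(i in 2:ncol(df)){
  idx <- which(is.na(df[, i]))
  idx <- idx[idx != 1 & idx != nrow(df)]   # need a value on both sides
  if(length(idx) > 0){
    # mean() takes a single vector, so combine the two neighbours with c()
    df[idx, i] <- sapply(idx, function(x) mean(c(df[x - 1, i], df[x + 1, i])))
  }
}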
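In the consecutive-NA case from EDIT 1, averaging only the immediate neighbours cannot fill the inner NAs. One alternative is linear interpolation with base R's `approx()`. This technique is swapped in here and is not taken from any of the answers above. For a single NA it reduces to the mean of the two neighbours. For a run of NAs, it spreads the gap evenly between the last value before the run and the first value after it. `interp_na()` is a hypothetical helper. Leading and trailing NAs stay NA because of `rule = 1`.

```r
# Hypothetical helper: linear interpolation of interior NAs, runs included.
interp_na <- function(x) {
  ok <- !is.na(x)
  # approx() needs at least two known points; nothing to do without NAs
  if (sum(ok) < 2 || all(ok)) return(x)
  idx <- seq_along(x)
  # rule = 1 returns NA outside the range of known points
  x[!ok] <- approx(idx[ok], x[ok], xout = idx[!ok], rule = 1)$y
  x
}

interp_na(c(1, NA, 3))                  # 1 2 3
interp_na(c(0, NA, NA, 0.03, NA))       # 0 0.01 0.02 0.03 NA
interp_na(c(0, 0, 0.02, 0.04, NA, NA))  # t2: trailing NAs stay NA
```

It applies column-wise in the same way as `meanNA()`, via `df[] <- lapply(df, interp_na)`. If an extra dependency is acceptable, `na.approx()` from the zoo package offers a packaged version of the same idea.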