
How to replace NAs with the linear interpolation between known observations?

I have the following data frame:

df <- data.frame(id = c("a", "a", "a", "a", "b", "b", "b", "b"),
        time = 1:4, value = c(100, NA, NA, 550, 300, NA, NA, 900))

Can someone suggest an approach for replacing the NA values in df by spreading the change in the value column evenly over time? For id a, value is 100 at time 1 and 550 at time 4, so the NAs at times 2 and 3 should become 250 and 400. Likewise, for id b the NAs at times 2 and 3 should become 500 and 700.

I can write a complex for loop to brute force it, but is there a more efficient solution?

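You could use na.approx from the zoo package. Note that the call below interpolates over the whole value column and ignores id; it gives the right answer here only because every group starts and ends with a known value.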

library(zoo)
df$value <- na.approx(df$value)
df
#  id time value
#1  a    1   100
#2  a    2   250
#3  a    3   400
#4  a    4   550
#5  b    1   300
#6  b    2   500
#7  b    3   700
#8  b    4   900
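If a group can start or end with an NA, interpolating over the whole column would borrow values from the neighbouring group. A minimal sketch of a per-id version follows, assuming the rows are sorted by time within each id and evenly spaced. The data frame df2 is a hypothetical variant of the question's data (not from the original post) in which group b starts with an NA.

```r
library(zoo)

# Hypothetical variant of the question's data: group b starts with an NA
df2 <- data.frame(id = c("a", "a", "a", "a", "b", "b", "b", "b"),
                  time = 1:4,
                  value = c(100, NA, NA, 550, NA, 600, NA, 900))

# Interpolate within each id. na.rm = FALSE keeps leading/trailing NAs,
# so each group's result has the same length as its input (ave needs that).
df2$value <- ave(df2$value, df2$id,
                 FUN = function(v) na.approx(v, na.rm = FALSE))
df2$value
## [1] 100 250 400 550  NA 600 750 900
```

The leading NA in group b stays NA because there is no earlier observation in that group to interpolate from; an ungrouped na.approx(df2$value) would instead fill it from the last value of group a.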

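Or, without any external packages or complicated loops, you can write a small vectorized function of your own and apply it per id with ave. Note that this is not a general replacement for na.approx: it uses only the first and last value of each group, so it assumes that both are known and that the rows are evenly spaced, and it overwrites any known values in between.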

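myna.approx <- function(x){
  len <- length(x)
  # constant step from the first to the last value, spread over len - 1 intervals
  step <- (x[len] - x[1L])/(len - 1L)
  # start at x[1] and accumulate the step; values in between are recomputed
  cumsum(c(x[1L], rep(step, len - 1L)))
}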

with(df, ave(value, id, FUN = myna.approx))
## [1] 100 250 400 550 300 500 700 900
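For data with known values in the middle of a group, or with unevenly spaced times, base R's approx() from the stats package may be a better fit. The sketch below is one possible way to apply it per id; the helper name interp_by_time is made up for illustration and is not part of the original answer.

```r
df <- data.frame(id = c("a", "a", "a", "a", "b", "b", "b", "b"),
                 time = 1:4,
                 value = c(100, NA, NA, 550, 300, NA, NA, 900))

# Hypothetical helper: linear interpolation of value against time,
# using every known observation in the group.
interp_by_time <- function(time, value) {
  known <- !is.na(value)
  # approx() needs at least two known points; otherwise leave the input as is
  if (sum(known) < 2L) return(value)
  # the default rule = 1 returns NA outside the range of the known times
  approx(time[known], value[known], xout = time)$y
}

# Split by id, interpolate each piece, then put the pieces back in row order
pieces <- lapply(split(df, df$id), function(g) interp_by_time(g$time, g$value))
df$value <- unsplit(pieces, df$id)
df$value
## [1] 100 250 400 550 300 500 700 900

# A known value in the middle is respected rather than overwritten
interp_by_time(1:5, c(0, NA, 10, NA, 40))
## [1]  0  5 10 25 40
```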

