How to replace NAs with the linear interpolation between known observations?

Question

I have the following data frame:

df <- data.frame(id = c("a", "a", "a", "a", "b", "b", "b", "b"),
        time = 1:4, value = c(100, NA, NA, 550, 300, NA, NA, 900))

Can someone suggest an approach for replacing the NA values in df by spreading the difference in the value column evenly over time? At time 1, a is 100 and at time 4, a is 550. How would one change the NAs at times 2 and 3 to 250 and 400, and likewise to 500 and 700 for b at times 2 and 3?

I can write a complex for loop to brute force it, but is there a more efficient solution?

Answer 1

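You could use na.approx from the zoo package. Note that it is applied to the whole column here, ignoring id, which works for this data because every id starts and ends with a known value: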

library(zoo)
df$value <- na.approx(df$value)
df
#  id time value
#1  a    1   100
#2  a    2   250
#3  a    3   400
#4  a    4   550
#5  b    1   300
#6  b    2   500
#7  b    3   700
#8  b    4   900
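
If a group could start or end with an NA, interpolating the whole column would borrow values from the neighbouring id. A minimal sketch of a per-id variant, assuming the same df as in the question: ave() applies na.approx within each id, and na.rm = FALSE keeps leading or trailing NAs so the result has the same length as the input. The name filled is only illustrative.

```r
library(zoo)

df <- data.frame(id = c("a", "a", "a", "a", "b", "b", "b", "b"),
                 time = 1:4, value = c(100, NA, NA, 550, 300, NA, NA, 900))

# Interpolate within each id; na.rm = FALSE leaves leading/trailing NAs
# in place instead of dropping them, so ave() gets a vector of equal length.
filled <- ave(df$value, df$id,
              FUN = function(v) na.approx(v, na.rm = FALSE))
filled
## [1] 100 250 400 550 300 500 700 900
```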

Answer 2

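Or you can write your own vectorized replacement for na.approx, without loops or external packages. Note that this version only uses the first and last value of each group and assumes both are known; any known values in between are overwritten by the straight line between those two endpoints.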

myna.approx <- function(x){
  len <- length(x) 
  cumsum(c(x[1L], rep((x[len] - x[1L])/(len - 1L), len - 1L)))
}

with(df, ave(value, id, FUN = myna.approx))
## [1] 100 250 400 550 300 500 700 900
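
If a group can also contain known values between its first and last observation, a base R sketch along the following lines may be closer to what na.approx does. It assumes the same df as in the question and interpolates against the time column with stats::approx(). The helper name interp_group and the result name filled are hypothetical, chosen only for this illustration.

```r
df <- data.frame(id = c("a", "a", "a", "a", "b", "b", "b", "b"),
                 time = 1:4, value = c(100, NA, NA, 550, 300, NA, NA, 900))

# Hypothetical helper: linearly interpolate one group's values over its times.
interp_group <- function(d) {
  known <- !is.na(d$value)
  # approx() needs at least two known points; otherwise leave the group as is
  if (sum(known) >= 2) {
    d$value <- approx(d$time[known], d$value[known], xout = d$time)$y
  }
  d
}

# Split by id, interpolate each piece, and put the rows back in original order
filled <- unsplit(lapply(split(df, df$id), interp_group), df$id)
filled$value
## [1] 100 250 400 550 300 500 700 900
```

With the default rule = 1, approx() returns NA for times outside the range of the known observations, so leading and trailing NAs stay NA rather than being extrapolated.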

2 answers

solution1
11 ACCEPTED 2015-03-17 17:41:22

solution2
6 2015-03-17 18:11:32
