Interpolate values in a dataframe column inplace using dplyr

Question

I'm working with a dataframe that has some missing data, and I need to fill in the missing values using linear interpolation.

Although I know I can do this with a loop, I'd like to do it using dplyr (for consistency and readability ~~and because I know that loops are awfully ugly in R~~).

Here's an example of what I am trying to do:

library(dplyr)

data.raw <- tibble(x=c(66, 67, 68, 69, 70, 72, 73, 75, 93),
                   S=c(0.11755811, 0.11648940, 0.11542069, 0.11434199, 
                       0.11218459, 0.10996312, 0.10884104, 0.10767071, 
                       0.09228918))
# As you can see, there are some "holes" in the data. For example, the value
# for x = 71 is missing.

# I've created a new dataframe with all the values for x as this:
data.proc <- tibble(x=66:(data.raw %>% select(x) %>% pull() %>% max)) %>% 
  left_join(data.raw, by='x')

# Here's my non-optimal 'for' solution:
for (x_ in data.proc$x) {
  if (is.na(data.proc$S[data.proc$x == x_])) {
    # Nearest x values on either side that have a known S
    known <- !is.na(data.proc$S)
    x.0 <- max(data.proc$x[known & data.proc$x < x_])
    x.1 <- min(data.proc$x[known & data.proc$x > x_])
    S.0 <- data.proc$S[data.proc$x == x.0]
    S.1 <- data.proc$S[data.proc$x == x.1]
    # Calculate the slope
    m <- (S.1 - S.0) / (x.1 - x.0)
    # Set the new value
    data.proc$S[data.proc$x == x_] <- m * (x_ - x.0) + S.0
  }
}

So, my question is: is there a way to do this directly with dplyr? So far my google-fu is failing me :(

Answer 1

You can use approx():

library(tidyverse)
left_join(tibble(x = seq(min(data.raw$x), max(data.raw$x))), data.raw) %>%
    mutate(S = if_else(is.na(S), approx(x, S, x)$y, S))
## A tibble: 28 x 2
#       x     S
#   <dbl> <dbl>
# 1    66 0.118
# 2    67 0.116
# 3    68 0.115
# 4    69 0.114
# 5    70 0.112
# 6    71 0.111
# 7    72 0.110
# 8    73 0.109
# 9    74 0.108
#10    75 0.108
## … with 18 more rows

This assumes that (1) x is the set of integer values between min(data.raw$x) and max(data.raw$x), and (2) you only want to interpolate values in that interval (not extrapolate, in which case you'd want to use something like lm).
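
To illustrate the second assumption, here is a small sketch of how approx() behaves outside the range of the known points. The toy vectors (x_known, y_known, x_out) are made up for illustration and are not the question's data. By default (rule = 1) approx() returns NA outside the range; rule = 2 repeats the value at the nearest end, which is constant extrapolation and not a fitted line like lm would give.

```r
# Toy data (made up for illustration): known points at x = 2 and x = 4
x_known <- c(2, 4)
y_known <- c(10, 20)
x_out   <- 1:5

# Default rule = 1: points outside [2, 4] come back as NA
inside_only <- approx(x_known, y_known, xout = x_out)$y

# rule = 2: points outside the range take the value at the nearest end
held_flat <- approx(x_known, y_known, xout = x_out, rule = 2)$y

print(inside_only)  # NA 10 15 20 NA
print(held_flat)    # 10 10 15 20 20
```

In the question's data S is known at both the smallest and the largest x, so neither case arises and the default is enough.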

Answer 2

We can use complete from tidyr to fill in the missing values of x and na.approx from zoo to interpolate the NA values in S.

library(dplyr)
library(tidyr)

data.raw %>% complete(x = seq(min(x), max(x))) %>% mutate(S = zoo::na.approx(S))

# A tibble: 28 x 2
#      x     S
#   <dbl> <dbl>
# 1    66 0.118
# 2    67 0.116
# 3    68 0.115
# 4    69 0.114
# 5    70 0.112
# 6    71 0.111
# 7    72 0.110
# 8    73 0.109
# 9    74 0.108
#10    75 0.108
# … with 18 more rows
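
One caveat, sketched with a made-up toy vector (s is not from the question): here S is known at both ends, so na.approx() returns a vector of the same length as its input. When NAs sit at the start or end, the default na.rm = TRUE drops them, and the shorter result would not fit back into the column inside mutate(). Passing na.rm = FALSE keeps the length and leaves those edge values as NA.

```r
library(zoo)

# Toy vector (made up): NA at the start, in the middle, and at the end
s <- c(NA, 1, NA, 3, NA)

# Default na.rm = TRUE: leading/trailing NAs are dropped, so the length changes
dropped <- na.approx(s)

# na.rm = FALSE: same length as the input, edge NAs stay NA
kept <- na.approx(s, na.rm = FALSE)

print(length(dropped))  # 3
print(kept)             # NA  1  2  3 NA
```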

solution1
3 ACCEPTED 2020-01-16 00:25:35

solution2
1 2020-01-16 01:02:17