I'm working with a dataframe that has some missing data, and I need to interpolate the empty values, using linear interpolation.
Although I know I can do this with a loop, I'd like to do it using dplyr (for consistency and readability, and because I know that loops are awfully ugly in R).
Here's an example of what I am trying to do:
library(dplyr)  # provides tibble(), %>% and left_join()
data.raw <- tibble(x=c(66, 67, 68, 69, 70, 72, 73, 75, 93),
                   S=c(0.11755811, 0.11648940, 0.11542069, 0.11434199,
                       0.11218459, 0.10996312, 0.10884104, 0.10767071,
                       0.09228918))
# As you can see, there are some "holes" in the data. For example, the value
# for x = 71 is missing.
# I've created a new dataframe with all the values for x as this:
data.proc <- tibble(x=66:(data.raw %>% select(x) %>% pull() %>% max)) %>%
left_join(data.raw, by='x')
# Here's my non optimal 'for' solution:
for(x_ in data.proc$x) {
  if(is.na(data.proc$S[data.proc$x == x_])) {
    # Only rows with a known S can serve as interpolation endpoints;
    # otherwise runs of consecutive gaps (e.g. 76 to 92) would stay NA
    known <- !is.na(data.proc$S)
    # Get the nearest known x values on either side
    x.0 <- max(data.proc$x[known & data.proc$x < x_])
    x.1 <- min(data.proc$x[known & data.proc$x > x_])
    S.0 <- data.proc$S[data.proc$x == x.0]
    S.1 <- data.proc$S[data.proc$x == x.1]
    # Calculate the slope
    m <- (S.1 - S.0) / (x.1 - x.0)
    # Set the new value
    data.proc$S[data.proc$x == x_] <- m * (x_ - x.0) + S.0
  }
}
So, my question is: Is there a way to do this directly with dplyr? So far my google-fu is failing me :(
You can use approx
library(tidyverse)
left_join(tibble(x = seq(min(data.raw$x), max(data.raw$x))), data.raw) %>%
mutate(S = if_else(is.na(S), approx(x, S, x)$y, S))
## A tibble: 28 x 2
# x S
# <dbl> <dbl>
# 1 66 0.118
# 2 67 0.116
# 3 68 0.115
# 4 69 0.114
# 5 70 0.112
# 6 71 0.111
# 7 72 0.110
# 8 73 0.109
# 9 74 0.108
#10 75 0.108
## … with 18 more rows
This assumes that (1) x is the set of integer values between min(data.raw$x) and max(data.raw$x), and (2) you only want to interpolate values in that interval (not extrapolate, in which case you'd want to use something like lm).
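To make the interpolate-versus-extrapolate point concrete, here is a small base R sketch using four of the points from the question. The variable names (x_known, S_known, inside, outside, clamped) are only for illustration. By default approx uses rule = 1 and returns NA for any xout outside the observed range; rule = 2 repeats the nearest endpoint value instead, which is still not a real extrapolation.

```r
# Known points taken from the question's data (x = 70, 72, 73, 75)
x_known <- c(70, 72, 73, 75)
S_known <- c(0.11218459, 0.10996312, 0.10884104, 0.10767071)

# Interpolate at the missing integer positions inside the observed range
inside <- approx(x_known, S_known, xout = c(71, 74))$y

# Outside the observed range, the default rule = 1 returns NA
outside <- approx(x_known, S_known, xout = 76)$y

# rule = 2 holds the nearest endpoint value constant instead of returning NA
clamped <- approx(x_known, S_known, xout = 76, rule = 2)$y
```

If you need values beyond the last observation that follow a trend, a fitted model is the more appropriate tool, as noted above.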
We can use complete from tidyr to fill in the missing values of x, and na.approx from zoo to interpolate the NA values in S.
library(dplyr)
library(tidyr)
data.raw %>% complete(x = seq(min(x), max(x))) %>% mutate(S = zoo::na.approx(S))
# A tibble: 28 x 2
# x S
# <dbl> <dbl>
# 1 66 0.118
# 2 67 0.116
# 3 68 0.115
# 4 69 0.114
# 5 70 0.112
# 6 71 0.111
# 7 72 0.110
# 8 73 0.109
# 9 74 0.108
#10 75 0.108
# … with 18 more rows
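One caveat worth checking on other data: with its default na.rm = TRUE, na.approx drops leading and trailing NA values that it cannot interpolate, so the result can be shorter than the input, which mutate will reject. That does not happen with the data above because the first and last rows have a known S. The sketch below uses a made-up toy vector (S, trimmed and padded are illustrative names) to show the difference.

```r
# Toy vector with a leading NA, an interior gap, and a trailing NA
S <- c(NA, 1, NA, 3, NA)

# Default (na.rm = TRUE): the edge NAs are dropped, so the result is shorter
trimmed <- zoo::na.approx(S)

# na.rm = FALSE keeps the edge NAs, so the length matches the input
padded <- zoo::na.approx(S, na.rm = FALSE)
```

Inside a mutate call, the second form, zoo::na.approx(S, na.rm = FALSE), is probably the safer choice when the column might start or end with NA.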