Searching for linear interpolation of time series data in R, I often found recommendations to use na.approx() from the zoo package.
However, with irregular time series I ran into problems: the interpolated values are spaced evenly across the gaps by row position, without taking the time stamp associated with each value into account.
I found a workaround using approxfun(), but I wonder whether there is a cleaner solution, ideally based on tsibble objects with functions from the tidyverts package family.
Previous answers relied on expanding the irregular date grid to a regular grid by filling the gaps. However, this causes problems when the time of day should be taken into account during interpolation.
Here is a (revised) minimal example with POSIXct time stamps rather than Date only:
library(tidyverse)
library(zoo)
df <- tibble(date = as.POSIXct(c("2000-01-01 00:00", "2000-01-02 02:00", "2000-01-05 00:00")),
             value = c(1, NA, 2))
df %>%
  mutate(value_int_wrong = na.approx(value),
         value_int_correct = approxfun(date, value)(date))
# A tibble: 3 x 4
  date                value value_int_wrong value_int_correct
  <dttm>              <dbl>           <dbl>             <dbl>
1 2000-01-01 00:00:00     1             1                1
2 2000-01-02 02:00:00    NA             1.5              1.27
3 2000-01-05 00:00:00     2             2                2
Any ideas on how to deal with this efficiently? Thanks for your support!
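To make the difference between the two columns concrete, here is a minimal base R sketch that reproduces both results with approx(). It is only an illustration: the variable names by_index, by_time and by_hand are made up for this example, and tz = "UTC" is an assumption added so the result does not depend on the local time zone.

```r
# Assumption: time stamps are read as UTC (not stated in the question).
date  <- as.POSIXct(c("2000-01-01 00:00", "2000-01-02 02:00", "2000-01-05 00:00"),
                    tz = "UTC")
value <- c(1, NA, 2)
known <- !is.na(value)

# Interpolate against the row position (1, 2, 3): time stamps are ignored,
# so the missing value lands halfway between its neighbours.
by_index <- approx(x = which(known), y = value[known],
                   xout = seq_along(value))$y

# Interpolate against elapsed time (seconds since the epoch).
secs    <- as.numeric(date)
by_time <- approx(x = secs[known], y = value[known], xout = secs)$y

# The same value by hand: 26 of the 96 hours between the known points have passed.
frac <- as.numeric(difftime(date[2], date[1], units = "hours")) /
        as.numeric(difftime(date[3], date[1], units = "hours"))
by_hand <- value[1] + frac * (value[3] - value[1])

print(by_index)
print(round(by_time, 2))
print(round(by_hand, 2))
```

The position-based result corresponds to value_int_wrong and the time-based result to value_int_correct in the table above.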
Here is an equivalent tsibble-based solution. The interpolate() function needs a model, but you can use a random walk to give linear interpolation between points.
library(tidyverse)
library(tsibble)
library(fable)
#> Loading required package: fabletools
df <- tibble(
  date = as.Date(c("2000-01-01", "2000-01-02", "2000-01-05", "2000-01-06")),
  value = c(1, NA, 2, 1.5)
) %>%
  as_tsibble(index = date) %>%
  fill_gaps()
df %>%
  model(naive = ARIMA(value ~ -1 + pdq(0,1,0) + PDQ(0,0,0))) %>%
  interpolate(df)
#> # A tsibble: 6 x 2 [1D]
#> date value
#> <date> <dbl>
#> 1 2000-01-01 1
#> 2 2000-01-02 1.25
#> 3 2000-01-03 1.5
#> 4 2000-01-04 1.75
#> 5 2000-01-05 2
#> 6 2000-01-06 1.5
Created on 2020-04-08 by the reprex package (v0.3.0)
Personally, I would stick with the solution you are already using. But to show how na.approx can be used in this case: complete the sequence of dates first, apply na.approx, and then join the result with the original df to keep only the original rows.
library(dplyr)
df %>%
  tidyr::complete(date = seq(min(date), max(date), by = "day")) %>%
  mutate(value_int = zoo::na.approx(value)) %>%
  right_join(df, by = "date") %>%
  select(date, value_int)
# date value_int
# <date> <dbl>
#1 2000-01-01 1
#2 2000-01-02 1.25
#3 2000-01-05 2
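A possibly shorter route that avoids expanding the grid: na.approx() also accepts an x argument giving the positions to interpolate against, so the time stamps can be passed directly. The sketch below assumes the x and na.rm arguments of zoo's default method as described in its documentation. The POSIXct data is taken from the question's example, and tz = "UTC" is an added assumption to keep the result independent of the local time zone.

```r
library(zoo)

# Example data from the question; tz = "UTC" is an assumption for reproducibility.
df <- data.frame(
  date  = as.POSIXct(c("2000-01-01 00:00", "2000-01-02 02:00", "2000-01-05 00:00"),
                     tz = "UTC"),
  value = c(1, NA, 2)
)

# x = date makes the interpolation use the time stamps instead of row positions.
# na.rm = FALSE keeps leading/trailing NAs, so the result always has one
# value per input row and can be assigned back as a column.
df$value_int <- na.approx(df$value, x = df$date, na.rm = FALSE)

print(df)
```

The same call should drop into a dplyr pipeline as mutate(value_int = zoo::na.approx(value, x = date, na.rm = FALSE)), and it works for Date as well as POSIXct columns since both convert to numeric.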