Interpolating an irregular time series in R

Question

Searching for linear interpolation of time series data in R, I often found recommendations to use na.approx() from the zoo package.

However, I ran into problems with irregular time series, because na.approx() distributes the interpolated values evenly across the gap by row position, without taking the associated time stamps into account.
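The even spacing comes from how na.approx() positions the observations: its x argument defaults to index(object), which for a plain numeric vector is simply 1, 2, 3, and so on. A minimal sketch contrasting that default with elapsed hours computed from the example's time stamps (UTC is assumed here to keep the arithmetic reproducible):

```r
library(zoo)

value <- c(1, NA, 2)
date  <- as.POSIXct(c("2000-01-01 00:00", "2000-01-02 02:00", "2000-01-05 00:00"),
                    tz = "UTC")

# Default: observations sit at positions 1, 2, 3, so the gap is split evenly
by_position <- na.approx(value)

# Elapsed hours as x: the gap is split in proportion to time
hours   <- as.numeric(difftime(date, date[1], units = "hours"))
by_time <- na.approx(value, x = hours)

by_position
by_time
```

With hours as x, the filled value matches the approxfun() result shown in the question.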

I found a workaround using approxfun(), but I wonder whether there is a cleaner solution, ideally based on tsibble objects and functions from the tidyverts family of packages.

Previous answers relied on expanding the irregular date grid to a regular one by filling the gaps. However, this causes problems when the time of day has to be taken into account during interpolation.

Here is a (revised) minimal example with POSIXct time stamps rather than dates only:

library(tidyverse)
library(zoo)

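# Irregular time stamps; the middle value is missing (UTC for reproducibility)
df <- tibble(date = as.POSIXct(c("2000-01-01 00:00", "2000-01-02 02:00", "2000-01-05 00:00"),
                               tz = "UTC"),
             value = c(1, NA, 2))

df %>% 
  mutate(value_int_wrong = na.approx(value),                # spaced by row position
         value_int_correct = approxfun(date, value)(date))  # spaced by elapsed time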

# A tibble: 3 x 4
  date                value value_int_wrong value_int_correct
  <dttm>              <dbl>           <dbl>             <dbl>
1 2000-01-01 00:00:00     1             1                1   
2 2000-01-02 02:00:00    NA             1.5              1.27
3 2000-01-05 00:00:00     2             2                2
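The two columns differ only in where the observations are placed along the x axis. Linear interpolation between (t_0, v_0) and (t_1, v_1), with time measured in hours from 2000-01-01 00:00 for the correct column and in row positions for the wrong one, works out as:

```latex
v(t) = v_0 + \frac{t - t_0}{t_1 - t_0}\,(v_1 - v_0)

% value_int_correct: t_0 = 0 h, t = 26 h, t_1 = 96 h
v = 1 + \frac{26 - 0}{96 - 0}\,(2 - 1) = 1 + \frac{26}{96} \approx 1.27

% value_int_wrong: t_0 = 1, t = 2, t_1 = 3 (row positions)
v = 1 + \frac{2 - 1}{3 - 1}\,(2 - 1) = 1.5
```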

Any ideas how to deal with this (efficiently)? Thanks for your support!

Answer 1

Here is an equivalent tsibble-based solution. The interpolate() function needs a model, but you can use a random walk to get linear interpolation between points.

library(tidyverse)
library(tsibble)
library(fable)
#> Loading required package: fabletools

df <- tibble(
  date = as.Date(c("2000-01-01", "2000-01-02", "2000-01-05", "2000-01-06")),
  value = c(1, NA, 2, 1.5)
) %>%
  as_tsibble(index = date) %>%
  fill_gaps()

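df %>%
  # Random walk: ARIMA(0,1,0) with no constant (-1) and no seasonal part
  model(naive = ARIMA(value ~ -1 + pdq(0,1,0) + PDQ(0,0,0))) %>%
  # Fill the missing values using the fitted model
  interpolate(df)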
#> # A tsibble: 6 x 2 [1D]
#>   date       value
#>   <date>     <dbl>
#> 1 2000-01-01  1   
#> 2 2000-01-02  1.25
#> 3 2000-01-03  1.5 
#> 4 2000-01-04  1.75
#> 5 2000-01-05  2   
#> 6 2000-01-06  1.5

Created on 2020-04-08 by the reprex package (v0.3.0)
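As a cross-check that the random walk really yields straight-line interpolation here, base R's approx() applied to the three observed points of this example should reproduce the same daily values. This is a sketch using only base R, not part of the fable workflow:

```r
# Observed (non-missing) points from the tsibble example
obs_date  <- as.Date(c("2000-01-01", "2000-01-05", "2000-01-06"))
obs_value <- c(1, 2, 1.5)

# Daily grid spanning the data, like the one produced by fill_gaps()
grid <- seq(min(obs_date), max(obs_date), by = "day")

# Straight-line interpolation between neighbouring observations
lin <- approx(obs_date, obs_value, xout = grid)$y
lin
```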

Answer 2

Personally, I would go with the solution you are already using, but to show how na.approx can be used in this case, we can complete the sequence of dates before calling na.approx and then join the result with the original df to keep only the original rows.

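library(dplyr)

# Date-only version of the example data (matches the output below)
df <- tibble(date = as.Date(c("2000-01-01", "2000-01-02", "2000-01-05")),
             value = c(1, NA, 2))

df %>% 
  # Add the missing days so that row position is proportional to time
  tidyr::complete(date = seq(min(date), max(date), by = "day")) %>%
  mutate(value_int = zoo::na.approx(value)) %>%
  # Keep only the original rows
  right_join(df, by = "date") %>%
  select(date, value_int)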


#  date       value_int
#  <date>         <dbl>
#1 2000-01-01      1   
#2 2000-01-02      1.25
#3 2000-01-05      2
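The same complete/na.approx/join idea can respect the time of day in the revised POSIXct example, provided the grid is fine enough to contain every observed time stamp. A minimal sketch, assuming all time stamps fall on whole hours and are in UTC; for long series a fine grid can get large, which is where approxfun() stays cheaper:

```r
library(dplyr)
library(tidyr)
library(zoo)

# Revised example data with POSIXct time stamps (UTC assumed)
df <- tibble(
  date  = as.POSIXct(c("2000-01-01 00:00", "2000-01-02 02:00", "2000-01-05 00:00"),
                     tz = "UTC"),
  value = c(1, NA, 2)
)

res <- df %>%
  # Hourly grid: every observed time stamp must lie on it
  complete(date = seq(min(date), max(date), by = "hour")) %>%
  # On an evenly spaced grid, row position is proportional to elapsed time
  mutate(value_int = na.approx(value)) %>%
  # Keep only the original time stamps
  right_join(df, by = "date") %>%
  select(date, value_int)

res
```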

Answer 1: 4 votes, 2020-04-07 23:28:59

Answer 2: 0 votes, 2020-04-07 11:05:26
