Interpolate between non-NA observations
Consider observations at irregular snapshots, some of which are NA:
library(tidyverse)
library(lubridate)  # provides ymd(); library(tidyverse) only attaches it from tidyverse 2.0.0 on
library(tweenr)
df <- data.frame(date = c(ymd("20191201"), ymd("20191203"), ymd("20191207"), ymd("20191220")),
                 value = c(1, 2, NA, 5))
What is the cleanest way to linearly interpolate over the dates that lie between consecutive observations with non-NA values, and only those? (In this example, since 20191201 and 20191203 are consecutive observations with non-NA values, the dates between them should be interpolated.) I think it can be done somehow using lead or lag. This code interpolates between all values:
all_days <- data.frame(date = seq(min(df$date), max(df$date), by = "day"))
df %>%
  right_join(all_days, by = "date") %>%
  arrange(date) %>%  # sort after the join so rows are in date order before filling
  mutate(value = value %>% tween_fill("linear"))
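For reference, the same all-values behaviour can be reproduced without tweenr using base R's approx(), which drops the NA observation and interpolates straight across it. This is only a sketch of the unwanted result, with as.Date() standing in for ymd() and all_interpolated being a name made up for the illustration:

```r
df <- data.frame(
  date  = as.Date(c("2019-12-01", "2019-12-03", "2019-12-07", "2019-12-20")),
  value = c(1, 2, NA, 5)
)
all_days <- seq(min(df$date), max(df$date), by = "day")

# approx() ignores the NA at 2019-12-07 and draws one line
# from 2 (2019-12-03) to 5 (2019-12-20)
filled <- approx(x = as.numeric(df$date), y = df$value,
                 xout = as.numeric(all_days))$y

all_interpolated <- data.frame(date = all_days, value = filled)
```

Every day from 2019-12-04 to 2019-12-19 gets a value here, which is exactly what should be avoided.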
We can create a helper column (temp) that marks which dates may be interpolated: it is 1 where an observation and the next observation are both non-NA, and 0 otherwise. Use complete to fill in the missing sequence of dates, fill to carry temp forward over the new rows, and na.approx to interpolate the values.
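library(tidyr)
library(zoo)
library(dplyr)
df %>%
  # temp is 1 when this observation and the next one are both non-NA
  mutate(temp = +(!(is.na(value) | lead(is.na(value), default = TRUE)))) %>%
  complete(date = seq(min(date), max(date), by = "day")) %>%
  fill(temp) %>%
  # observed rows are always kept; if_else() masks the rest without
  # turning genuine zeros into NA, as multiplying by temp and na_if(0) would
  mutate(temp = replace(temp, !is.na(value), 1),
         value = if_else(temp == 1, na.approx(value, na.rm = FALSE), NA_real_)) %>%
  select(-temp)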
# A tibble: 20 x 2
# date value
# <date> <dbl>
# 1 2019-12-01 1
# 2 2019-12-02 1.5
# 3 2019-12-03 2
# 4 2019-12-04 NA
# 5 2019-12-05 NA
# 6 2019-12-06 NA
# 7 2019-12-07 NA
# 8 2019-12-08 NA
# 9 2019-12-09 NA
#10 2019-12-10 NA
#11 2019-12-11 NA
#12 2019-12-12 NA
#13 2019-12-13 NA
#14 2019-12-14 NA
#15 2019-12-15 NA
#16 2019-12-16 NA
#17 2019-12-17 NA
#18 2019-12-18 NA
#19 2019-12-19 NA
#20 2019-12-20 5
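To see what the flag looks like before complete() is applied, here is a small sketch of the same expression on the example values. It uses base R only, with c(x[-1], TRUE) standing in for lead(x, default = TRUE); the names is_missing and next_missing are made up for the illustration:

```r
value <- c(1, 2, NA, 5)
is_missing <- is.na(value)

# Base R stand-in for lead(is_missing, default = TRUE):
# shift left by one position and pad the end with TRUE
next_missing <- c(is_missing[-1], TRUE)

# 1 only where this observation and the next one are both non-NA
temp <- +(!(is_missing | next_missing))
```

Only the first observation is flagged, so after complete() and fill() only 2019-12-01 and 2019-12-02 carry a 1. The replace() step then sets the flag back to 1 on the remaining observed rows so that their values are kept.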
Here is my envisioned solution. The main idea is to create a mask which determines which values will be interpolated. To create the mask, we mark a row as TRUE if both the row and the next row have non-NA values, then use complete and fill to carry the mask over the dates in between. To complete the mask, we set the last observation of each contiguous run to TRUE.
df %>%
  mutate(has_value = !is.na(value),
         mask = lead(has_value, default = FALSE) & has_value) %>%
  complete(date = seq(min(date), max(date), by = "day"),
           fill = list(has_value = FALSE)) %>%
  fill(mask) %>%
  mutate(mask = mask | has_value,
         value = if_else(mask, value %>% tween_fill("linear"), NA_real_)) %>%
  select(-has_value, -mask)
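As a cross-check on the masking approach, the same rule can be sketched in base R by walking over adjacent pairs of observations and interpolating only when both ends are non-NA. The helper interpolate_adjacent() is hypothetical (it is not part of any package) and assumes the dates are unique:

```r
# Hypothetical helper: interpolate only between observations that are
# adjacent in the original data and both non-NA. Assumes unique dates.
interpolate_adjacent <- function(date, value) {
  ord <- order(date)
  date <- date[ord]
  value <- value[ord]

  all_days <- seq(min(date), max(date), by = "day")
  out <- rep(NA_real_, length(all_days))

  # Keep the observed values (including observed NAs) at their own dates
  out[match(as.numeric(date), as.numeric(all_days))] <- value

  # Fill each gap whose two endpoints are both observed and non-NA
  for (i in seq_len(length(date) - 1)) {
    if (!is.na(value[i]) && !is.na(value[i + 1])) {
      idx <- which(all_days >= date[i] & all_days <= date[i + 1])
      out[idx] <- approx(x = as.numeric(date[c(i, i + 1)]),
                         y = value[c(i, i + 1)],
                         xout = as.numeric(all_days[idx]))$y
    }
  }
  data.frame(date = all_days, value = out)
}

df <- data.frame(
  date  = as.Date(c("2019-12-01", "2019-12-03", "2019-12-07", "2019-12-20")),
  value = c(1, 2, NA, 5)
)
res <- interpolate_adjacent(df$date, df$value)
```

On the example data, res should have values only on 2019-12-01 through 2019-12-03 and on 2019-12-20, so it can be compared row by row against the output of the pipelines above.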