How to FIND missing observations within a time series and fill with NAs
I have a 10 year long time series containing daily observations. I've discovered that some of the rows (whole rows, not just observations) from this series are missing, which is problematic for my use case. The dates are all in order, but a given month may start at (ymd) 2017-10-13 instead of 2017-10-01, thus missing 12 observations. I need to identify where there are interruptions in the sequence like this, and insert the right number of rows with the right dates, so that I can have NAs in those spots.
How can I do this?
Here is a reproducible example of a dataframe similar to mine, missing 219 of 4018 datestamped observations:
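# Full daily sequence: 4018 dates from 2007-01-01 to 2017-12-31
df <- data.frame(date = seq(as.Date("2007-01-01"), as.Date("2017-12-31"), by = "day"))
df$obs <- runif(nrow(df))
# Keep 3799 randomly chosen rows; sort() keeps the remaining dates in order
df_missing <- df[sort(sample(nrow(df), 3799)), ]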
head(df_missing)
date obs
1 2007-01-01 0.96428609
2 2007-01-02 0.04199475
3 2007-01-03 0.72729484
4 2007-01-04 0.85591517
5 2007-01-05 0.07373118
6 2007-01-06 0.71093604
Create a data frame g with a grid of all dates and merge it with your data frame (DF below, i.e. df_missing in the question):
rng <- range(DF$date)                               # first and last observed dates
g <- data.frame(date = seq(rng[1], rng[2], "day"))  # one row per calendar day
merge(DF, g, all = TRUE)                            # outer join: absent dates get obs = NA
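The question also asks where the interruptions are. A minimal base R sketch of one way to locate them, and to check that the merge filled exactly those spots, might look like the following. The tiny DF, the `gaps` table, and its column names `after` and `n_missing` are made up for illustration; they are not part of the original question or answer.

```r
# Hypothetical toy data: 2020-01-03 and 2020-01-04 are absent
DF <- data.frame(
  date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-05")),
  obs  = c(0.1, 0.2, 0.5)
)

# Days between consecutive rows; anything above 1 marks an interruption
step <- as.integer(diff(DF$date))

# One row per gap: the last date seen before it and how many days are absent
gaps <- data.frame(
  after     = DF$date[-nrow(DF)][step > 1],
  n_missing = step[step > 1] - 1L
)

# Fill the gaps with the grid-and-merge approach from the answer
rng <- range(DF$date)
g <- data.frame(date = seq(rng[1], rng[2], by = "day"))
filled <- merge(DF, g, all = TRUE)

print(gaps)
print(filled)
```

Here `gaps` has a single row (after 2020-01-02, 2 days missing) and `filled` has five rows with NA in `obs` for the two absent dates. Note that the grid only spans the first to the last observed date, so rows missing before the first or after the last observation would need an explicit start and end date in `seq()`.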