
How to FIND missing observations within a time series and fill with NAs

I have a 10 year long time series containing daily observations. I've discovered that some of the rows (whole rows, not just observations) from this series are missing, which is problematic for my use case. The dates are all in order, but a given month may start at (ymd) 2017-10-13 instead of 2017-10-01, thus missing 12 observations. I need to identify where there are interruptions in the sequence like this, and insert the right number of rows with the right dates, so that I can have NAs in those spots.

How can I do this?

Here is a reproducible example of a dataframe similar to mine, missing 219 of 4018 datestamped observations:

df <- data.frame(date = seq(as.Date("2007/01/01"), as.Date("2017/12/31"), "days"))
df$obs <- runif(nrow(df))
# keep 3799 of the 4018 rows at random; sort() keeps the remaining dates in order
df_missing <- df[sort(sample(nrow(df), 3799)), ]

head(df_missing)
        date        obs
1 2007-01-01 0.96428609
2 2007-01-02 0.04199475
3 2007-01-03 0.72729484
4 2007-01-04 0.85591517
5 2007-01-05 0.07373118
6 2007-01-06 0.71093604

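Create a data frame g holding the full grid of dates and merge it with your data frame (df_missing in the question's example). Every date that is absent from your data is added as a new row with NA in obs:

rng <- range(df_missing$date)
g <- data.frame(date = seq(rng[1], rng[2], "day"))
df_filled <- merge(df_missing, g, all = TRUE)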
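
The question also asks where the interruptions are, which the merge alone does not report. A minimal base R sketch of both steps follows, on a smaller three-month sample so it runs quickly. The names full, steps, gaps and df_filled, the seed, and the choice of 10 dropped rows are illustrative assumptions, not part of the original post.

```r
# Small example: daily data with 10 rows removed at random.
set.seed(42)  # arbitrary seed (assumption) so the sketch is repeatable
full <- data.frame(date = seq(as.Date("2007-01-01"), as.Date("2007-03-31"), by = "day"))
full$obs <- runif(nrow(full))
df_missing <- full[sort(sample(nrow(full), nrow(full) - 10)), ]

# 1. Locate the interruptions: a gap is any step of more than one day
#    between consecutive dates.
steps <- as.integer(diff(df_missing$date))
is_gap <- steps > 1
gaps <- data.frame(
  last_before = df_missing$date[-nrow(df_missing)][is_gap],  # last date before the gap
  first_after = df_missing$date[-1][is_gap],                 # first date after the gap
  n_missing   = steps[is_gap] - 1L                           # rows to insert
)
print(gaps)

# 2. Fill the gaps: left-join the data onto the full grid of dates.
rng <- range(df_missing$date)
g <- data.frame(date = seq(rng[1], rng[2], by = "day"))
df_filled <- merge(g, df_missing, by = "date", all.x = TRUE)
```

Because the grid is built from range() of the observed dates, rows missing before the first or after the last observed date are not restored. If the intended start and end of the series are known, passing them to seq() directly avoids that.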
