Add missing dates with previous values in R - converting quarterly to daily data

I am trying to convert quarterly data into daily data by carrying the previous values forward onto the missing dates. The data consists of financial ratios for different stocks. Each row label is built from two parts: ticker and date. Since the same dates repeat across stocks, I am not sure how to fill in the missing dates with previous values separately for each ticker.

Here is how a small sample of the data looks so far:

> df_new
                   de   eps      pe    ps    pb
APD 2015-09-30  1.373   1.6  21.463 2.772 3.784
APD 2015-12-31  1.325  1.68  21.284 2.893 3.805
APD 2016-03-31  1.411 -2.19  56.114 3.254 4.491
SWKS 2003-03-31 0.402 -0.04    <NA>  <NA>  <NA>
SWKS 2003-06-30 0.397 -0.04  -2.289 1.518 0.929
SWKS 2003-09-30  0.62 -0.29  -2.799 2.046 1.877
SWKS 2003-12-31 0.643  0.03 -25.426 2.045 1.905
SWKS 2004-03-31 0.657 -0.06 -32.004 2.641 2.579
SWKS 2004-06-30 0.584  0.09  -37.18 1.825 1.782
SWKS 2004-09-30 0.555   0.1  65.806 1.881 1.962
SWKS 2004-12-31 0.525  0.09  45.823 1.777 1.912

And I want it to look like this (daily):

> df_new
                   de   eps      pe    ps    pb
APD 2015-09-30  1.373   1.6  21.463 2.772 3.784
APD 2015-10-01  1.373   1.6  21.463 2.772 3.784
APD 2015-10-02  1.373   1.6  21.463 2.772 3.784
APD 2015-10-03  1.373   1.6  21.463 2.772 3.784
... 
APD 2015-12-31  1.325  1.68  21.284 2.893 3.805
APD 2016-01-01  1.325  1.68  21.284 2.893 3.805
APD 2016-01-02  1.325  1.68  21.284 2.893 3.805
APD 2016-01-03  1.325  1.68  21.284 2.893 3.805
...
APD 2016-03-31  1.411 -2.19  56.114 3.254 4.491
APD 2016-04-01  1.411 -2.19  56.114 3.254 4.491
APD 2016-04-02  1.411 -2.19  56.114 3.254 4.491
APD 2016-04-03  1.411 -2.19  56.114 3.254 4.491
...
SWKS 2003-03-31 0.402 -0.04    <NA>  <NA>  <NA>
SWKS 2003-04-01 0.402 -0.04    <NA>  <NA>  <NA>
SWKS 2003-04-02 0.402 -0.04    <NA>  <NA>  <NA>
SWKS 2003-04-03 0.402 -0.04    <NA>  <NA>  <NA>
...
SWKS 2003-06-30 0.397 -0.04  -2.289 1.518 0.929
and so on...

I searched for answers, and the question "Add missing xts/zoo data with linear interpolation in R" comes close to what I want, but I am not sure how to handle the ticker symbol.

Thank you so much for your help!

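Use by to apply the anonymous function shown to each symbol's rows separately, so that values are never carried from one ticker into another. For each symbol, the function builds a grid g of daily dates spanning that symbol's first and last date, merges it with the original rows (the added dates get NA in every other column), and applies na.locf to fill each NA with the last observed value. Finally, do.call("rbind", ...) stacks the pieces of the "by" object back into one data frame. The code assumes symbol and date are ordinary columns, as in the Note below, rather than parts of the row names.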

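library(zoo) # provides na.locf

# For each symbol: build a daily date grid, merge it with that symbol's
# quarterly rows, then carry the last observation forward into the new rows.
df <- do.call("rbind", by(df_new, df_new$symbol, function(x) {
  rng <- range(x$date, na.rm = TRUE)
  g <- data.frame(date = seq(rng[1], rng[2], "day"))
  # na.rm = FALSE keeps rows that still contain NA after filling
  # (e.g. a first quarter with missing ratios) instead of dropping them
  na.locf(merge(x, g, all = TRUE), na.rm = FALSE)
}))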
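
For readers who would rather not depend on zoo, the same carry-forward can be sketched in base R with findInterval, which returns, for each day, the index of the latest quarterly row dated on or before it. This is an alternative to the by/merge/na.locf approach above, not the answer's method. expand_daily is a hypothetical helper name, and the sketch assumes symbol and date are ordinary columns with date of class Date. One difference in behavior: an NA reported for a quarter stays NA for that quarter's days rather than being filled from an earlier quarter.

```r
# Hypothetical helper: repeat each quarterly row for every day until the next one.
expand_daily <- function(x) {
  x <- x[order(x$date), ]                          # dates must be ascending
  days <- seq(min(x$date), max(x$date), by = "day")
  # index of the last quarterly row dated on or before each day
  idx <- findInterval(as.numeric(days), as.numeric(x$date))
  out <- x[idx, ]
  out$date <- days
  rownames(out) <- NULL
  out
}

# A subset of the question's sample, reduced to two ratios for brevity.
quarterly <- data.frame(
  symbol = c("APD", "APD", "SWKS", "SWKS"),
  date   = as.Date(c("2015-09-30", "2015-12-31", "2003-03-31", "2003-06-30")),
  de     = c(1.373, 1.325, 0.402, 0.397),
  pe     = c(21.463, 21.284, NA, -2.289)
)

# Expand each ticker on its own, then stack the pieces.
daily <- do.call(rbind, lapply(split(quarterly, quarterly$symbol), expand_daily))
rownames(daily) <- NULL
head(daily, 3)
```

Splitting by symbol before expanding is what keeps one ticker's values from leaking into another's date range, the same role by plays in the code above.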

Note: The input df_new in reproducible form is:

Lines <- "
APD 2015-09-30  1.373   1.6  21.463 2.772 3.784
APD 2015-12-31  1.325  1.68  21.284 2.893 3.805
APD 2016-03-31  1.411 -2.19  56.114 3.254 4.491
SWKS 2003-03-31 0.402 -0.04    <NA>  <NA>  <NA>
SWKS 2003-06-30 0.397 -0.04  -2.289 1.518 0.929
SWKS 2003-09-30  0.62 -0.29  -2.799 2.046 1.877
SWKS 2003-12-31 0.643  0.03 -25.426 2.045 1.905
SWKS 2004-03-31 0.657 -0.06 -32.004 2.641 2.579
SWKS 2004-06-30 0.584  0.09  -37.18 1.825 1.782
SWKS 2004-09-30 0.555   0.1  65.806 1.881 1.962
SWKS 2004-12-31 0.525  0.09  45.823 1.777 1.912"
df_new <- read.table(text = Lines, na.strings = "<NA>",  # read "<NA>" as a true NA
   col.names = c("symbol", "date", "de", "eps", "pe", "ps", "pb"))
df_new$date <- as.Date(df_new$date)
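
The question's data frame carries the ticker and date in its row names rather than in columns. A minimal sketch of moving them into symbol and date columns follows, assuming every row name has the form "TICKER YYYY-MM-DD" with a single space between the two parts. df_rows and df_cols are made-up names, and df_rows is only a stand-in for the asker's object.

```r
# Made-up stand-in for a data frame whose row names are "TICKER YYYY-MM-DD".
df_rows <- data.frame(
  de  = c(1.373, 1.325, 0.402),
  eps = c(1.6, 1.68, -0.04),
  row.names = c("APD 2015-09-30", "APD 2015-12-31", "SWKS 2003-03-31")
)

# Split each label at the space: the ticker comes before it, the date after it.
labels <- rownames(df_rows)
df_cols <- cbind(
  data.frame(
    symbol = sub(" .*$", "", labels),
    date   = as.Date(sub("^[^ ]+ ", "", labels)),
    stringsAsFactors = FALSE
  ),
  df_rows
)
rownames(df_cols) <- NULL   # drop the old combined labels
str(df_cols)
```

The resulting df_cols has the same column layout as the reproducible input above, so it can be passed to the by-based code in place of df_new.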
