Add missing dates with previous values in R - converting quarterly to daily data

I am trying to convert quarterly data into daily data by filling the missing dates with the previous values. This data consists of financial ratios of different stocks. My data has a row label built from two columns: ticker and date. Since the same dates repeat for each stock, I am not sure how to handle the ticker and repopulate the missing dates with the previous values.

Here is how a small sample of the data looks so far:

> df_new
                   de   eps      pe    ps    pb
APD 2015-09-30  1.373   1.6  21.463 2.772 3.784
APD 2015-12-31  1.325  1.68  21.284 2.893 3.805
APD 2016-03-31  1.411 -2.19  56.114 3.254 4.491
SWKS 2003-03-31 0.402 -0.04    <NA>  <NA>  <NA>
SWKS 2003-06-30 0.397 -0.04  -2.289 1.518 0.929
SWKS 2003-09-30  0.62 -0.29  -2.799 2.046 1.877
SWKS 2003-12-31 0.643  0.03 -25.426 2.045 1.905
SWKS 2004-03-31 0.657 -0.06 -32.004 2.641 2.579
SWKS 2004-06-30 0.584  0.09  -37.18 1.825 1.782
SWKS 2004-09-30 0.555   0.1  65.806 1.881 1.962
SWKS 2004-12-31 0.525  0.09  45.823 1.777 1.912

And I want it to look like this (daily):

> df_new
                   de   eps      pe    ps    pb
APD 2015-09-30  1.373   1.6  21.463 2.772 3.784
APD 2015-10-01  1.373   1.6  21.463 2.772 3.784
APD 2015-10-02  1.373   1.6  21.463 2.772 3.784
APD 2015-10-03  1.373   1.6  21.463 2.772 3.784
... 
APD 2015-12-31  1.325  1.68  21.284 2.893 3.805
APD 2016-01-01  1.325  1.68  21.284 2.893 3.805
APD 2016-01-02  1.325  1.68  21.284 2.893 3.805
APD 2016-01-03  1.325  1.68  21.284 2.893 3.805
...
APD 2016-03-31  1.411 -2.19  56.114 3.254 4.491
APD 2016-04-01  1.411 -2.19  56.114 3.254 4.491
APD 2016-04-02  1.411 -2.19  56.114 3.254 4.491
APD 2016-04-03  1.411 -2.19  56.114 3.254 4.491
...
SWKS 2003-03-31 0.402 -0.04    <NA>  <NA>  <NA>
SWKS 2003-04-01 0.402 -0.04    <NA>  <NA>  <NA>
SWKS 2003-04-02 0.402 -0.04    <NA>  <NA>  <NA>
SWKS 2003-04-03 0.402 -0.04    <NA>  <NA>  <NA>
...
SWKS 2003-06-30 0.397 -0.04  -2.289 1.518 0.929
and so on...

I searched for answers, and the question "Add missing xts/zoo data with linear interpolation in R" is somewhat close to what I want, though I am not sure what to do with the ticker symbol.

Thank you so much for your help!

Use by to apply the anonymous function below to each symbol's rows. That function builds a grid g of daily dates spanning the symbol's first to last date, merges it with that symbol's original rows (which inserts rows of NA for the new dates), and applies na.locf to fill those NAs with the last observed values. Finally, do.call("rbind", ...) puts the pieces of the resulting "by" object back together into a single data frame.

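library(zoo) # na.locf

df <- do.call("rbind", by(df_new, df_new$symbol, function(x) {
  rng <- range(x$date, na.rm = TRUE)                  # first and last date for this symbol
  g <- data.frame(date = seq(rng[1], rng[2], "day"))  # daily grid over that range
  # merge adds the grid dates as rows of NA; na.locf carries the last value forward.
  # na.rm = FALSE keeps rows whose leading values are still NA (e.g. SWKS pe/ps/pb).
  na.locf(merge(x, g, all = TRUE), na.rm = FALSE)
}))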
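To see what na.locf does to each column, here is a small sketch on plain numeric vectors (the vectors are made up for illustration). A leading NA has nothing earlier to carry forward: with the default na.rm = TRUE it is dropped, while na.rm = FALSE keeps it, which matters for SWKS, whose first pe, ps and pb values are missing.

```r
library(zoo)  # na.locf

# Interior and trailing NAs take the last observed value
x <- c(1, NA, NA, 2, NA)
filled <- na.locf(x)                # values: 1 1 1 2 2

# A leading NA has no earlier value to carry forward
y <- c(NA, 3, NA)
dropped <- na.locf(y)               # default na.rm = TRUE: leading NA removed, length 2
kept    <- na.locf(y, na.rm = FALSE) # values: NA 3 3, length preserved

print(filled)
print(dropped)
print(kept)
```

Keeping the length unchanged is what lets the filled columns line up with the date grid.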

Note: The input df_new in reproducible form is:

Lines <- "
APD 2015-09-30  1.373   1.6  21.463 2.772 3.784
APD 2015-12-31  1.325  1.68  21.284 2.893 3.805
APD 2016-03-31  1.411 -2.19  56.114 3.254 4.491
SWKS 2003-03-31 0.402 -0.04    <NA>  <NA>  <NA>
SWKS 2003-06-30 0.397 -0.04  -2.289 1.518 0.929
SWKS 2003-09-30  0.62 -0.29  -2.799 2.046 1.877
SWKS 2003-12-31 0.643  0.03 -25.426 2.045 1.905
SWKS 2004-03-31 0.657 -0.06 -32.004 2.641 2.579
SWKS 2004-06-30 0.584  0.09  -37.18 1.825 1.782
SWKS 2004-09-30 0.555   0.1  65.806 1.881 1.962
SWKS 2004-12-31 0.525  0.09  45.823 1.777 1.912"
df_new <- read.table(text = Lines, na.strings = "<NA>",  # read "<NA>" as missing, not as text
   col.names = c("symbol", "date", "de", "eps", "pe", "ps", "pb"))
df_new$date <- as.Date(df_new$date)
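The grid spacing is just the third argument of seq(), so the same approach can produce a weekly grid as well as a daily one. Below is a minimal sketch, assuming the symbol/date column layout above; fill_grid and its step argument are hypothetical names introduced here, and a four-row subset of the sample is used so the result stays small. Because merge(..., all = TRUE) keeps every original row, a quarter-end date that falls between grid points still appears in the output.

```r
library(zoo)  # na.locf

# Four-row subset of the sample, with "<NA>" read as missing
lines_small <- "
APD 2015-09-30  1.373   1.6  21.463 2.772 3.784
APD 2015-12-31  1.325  1.68  21.284 2.893 3.805
SWKS 2003-03-31 0.402 -0.04    <NA>  <NA>  <NA>
SWKS 2003-06-30 0.397 -0.04  -2.289 1.518 0.929"
df_small <- read.table(text = lines_small, na.strings = "<NA>",
  col.names = c("symbol", "date", "de", "eps", "pe", "ps", "pb"))
df_small$date <- as.Date(df_small$date)

# Hypothetical helper: the same per-symbol logic as the answer, with the
# grid spacing passed to seq() ("day", "week", ...)
fill_grid <- function(df, step = "day") {
  do.call("rbind", by(df, df$symbol, function(x) {
    rng <- range(x$date, na.rm = TRUE)
    g <- data.frame(date = seq(rng[1], rng[2], step))
    # all = TRUE keeps original rows that are not on the grid;
    # na.rm = FALSE keeps rows whose leading values are still NA
    na.locf(merge(x, g, all = TRUE), na.rm = FALSE)
  }))
}

daily  <- fill_grid(df_small, "day")
weekly <- fill_grid(df_small, "week")
```

With this subset, the daily result has 185 rows (93 for APD, 92 for SWKS). The weekly one has 29: APD's 2015-12-31 row does not land on the 7-day grid that starts at 2015-09-30, so it is kept in addition to the 14 grid dates, while SWKS's 2003-06-30 falls exactly on its grid.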
