R time series interpolation, and extrapolation of a specific value

I have daily values for 11 yield series, that is, time series for 11 yield maturities (1yr, 2yr, 3yr, 4yr, 5yr, 7yr, 10yr, 15yr, 20yr, 25yr, 30yr) over the same period. Some yields are missing (NA) on some days, and I'd like to extrapolate their values from the other yields available on the same day. This should be done by first linearly interpolating the available yields on a given day, and then extrapolating the missing yields for that day, using the maturity (1yr, 2yr, etc.) as the weight. For example, given the following data set, I'd like to extrapolate the daily value of the 5yr yield from an interpolation of all available yields on the same day:

Date      1     2     3     4  5  7     10    15    20    25 30
7/4/2007  9.642 9.278 8.899 NA NA 8.399 8.241 8.183 8.117 NA NA
7/5/2007  9.669 9.302 8.931 NA NA 8.44  8.287 8.231 8.118 NA NA
7/6/2007  9.698 9.331 8.961 NA NA 8.437 8.295 8.243 8.13  NA NA
7/9/2007  9.678 9.306 8.941 NA NA 8.409 8.269 8.214 8.092 NA NA
7/10/2007 9.65  9.283 8.915 NA NA 8.385 8.243 8.185 8.065 NA NA
7/11/2007 9.7   9.342 8.976 NA NA 8.445 8.306 8.249 8.138 NA NA
7/12/2007 9.703 9.348 8.975 NA NA 8.448 8.303 8.245 8.152 NA NA
7/13/2007 9.69  9.334 8.965 NA NA 8.439 8.294 8.24  8.145 NA NA
7/16/2007 9.683 9.325 8.964 NA NA 8.442 8.299 8.244 8.158 NA NA
7/17/2007 9.712 9.359 8.987 NA NA 8.481 8.33  8.277 8.192 NA NA
7/18/2007 9.746 9.394 9.018 NA NA 8.509 8.363 8.311 8.22  NA NA
...

Does anyone have suggestions on how to do it? Thanks.

One way to do this is to build a linear model for each Date from the values you have available, and use it to predict the value at the 5-year maturity.

Run the process step by step to see how it works, and check the estimates to make sure they make sense.

dt = read.table(text=
"Date      1     2     3     4  5  7     10    15    20    25 30
7/4/2007  9.642 9.278 8.899 NA NA 8.399 8.241 8.183 8.117 NA NA
7/5/2007  9.669 9.302 8.931 NA NA 8.44  8.287 8.231 8.118 NA NA
7/6/2007  9.698 9.331 8.961 NA NA 8.437 8.295 8.243 8.13  NA NA
7/9/2007  9.678 9.306 8.941 NA NA 8.409 8.269 8.214 8.092 NA NA
7/10/2007 9.65  9.283 8.915 NA NA 8.385 8.243 8.185 8.065 NA NA
7/11/2007 9.7   9.342 8.976 NA NA 8.445 8.306 8.249 8.138 NA NA
7/12/2007 9.703 9.348 8.975 NA NA 8.448 8.303 8.245 8.152 NA NA
7/13/2007 9.69  9.334 8.965 NA NA 8.439 8.294 8.24  8.145 NA NA
7/16/2007 9.683 9.325 8.964 NA NA 8.442 8.299 8.244 8.158 NA NA
7/17/2007 9.712 9.359 8.987 NA NA 8.481 8.33  8.277 8.192 NA NA
7/18/2007 9.746 9.394 9.018 NA NA 8.509 8.363 8.311 8.22  NA NA", header = TRUE)


library(dplyr)
library(tidyr)


dt %>%
  gather(time, value, -Date) %>%                               # reshape dataset
  filter(!is.na(value)) %>%                                    # ignore NA values
  mutate(time = as.numeric(gsub("X","",time))) %>%             # get rid of the X created by importing data
  group_by(Date) %>%                                           # for each date
  do({model = lm(value~time, data=.)                              # build a linear model
      data.frame(pred = predict(model, data.frame(time=5)))})     # use model to predict at time = 5


# Source: local data frame [11 x 2]
# Groups: Date [11]
# 
#          Date     pred
#        (fctr)    (dbl)
# 1  7/10/2007 8.920932
# 2  7/11/2007 8.979601
# 3  7/12/2007 8.981383
# 4  7/13/2007 8.970571
# 5  7/16/2007 8.968542
# 6  7/17/2007 8.999584
# 7  7/18/2007 9.032026
# 8   7/4/2007 8.917645
# 9   7/5/2007 8.950605
# 10  7/6/2007 8.970669
# 11  7/9/2007 8.946661
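
For comparison, the piecewise linear interpolation described in the question can be sketched with base R's approx(), which interpolates between the two nearest observed maturities rather than fitting one line through all of them. This is only a sketch under assumptions: interp_yield is a hypothetical helper name, and the input is the 7/4/2007 row copied from the example data.

```r
# Maturities in years, matching the columns of the example table
maturities <- c(1, 2, 3, 4, 5, 7, 10, 15, 20, 25, 30)

# Yields for 7/4/2007, copied from the first row of the example table
yields <- c(9.642, 9.278, 8.899, NA, NA, 8.399, 8.241, 8.183, 8.117, NA, NA)

# Hypothetical helper: linearly interpolate the yield at `target` years
# from the non-missing (maturity, yield) pairs of a single day
interp_yield <- function(maturities, yields, target) {
  ok <- !is.na(yields)
  approx(x = maturities[ok], y = yields[ok], xout = target)$y
}

yield_5yr <- interp_yield(maturities, yields, target = 5)
yield_5yr  # 8.649, halfway between the 3yr (8.899) and 7yr (8.399) yields
```

With the default rule = 1, approx() returns NA outside the observed range, so the 25yr and 30yr yields would stay missing; filling those needs a separate extrapolation choice. The result differs from the regression estimate for the same date (8.917645) because only the neighbouring maturities are used.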

I'm not suggesting that the linear model is the best fit, as I didn't spend time checking that. You could use a quadratic model instead of a linear one, which might give you a better estimate.
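
As a rough sketch of the quadratic alternative, the two fits can be compared for a single date in base R. The data below are the non-missing values for 7/4/2007 copied from the example; the object names (one_day, linear_fit, quadratic_fit) are only illustrative, and whether the quadratic is actually a better model is something to check rather than assume.

```r
# Non-missing maturities (years) and yields for 7/4/2007 from the example data
one_day <- data.frame(
  time  = c(1, 2, 3, 7, 10, 15, 20),
  value = c(9.642, 9.278, 8.899, 8.399, 8.241, 8.183, 8.117)
)

# Straight line through all observed points (the model used above)
linear_fit <- lm(value ~ time, data = one_day)

# Same model with an added squared term
quadratic_fit <- lm(value ~ time + I(time^2), data = one_day)

# Predict the 5-year yield with each model
new_point <- data.frame(time = 5)
pred_linear    <- predict(linear_fit, new_point)
pred_quadratic <- predict(quadratic_fit, new_point)

# R-squared never decreases when a term is added, so adjusted R-squared
# is a fairer first comparison between the two fits
summary(linear_fit)$adj.r.squared
summary(quadratic_fit)$adj.r.squared
```

To use the quadratic form in the grouped pipeline, the same formula can replace value~time inside the lm() call.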

If you want to check the model output and get information about the model built and used for each Date, you can do this:

library(dplyr)
library(tidyr)
library(broom)


dt %>%
  gather(time, value, -Date) %>%                               # reshape dataset
  filter(!is.na(value)) %>%                                    # ignore NA values
  mutate(time = as.numeric(gsub("X","",time))) %>%             # get rid of the X created by importing data
  group_by(Date) %>%                                           # for each date
  do({model = lm(value~time, data=.)                              # build a linear model
      tidy(model)})                                               # check model output


# Source: local data frame [22 x 6]
# Groups: Date [11]
# 
#         Date        term    estimate  std.error statistic      p.value
#       (fctr)       (chr)       (dbl)      (dbl)     (dbl)        (dbl)
# 1  7/10/2007 (Intercept)  9.29495818 0.19895389 46.719158 8.485928e-08
# 2  7/10/2007        time -0.07480530 0.01875160 -3.989275 1.043399e-02
# 3  7/11/2007 (Intercept)  9.34942937 0.19823019 47.164509 8.093526e-08
# 4  7/11/2007        time -0.07396561 0.01868339 -3.958897 1.075469e-02
# 5  7/12/2007 (Intercept)  9.35001022 0.20037595 46.662337 8.537618e-08
# 6  7/12/2007        time -0.07372537 0.01888563 -3.903781 1.136592e-02
# 7  7/13/2007 (Intercept)  9.33730855 0.19974786 46.745476 8.462114e-08
# 8  7/13/2007        time -0.07334758 0.01882643 -3.895989 1.145551e-02
# 9  7/16/2007 (Intercept)  9.33045446 0.19856561 46.989276 8.245272e-08
# 10 7/16/2007        time -0.07238243 0.01871501 -3.867615 1.178869e-02
# ..       ...         ...         ...        ...       ...          ...
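
The coefficients in this output tie back to the predictions shown earlier: a linear model predicts intercept + slope * time. A small check for 7/10/2007, using the two estimates copied from the table above (the variable names are only illustrative):

```r
# Coefficients for 7/10/2007, copied from the tidy() output above
intercept <- 9.29495818
slope     <- -0.07480530

# Prediction of the linear model at the 5-year maturity
pred_5yr <- intercept + slope * 5
pred_5yr  # about 8.920932, the 7/10/2007 prediction from the first pipeline
```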
