
Inconsistent forecast behavior when using lagged xregs in R fabletools

This is a question I opened as an issue, but I haven't heard back from the package author, so I thought I would ask it here. Thanks!

I am noticing some inconsistencies when forecasting with lagged xregs, specifically in the forecasts for h <= the lag period. It seems that the historical data provided to the original model is not added to the new data before the forecast is generated.

In the example below I use the lag = 2 example from fpp3. The first forecast, fc1, is identical to the one generated in the book. In the second forecast, fc2, I augment new_data by binding the historical advert data to the new advert data generated in insurance_future. When I do this, fc2 gives a different forecast from fc1. It seems to me that the forecast in fc1 does not have access to the historical (xreg) data, so TVadverts is treated as NA for the first two steps of the horizon. Is this correct? And if so, shouldn't that data be included, as it is in fc2? This might be related to.

library(fpp3)
#> ── Attaching packages ──────────────────────────────────────────── fpp3 0.4.0 ──
#> ✓ tibble      3.1.2      ✓ tsibble     1.0.1 
#> ✓ dplyr       1.0.6      ✓ tsibbledata 0.3.0 
#> ✓ tidyr       1.1.3      ✓ feasts      0.2.1 
#> ✓ lubridate   1.7.10     ✓ fable       0.3.1 
#> ✓ ggplot2     3.3.3
#> ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
#> x lubridate::date()    masks base::date()
#> x dplyr::filter()      masks stats::filter()
#> x tsibble::intersect() masks base::intersect()
#> x tsibble::interval()  masks lubridate::interval()
#> x dplyr::lag()         masks stats::lag()
#> x tsibble::setdiff()   masks base::setdiff()
#> x tsibble::union()     masks base::union()
library(fabletools)
library(fable)
library(dplyr)
library(tsibble)

fit <- insurance %>%
  # Note: unlike the book, the fitting period is not restricted here
  # Estimate models
  model(
    lag2 = ARIMA(Quotes ~ pdq(d = 0) +
                   TVadverts + lag(TVadverts) +
                   lag(TVadverts, 2))
  )

insurance_future <- new_data(insurance, 20) %>%
  mutate(TVadverts = 8)

# Forecast as shown in https://otexts.com/fpp3/lagged-predictors.html
fc1 <- fit %>%
  forecast(insurance_future)

# Manually pre-pend historic advert data to future data to ensure presence of
# lagged regressors
fc2 <- fit %>% 
  forecast(bind_rows(select(insurance, -Quotes), insurance_future)) %>%
  filter_index(as.character(min(insurance_future$Month)) ~ .)

print(fc1)
#> # A fable: 20 x 5 [1M]
#> # Key:     .model [1]
#>    .model    Month      Quotes .mean TVadverts
#>    <chr>     <mth>      <dist> <dbl>     <dbl>
#>  1 lag2   2005 May N(13, 0.23)  13.0         8
#>  2 lag2   2005 Jun N(13, 0.59)  13.0         8
#>  3 lag2   2005 Jul N(13, 0.72)  13.2         8
#>  4 lag2   2005 Aug N(13, 0.72)  13.2         8
#>  5 lag2   2005 Sep N(13, 0.72)  13.2         8
#>  6 lag2   2005 Oct N(13, 0.72)  13.2         8
#>  7 lag2   2005 Nov N(13, 0.72)  13.2         8
#>  8 lag2   2005 Dec N(13, 0.72)  13.2         8
#>  9 lag2   2006 Jan N(13, 0.72)  13.2         8
#> 10 lag2   2006 Feb N(13, 0.72)  13.2         8
#> 11 lag2   2006 Mar N(13, 0.72)  13.2         8
#> 12 lag2   2006 Apr N(13, 0.72)  13.2         8
#> 13 lag2   2006 May N(13, 0.72)  13.2         8
#> 14 lag2   2006 Jun N(13, 0.72)  13.2         8
#> 15 lag2   2006 Jul N(13, 0.72)  13.2         8
#> 16 lag2   2006 Aug N(13, 0.72)  13.2         8
#> 17 lag2   2006 Sep N(13, 0.72)  13.2         8
#> 18 lag2   2006 Oct N(13, 0.72)  13.2         8
#> 19 lag2   2006 Nov N(13, 0.72)  13.2         8
#> 20 lag2   2006 Dec N(13, 0.72)  13.2         8
print(fc2)
#> # A fable: 20 x 5 [1M]
#> # Key:     .model [1]
#>    .model    Month      Quotes .mean TVadverts
#>    <chr>     <mth>      <dist> <dbl>     <dbl>
#>  1 lag2   2005 May N(14, 0.72)  13.5         8
#>  2 lag2   2005 Jun N(13, 0.72)  13.3         8
#>  3 lag2   2005 Jul N(13, 0.72)  13.2         8
#>  4 lag2   2005 Aug N(13, 0.72)  13.2         8
#>  5 lag2   2005 Sep N(13, 0.72)  13.2         8
#>  6 lag2   2005 Oct N(13, 0.72)  13.2         8
#>  7 lag2   2005 Nov N(13, 0.72)  13.2         8
#>  8 lag2   2005 Dec N(13, 0.72)  13.2         8
#>  9 lag2   2006 Jan N(13, 0.72)  13.2         8
#> 10 lag2   2006 Feb N(13, 0.72)  13.2         8
#> 11 lag2   2006 Mar N(13, 0.72)  13.2         8
#> 12 lag2   2006 Apr N(13, 0.72)  13.2         8
#> 13 lag2   2006 May N(13, 0.72)  13.2         8
#> 14 lag2   2006 Jun N(13, 0.72)  13.2         8
#> 15 lag2   2006 Jul N(13, 0.72)  13.2         8
#> 16 lag2   2006 Aug N(13, 0.72)  13.2         8
#> 17 lag2   2006 Sep N(13, 0.72)  13.2         8
#> 18 lag2   2006 Oct N(13, 0.72)  13.2         8
#> 19 lag2   2006 Nov N(13, 0.72)  13.2         8
#> 20 lag2   2006 Dec N(13, 0.72)  13.2         8

waldo::compare(fc1, fc2)
#> `old$Quotes[[1]]$mu`: 13.0
#> `new$Quotes[[1]]$mu`: 13.5
#> 
#> `old$Quotes[[1]]$sigma`: 0.5
#> `new$Quotes[[1]]$sigma`: 0.8
#> 
#> `old$Quotes[[2]]$mu`: 13.0
#> `new$Quotes[[2]]$mu`: 13.3
#> 
#> `old$Quotes[[2]]$sigma`: 0.77
#> `new$Quotes[[2]]$sigma`: 0.85
#> 
#> `old$.mean[1:5]`: 13.0 13.0 13.2 13.2 13.2
#> `new$.mean[1:5]`: 13.5 13.3 13.2 13.2 13.2

Curiously, when I create the lagged variables manually (rather than in the formula), the model results match the "base case" from fpp3 (fc1 in my example).

insurance_manlag <- insurance %>%
  mutate(TVadverts1 = lag(TVadverts),
         TVadverts2 = lag(TVadverts, 2))

fit <- insurance_manlag %>%
  # Note: unlike the book, the fitting period is not restricted here
  # Estimate models
  model(
    lag2 = ARIMA(Quotes ~ pdq(d = 0) +
                   TVadverts + TVadverts1 + TVadverts2)
  )

insurance_man_future <- append_row(insurance, n = 20) %>%
  replace_na(replace = list(TVadverts = 8)) %>%
  mutate(TVadverts1 = lag(TVadverts),
         TVadverts2 = lag(TVadverts, 2)) %>%
  slice_tail(n = 20)

# Forecast with the manually lagged future regressors
fc3 <- fit %>%
  forecast(insurance_man_future)

waldo::compare(fc1$Quotes, fc3$Quotes)
#> ✓ No differences
waldo::compare(fc2$Quotes, fc3$Quotes)
#> `old[[1]]$mu`: 13.5
#> `new[[1]]$mu`: 13.0
#> 
#> `old[[1]]$sigma`: 0.8
#> `new[[1]]$sigma`: 0.5
#> 
#> `old[[2]]$mu`: 13.3
#> `new[[2]]$mu`: 13.0
#> 
#> `old[[2]]$sigma`: 0.85
#> `new[[2]]$sigma`: 0.77

Created on 2021-06-02 by the reprex package (v2.0.0)

This reproduction leads me to believe that fc1 is correct, rather than fc2. If so, what is happening in fc2 that causes its forecast to differ from fc1 (and fc3)?

In {fable}, models that produce forecasts retain all the information needed to produce them. When you use the recommended interface to get fc1 (as shown in the book), the model holds onto the 2 most recent values of TVadverts. While these values were not needed to estimate the model, they are required inputs for the first couple of forecasts.
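To make the point about stored values concrete, here is a minimal base-R sketch, assuming a single regressor with lags up to max_lag. The helper future_lagged_xreg() is hypothetical and is not how {fable} stores or uses these values internally; it only shows why the first max_lag rows of the future regressor matrix can only be filled from the tail of the historical series.

```r
# Hypothetical helper (not fable's internal code): build the future regressor
# matrix for x, lag(x), ..., lag(x, max_lag) from the last max_lag historical
# values plus the future values supplied as new_data.
future_lagged_xreg <- function(history, future, max_lag) {
  stopifnot(length(history) >= max_lag, length(future) >= 1)
  # Only the last max_lag historical observations are reachable by the lags
  stored <- utils::tail(history, max_lag)
  # embed() returns one row per future period: x[t], x[t-1], ..., x[t-max_lag]
  out <- stats::embed(c(stored, future), max_lag + 1)
  colnames(out) <- c("x", paste0("lag", seq_len(max_lag)))
  out
}

# Illustrative numbers only (not the insurance data): the last historical
# adverts are 5, 6, 7 and every future period is set to 8.
future_lagged_xreg(history = c(5, 6, 7), future = rep(8, 4), max_lag = 2)
```

The first row pairs the future value 8 with the historical 7 and 6, and the second row still needs the historical 7. Without the stored values those cells would be NA, which is exactly the situation the question was worried about.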

When using the forecast() function with new_data, the intended behaviour is for the model to produce a forecast for each time point in new_data. I believe forecasting from time points other than the end of the series is not yet implemented, so I will change this to produce an error.
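The distributions in fc2 are consistent with this if each row of new_data is treated as one more step ahead of the end of the training data: in fc2 the 20 future rows come after all the historical rows, so they behave like long-horizon forecasts in which the ARIMA error forecast has essentially decayed, leaving a constant variance and a mean driven by the regressors. The sketch below is an illustration under assumptions, not output from the fitted model. It computes h-step forecast error variances of an ARMA process from its psi weights using stats::ARMAtoMA(); the helper arma_forecast_var() is hypothetical, and the MA(2) coefficients are made up to mimic the shape of fc1's variances (rising for two steps, then flat). The error order actually chosen by ARIMA() for this model may differ.

```r
# Hypothetical helper: h-step-ahead forecast error variances of an ARMA
# process with innovation variance sigma2, computed from its psi weights:
#   Var(e_{T+h} | data) = sigma2 * (psi_0^2 + ... + psi_{h-1}^2), psi_0 = 1.
arma_forecast_var <- function(ar = numeric(), ma = numeric(), sigma2 = 1, h = 10) {
  stopifnot(h >= 1)
  # ARMAtoMA() returns psi_1, ..., psi_{h-1}; prepend psi_0 = 1
  psi <- c(1, if (h > 1) stats::ARMAtoMA(ar = ar, ma = ma, lag.max = h - 1) else numeric())
  sigma2 * cumsum(psi^2)
}

# Made-up MA(2) coefficients: the variance rises for two steps and is then
# flat, the same shape as fc1 (0.23, 0.59, 0.72, 0.72, ...).
arma_forecast_var(ma = c(0.5, 0.3), h = 5)
# [1] 1.00 1.25 1.34 1.34 1.34

# Far-ahead horizons only ever see the flat part, so every row has the same
# variance, which is what all 20 rows of fc2 show.
tail(arma_forecast_var(ma = c(0.5, 0.3), h = 60), 3)
```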

Generally speaking, when using the lag() function in a model formula, there is no need to prepend the historical data. The models store and recall the values needed for forecasting.
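Since new_data is expected to hold only the future time points, a cheap safeguard before calling forecast() is to check that it starts one step after the training data. This is a hypothetical helper, not part of {fable} or {tsibble}; it relies on tsibble's index_var() and new_data() and assumes a single series without key variables.

```r
library(tsibble)

# Hypothetical helper (not part of fable or tsibble): TRUE if future_data
# starts exactly one step after the last time point in train, i.e. it holds
# only future rows. Assumes a single series (no key variables).
starts_after_training <- function(train, future_data) {
  idx <- index_var(train)                          # name of the index column
  expected_start <- new_data(train, n = 1)[[idx]]  # first period after train
  isTRUE(min(future_data[[idx]]) == expected_start)
}

# With the objects from the question (not run here):
# starts_after_training(insurance, insurance_future)   # fc1-style new_data
# starts_after_training(insurance,
#   bind_rows(select(insurance, -Quotes), insurance_future))  # fc2-style
```

The fc1-style call should pass, while the fc2-style new_data starts at the first historical month and would be flagged.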
