How can I add exogenous variables to my ARIMA model estimation while using the fable package with model()?

I am trying to estimate ARIMA models for 100 different series, so I used fabletools::model() together with fable::ARIMA(). However, I have not been able to use my exogenous variables in the model estimation.

My data has three columns: an ID identifying the outlet, a Date.Time stamp, and Sales. In addition to these, I have dummy variables representing hour of day and day of week.

[Image: dummy variables]

Using the code below, I converted the data frame containing my endogenous and exogenous variables to a tsibble.

ts_forecast <- df11 %>%
  select(-Date) %>%
  mutate(ID = factor(ID)) %>%
  group_by(ID) %>%
  as_tsibble(index = Date.Time, key = ID) %>%
  tsibble::fill_gaps(Sales = 0) %>%
  fabletools::model(Arima = ARIMA(Sales, stepwise = TRUE, xreg = df12))

With this code I try to forecast values over the same Date.Time interval for multiple outlets identified by the ID factor. However, the code returns the following error.

>     Could not find an appropriate ARIMA model.
>     This is likely because automatic selection does not select models with characteristic roots that may be numerically unstable.
>     For more details, refer to https://otexts.com/fpp3/arima-r.html#plotting-the-characteristic-roots

Sales is my endogenous target variable, and df12 contains the dummy variables representing hour and day. Some stores make no sales at certain hours, so a dummy such as the one for 01:00 AM could be zero for every observation. However, I did not think that would be a problem since fable uses the stepwise method; I assumed that when the code sees a variable that is all zeros, it can exclude it.

I am not sure what the problem is. Either I am adding xreg to the model in the wrong way (the ARIMA help page says that xreg=, as in the earlier forecast package, is OK), or the issue is related to the second point I mentioned: dummies that are 0 for every observation. If it is the latter, is there a way to exclude all variables with a constant value of 0?

I would be delighted if you could help me.

Thanks

Here is an example using hourly pedestrian count data.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tsibble)
library(fable)
#> Loading required package: fabletools

# tsibble with hourly data
df <- pedestrian %>%
  mutate(dow = lubridate::wday(Date, label=TRUE))
# Training data
train <- df %>% 
  filter(Date <= "2015-01-31")
# Fit models
fit <- train %>%
  model(arima = ARIMA(Count ~ season("day") + dow + pdq(2,0,0) + PDQ(0,0,0)))
# Forecast period
fcast_xregs <- df %>%
  filter(Date > "2015-01-31", Date <= "2015-02-07") 
# Forecasts
fit %>% 
  forecast(fcast_xregs)
#> # A fable: 504 x 8 [1h] <Australia/Melbourne>
#> # Key:     Sensor, .model [3]
#>    Sensor .model Date_Time                     Count  .mean Date        Time
#>    <chr>  <chr>  <dttm>                       <dist>  <dbl> <date>     <int>
#>  1 Birra… arima  2015-02-01 00:00:00  N(-67, 174024)  -67.1 2015-02-01     0
#>  2 Birra… arima  2015-02-01 01:00:00 N(-270, 250881) -270.  2015-02-01     1
#>  3 Birra… arima  2015-02-01 02:00:00 N(-286, 310672) -286.  2015-02-01     2
#>  4 Birra… arima  2015-02-01 03:00:00 N(-283, 351704) -283.  2015-02-01     3
#>  5 Birra… arima  2015-02-01 04:00:00 N(-264, 380588) -264.  2015-02-01     4
#>  6 Birra… arima  2015-02-01 05:00:00  N(-244, 4e+05) -244.  2015-02-01     5
#>  7 Birra… arima  2015-02-01 06:00:00 N(-137, 414993) -137.  2015-02-01     6
#>  8 Birra… arima  2015-02-01 07:00:00   N(93, 424929)   93.0 2015-02-01     7
#>  9 Birra… arima  2015-02-01 08:00:00  N(292, 431894)  292.  2015-02-01     8
#> 10 Birra… arima  2015-02-01 09:00:00  N(225, 436775)  225.  2015-02-01     9
#> # … with 494 more rows, and 1 more variable: dow <ord>

Created on 2020-10-09 by the reprex package (v0.3.0)
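
The same formula approach can be sketched for data shaped like the question's (several outlets keyed by ID, an hourly Date.Time index, and Sales). This is only an illustration under assumptions: the data are simulated, and the object names (sales, future, fc) are hypothetical. Leaving out the pdq() special lets ARIMA() search for the non-seasonal orders itself.

```r
library(dplyr)
library(tsibble)
library(fable)

# Simulated stand-in for the question's data: two outlets, 14 days of hourly sales
set.seed(1)
hours <- seq(as.POSIXct("2020-01-06 00:00:00", tz = "UTC"),
             by = "1 hour", length.out = 24 * 14)
sales <- data.frame(
  ID = rep(c("A", "B"), each = length(hours)),
  Date.Time = rep(hours, times = 2)
) %>%
  mutate(
    Sales = 50 + 20 * sin(2 * pi * lubridate::hour(Date.Time) / 24) +
      rnorm(n(), sd = 5),
    dow = lubridate::wday(Date.Time, label = TRUE)
  ) %>%
  as_tsibble(index = Date.Time, key = ID)

# Regressors go in the formula; with no pdq() special the orders are chosen automatically
fit <- sales %>%
  model(arima = ARIMA(Sales ~ season("day") + dow + PDQ(0, 0, 0)))

# Future values of the regressors must be supplied when forecasting
future <- new_data(sales, 24) %>%
  mutate(dow = lubridate::wday(Date.Time, label = TRUE))
fc <- forecast(fit, new_data = future)
```

One model is fitted per ID, and fc holds 24 hourly forecasts for each outlet.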

Notes:

  • You don't need to create dummy variables in R. The formula interface will handle categorical variables appropriately.
  • The season("day") special within ARIMA will generate the appropriate seasonal categorical variable, equivalent to 23 hourly dummy variables.
  • I've specified a specific ARIMA model to save computation time. Omit the pdq special to have the optimal model selected automatically.
  • Keep the PDQ(0,0,0) special, as you don't need the ARIMA model to handle the seasonality when you are already handling it with the exogenous variables.
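
If hand-made dummy columns are kept anyway, the all-zero columns described in the question can be dropped before modelling. A minimal base R sketch, assuming a hypothetical data frame named dummies; the helper drop_constant_cols is made up for illustration and is not part of fable or tsibble.

```r
# Hypothetical helper: keep only columns that take more than one distinct value
drop_constant_cols <- function(x) {
  varies <- vapply(x, function(col) length(unique(col)) > 1, logical(1))
  x[, varies, drop = FALSE]
}

# Toy dummy matrix for one store (illustrative values only)
dummies <- data.frame(
  hour_01 = c(0, 0, 0, 0),  # this store never sells at 01:00
  hour_09 = c(1, 0, 1, 0),
  hour_17 = c(0, 1, 0, 1)
)

kept <- drop_constant_cols(dummies)
names(kept)
```

Because a dummy can be constant for one store but not for another, a check like this would have to be applied to each store's rows separately.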
