Forecasting irregular stock data with ARIMA and tsibble

I want to forecast a certain stock using ARIMA, in a similar way to how R. Hyndman does it in FPP3.

The first issue I've run into is that stock data is irregular, since the stock exchange is closed on weekends and some holidays. This causes problems when I try to use functions from the tidyverts packages:

> stock
# A tsibble: 750 x 6 [1D]
   Date        Open  High   Low Close Volume
   <date>     <dbl> <dbl> <dbl> <dbl>  <dbl>
 1 2019-05-21  36.3  36.4  36.3  36.4    232
 2 2019-05-22  36.4  37.0  36.4  36.8   1007
 3 2019-05-23  36.7  36.8  36.1  36.1   4298
 4 2019-05-24  36.4  36.5  36.4  36.4    452
 5 2019-05-27  36.5  36.5  36.3  36.4   2032
 6 2019-05-28  36.5  36.8  36.4  36.5   3049
 7 2019-05-29  36.2  36.5  36.1  36.5   2962
 8 2019-05-30  36.8  37.1  36.8  37.1    432
 9 2019-05-31  36.8  37.4  36.8  37.4   8424
10 2019-06-03  37.3  37.5  37.2  37.3   1550
# ... with 740 more rows


> stock %>%
+ feasts::ACF(difference(Close)) %>%
+ autoplot()

Error in `check_gaps()`:
! .data contains implicit gaps in time. You should check your data and convert implicit gaps into explicit missing values using `tsibble::fill_gaps()` if required.

The same error about gaps in time also occurs with other functions such as fable::ARIMA() or feasts::gg_tsdisplay().
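
For reference, the implicit gaps that the error message refers to can be inspected with tsibble's has_gaps() and count_gaps(). The following is only a minimal sketch on a made-up four-row series (the `prices` object and its values are illustrative assumptions, not the data above):

```r
library(tsibble)

# Hypothetical toy series: a Thursday, a Friday, then the following
# Monday and Tuesday, so the weekend is an implicit gap.
prices <- tsibble(
  Date  = as.Date(c("2019-05-23", "2019-05-24", "2019-05-27", "2019-05-28")),
  Close = c(36.1, 36.4, 36.6, 36.5),
  index = Date
)

# Does the series contain implicit gaps at all?
has_gaps(prices)

# Where are the gaps, and how many observations does each one span?
gaps <- count_gaps(prices)
gaps
```

Here count_gaps() should report a single gap covering the two weekend days, which are the rows that fill_gaps() would add as explicit missing values.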

I have tried filling the gaps with values from the previous rows:

stock %>%
  group_by_key() %>%
  fill_gaps() %>%
  tidyr::fill(Close, .direction = "down")

# A tsibble: 1,096 x 6 [1D]
   Date        Open  High   Low Close Volume
   <date>     <dbl> <dbl> <dbl> <dbl>  <dbl>
 1 2019-05-21  36.3  36.4  36.3  36.4    232
 2 2019-05-22  36.4  37.0  36.4  36.8   1007
 3 2019-05-23  36.7  36.8  36.1  36.1   4298
 4 2019-05-24  36.4  36.5  36.4  36.4    452
 5 2019-05-25  NA    NA    NA    36.4     NA
 6 2019-05-26  NA    NA    NA    36.4     NA
 7 2019-05-27  36.5  36.5  36.3  36.4   2032
 8 2019-05-28  36.5  36.8  36.4  36.5   3049
 9 2019-05-29  36.2  36.5  36.1  36.5   2962
10 2019-05-30  36.8  37.1  36.8  37.1    432
# ... with 1,086 more rows

and everything works as it should from there. My questions are:

  • Is there a way to use the "tidyverts approach" without running into the issue with gaps in time?
  • If not, is filling the gaps with values from previous rows a correct way to overcome this, or will it bias the model?

First, you're clearly using an old version of the feasts package, because the current version gives a warning rather than an error when computing the ACF from data with implicit gaps.

Second, the answer depends on what analysis you want to do. You have three choices:

  1. use day as the time index and fill the gaps with NAs;
  2. use day as the time index and fill the gaps with the previous closing stock prices;
  3. use trading day as the time index, in which case there are no gaps.

Here are the results for each of them, using Apple stock over the period 2014-2018 as an example.

library(fpp3)
#> ── Attaching packages ─────────────────────────────────────── fpp3 0.4.0.9000 ──
#> ✔ tibble      3.1.7     ✔ tsibble     1.1.1
#> ✔ dplyr       1.0.9     ✔ tsibbledata 0.4.0
#> ✔ tidyr       1.2.0     ✔ feasts      0.2.2
#> ✔ lubridate   1.8.0     ✔ fable       0.3.1
#> ✔ ggplot2     3.3.6     ✔ fabletools  0.3.2
#> ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
#> ✖ lubridate::date()    masks base::date()
#> ✖ dplyr::filter()      masks stats::filter()
#> ✖ tsibble::intersect() masks base::intersect()
#> ✖ tsibble::interval()  masks lubridate::interval()
#> ✖ dplyr::lag()         masks stats::lag()
#> ✖ tsibble::setdiff()   masks base::setdiff()
#> ✖ tsibble::union()     masks base::union()

1. Fill non-trading days with missing values

stock <- gafa_stock %>%
  filter(Symbol == "AAPL") %>%
  tsibble(index = Date, regular = TRUE) %>%
  fill_gaps()
stock
#> # A tsibble: 1,825 x 8 [1D]
#>    Symbol Date        Open  High   Low Close Adj_Close    Volume
#>    <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>     <dbl>
#>  1 AAPL   2014-01-02  79.4  79.6  78.9  79.0      67.0  58671200
#>  2 AAPL   2014-01-03  79.0  79.1  77.2  77.3      65.5  98116900
#>  3 <NA>   2014-01-04  NA    NA    NA    NA        NA          NA
#>  4 <NA>   2014-01-05  NA    NA    NA    NA        NA          NA
#>  5 AAPL   2014-01-06  76.8  78.1  76.2  77.7      65.9 103152700
#>  6 AAPL   2014-01-07  77.8  78.0  76.8  77.1      65.4  79302300
#>  7 AAPL   2014-01-08  77.0  77.9  77.0  77.6      65.8  64632400
#>  8 AAPL   2014-01-09  78.1  78.1  76.5  76.6      65.0  69787200
#>  9 AAPL   2014-01-10  77.1  77.3  75.9  76.1      64.5  76244000
#> 10 <NA>   2014-01-11  NA    NA    NA    NA        NA          NA
#> # … with 1,815 more rows

stock %>%
  model(ARIMA(Close ~ pdq(d=1)))
#> # A mable: 1 x 1
#>  `ARIMA(Close ~ pdq(d = 1))`
#>                      <model>
#> 1              <ARIMA(0,1,0)>

In this case, the ACF is computed on the longest contiguous stretch of non-missing data, which is too short to be meaningful (no more than five trading days between weekends), so there is no point showing the results of ACF() or gg_tsdisplay(). Also, the automated choice of differencing in the ARIMA model fails because of the missing values, so I have manually set d to one. The other parts of the ARIMA model work fine in the presence of missing values.

2. Fill non-trading days with the last observed values

stock <- stock %>%
  tidyr::fill(Close, .direction = "down")
stock
#> # A tsibble: 1,825 x 8 [1D]
#>    Symbol Date        Open  High   Low Close Adj_Close    Volume
#>    <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>     <dbl>
#>  1 AAPL   2014-01-02  79.4  79.6  78.9  79.0      67.0  58671200
#>  2 AAPL   2014-01-03  79.0  79.1  77.2  77.3      65.5  98116900
#>  3 <NA>   2014-01-04  NA    NA    NA    77.3      NA          NA
#>  4 <NA>   2014-01-05  NA    NA    NA    77.3      NA          NA
#>  5 AAPL   2014-01-06  76.8  78.1  76.2  77.7      65.9 103152700
#>  6 AAPL   2014-01-07  77.8  78.0  76.8  77.1      65.4  79302300
#>  7 AAPL   2014-01-08  77.0  77.9  77.0  77.6      65.8  64632400
#>  8 AAPL   2014-01-09  78.1  78.1  76.5  76.6      65.0  69787200
#>  9 AAPL   2014-01-10  77.1  77.3  75.9  76.1      64.5  76244000
#> 10 <NA>   2014-01-11  NA    NA    NA    76.1      NA          NA
#> # … with 1,815 more rows

stock %>%
  ACF(difference(Close)) %>%
  autoplot()

stock %>%
  model(ARIMA(Close))
#> # A mable: 1 x 1
#>   `ARIMA(Close)`
#>          <model>
#> 1 <ARIMA(0,1,0)>

stock %>%
  gg_tsdisplay(Close)
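
One thing to keep in mind with this option is that forward-filling makes the differenced series exactly zero on every non-trading day. A minimal sketch on a made-up four-row series (the `prices` object and its values are illustrative assumptions, not taken from gafa_stock):

```r
library(tsibble)
library(dplyr)
library(tidyr)

# Hypothetical toy series with a weekend gap between 24 and 27 May.
prices <- tsibble(
  Date  = as.Date(c("2019-05-23", "2019-05-24", "2019-05-27", "2019-05-28")),
  Close = c(36.1, 36.4, 36.6, 36.5),
  index = Date
)

filled <- prices %>%
  fill_gaps() %>%                               # add explicit NA rows for the weekend
  tidyr::fill(Close, .direction = "down") %>%   # carry Friday's close forward
  mutate(diff_close = difference(Close))        # first differences of the filled series

filled
```

The two weekend rows carry Friday's close, so their diff_close is zero, while the whole Friday-to-Monday change lands on Monday. Whether that matters depends on the analysis, which is why the three options can select different models.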

3. Re-index by trading day

stock <- gafa_stock %>%
  filter(Symbol == "AAPL") %>%
  tsibble(index = Date, regular = TRUE) %>%
  mutate(trading_day = row_number()) %>%
  tsibble(index = trading_day)
stock
#> # A tsibble: 1,258 x 9 [1]
#>    Symbol Date        Open  High   Low Close Adj_Close    Volume trading_day
#>    <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>     <dbl>       <int>
#>  1 AAPL   2014-01-02  79.4  79.6  78.9  79.0      67.0  58671200           1
#>  2 AAPL   2014-01-03  79.0  79.1  77.2  77.3      65.5  98116900           2
#>  3 AAPL   2014-01-06  76.8  78.1  76.2  77.7      65.9 103152700           3
#>  4 AAPL   2014-01-07  77.8  78.0  76.8  77.1      65.4  79302300           4
#>  5 AAPL   2014-01-08  77.0  77.9  77.0  77.6      65.8  64632400           5
#>  6 AAPL   2014-01-09  78.1  78.1  76.5  76.6      65.0  69787200           6
#>  7 AAPL   2014-01-10  77.1  77.3  75.9  76.1      64.5  76244000           7
#>  8 AAPL   2014-01-13  75.7  77.5  75.7  76.5      64.9  94623200           8
#>  9 AAPL   2014-01-14  76.9  78.1  76.8  78.1      66.1  83140400           9
#> 10 AAPL   2014-01-15  79.1  80.0  78.8  79.6      67.5  97909700          10
#> # … with 1,248 more rows

stock %>%
  ACF(difference(Close)) %>%
  autoplot()

stock %>%
  model(ARIMA(Close))
#> # A mable: 1 x 1
#>   `ARIMA(Close)`
#>          <model>
#> 1 <ARIMA(2,1,3)>

stock %>%
  gg_tsdisplay(Close)

Created on 2022-05-22 by the reprex package (v2.0.1)
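
As a follow-up to option 3, forecasts from a model indexed by trading day are themselves indexed by trading day rather than by calendar date. The sketch below is not part of the reprex above; it assumes the fpp3 package is installed and uses update_tsibble() to set the new index, with `fc` as an illustrative name:

```r
library(fpp3)

# Re-index the Apple series by trading day (1, 2, 3, ...).
stock <- gafa_stock %>%
  filter(Symbol == "AAPL") %>%
  mutate(trading_day = row_number()) %>%
  update_tsibble(index = trading_day, regular = TRUE)

# Fit an automatically selected ARIMA model and forecast five steps ahead.
fc <- stock %>%
  model(arima = ARIMA(Close)) %>%
  forecast(h = 5)

# The forecast index continues the trading-day count, not the dates.
fc
```

The five forecast rows continue the trading_day counter from the end of the sample. Mapping them back to calendar dates would need an exchange calendar, which is outside the scope of this sketch.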
