Forecasting irregular stock data with ARIMA and tsibble
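I would like to forecast a stock price with ARIMA, in a similar way to R. Hyndman in FPP3.
The first problem I ran into is that stock data are obviously irregular, because the stock exchange is closed on weekends and on some holidays. This causes problems if I want to use functions from the tidyverts packages: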
> stock
# A tsibble: 750 x 6 [1D]
Date Open High Low Close Volume
<date> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2019-05-21 36.3 36.4 36.3 36.4 232
2 2019-05-22 36.4 37.0 36.4 36.8 1007
3 2019-05-23 36.7 36.8 36.1 36.1 4298
4 2019-05-24 36.4 36.5 36.4 36.4 452
5 2019-05-27 36.5 36.5 36.3 36.4 2032
6 2019-05-28 36.5 36.8 36.4 36.5 3049
7 2019-05-29 36.2 36.5 36.1 36.5 2962
8 2019-05-30 36.8 37.1 36.8 37.1 432
9 2019-05-31 36.8 37.4 36.8 37.4 8424
10 2019-06-03 37.3 37.5 37.2 37.3 1550
# ... with 740 more rows
> stock %>%
+ feasts::ACF(difference(Close)) %>%
+ autoplot()
Error in `check_gaps()`:
! .data contains implicit gaps in time. You should check your data and convert implicit gaps into explicit missing values using `tsibble::fill_gaps()` if required.
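The same error about gaps in time occurs with other functions, such as fable::ARIMA() or feasts::gg_tsdisplay().
I tried filling the gaps with the values from the preceding rows: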
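Concretely, the [1D] in the tsibble header says the index advances one calendar day per row, but the trading dates skip weekends, so some calendar days have no row at all; these are the implicit gaps the error refers to. The base-R sketch below, using the ten dates printed above, finds them. It only illustrates the idea and is not how tsibble checks for gaps internally.

```r
# The first ten trading dates printed in the question
dates <- as.Date(c("2019-05-21", "2019-05-22", "2019-05-23", "2019-05-24",
                   "2019-05-27", "2019-05-28", "2019-05-29", "2019-05-30",
                   "2019-05-31", "2019-06-03"))

# Spacing between consecutive rows in days; a 3 marks a Friday-to-Monday jump
spacing <- as.numeric(diff(dates))
print(spacing)
# [1] 1 1 1 3 1 1 1 1 3

# Every calendar day in the range, as implied by a one-day ([1D]) interval
all_days <- seq(min(dates), max(dates), by = "day")

# Calendar days that have no row: the implicit gaps
implicit_gaps <- all_days[!(all_days %in% dates)]
print(implicit_gaps)
```

The first two of these dates, 2019-05-25 and 2019-05-26, are the rows that fill_gaps() adds with NA values in the output that follows.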
stock %>%
group_by_key() %>%
fill_gaps() %>%
tidyr::fill(Close, .direction = "down")
# A tsibble: 1,096 x 6 [1D]
Date Open High Low Close Volume
<date> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2019-05-21 36.3 36.4 36.3 36.4 232
2 2019-05-22 36.4 37.0 36.4 36.8 1007
3 2019-05-23 36.7 36.8 36.1 36.1 4298
4 2019-05-24 36.4 36.5 36.4 36.4 452
5 2019-05-25 NA NA NA 36.4 NA
6 2019-05-26 NA NA NA 36.4 NA
7 2019-05-27 36.5 36.5 36.3 36.4 2032
8 2019-05-28 36.5 36.8 36.4 36.5 3049
9 2019-05-29 36.2 36.5 36.1 36.5 2962
10 2019-05-30 36.8 37.1 36.8 37.1 432
# ... with 1,086 more rows
From there on, everything works fine. My questions are:
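First, you are evidently using an older version of the feasts package: the current version gives a warning rather than an error when computing the ACF of data with implicit gaps.
Second, the answer depends on the analysis you want to do. You have three options:
1. Keep calendar days as the time index and leave non-trading days as explicit missing values.
2. Keep calendar days as the time index and fill non-trading days with the last observed value.
3. Re-index the series by trading day, so that it is regular and has no gaps.
Below are the results of each option, using Apple (AAPL) stock over 2014-2018 as an example.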
library(fpp3)
#> ── Attaching packages ─────────────────────────────────────── fpp3 0.4.0.9000 ──
#> ✔ tibble 3.1.7 ✔ tsibble 1.1.1
#> ✔ dplyr 1.0.9 ✔ tsibbledata 0.4.0
#> ✔ tidyr 1.2.0 ✔ feasts 0.2.2
#> ✔ lubridate 1.8.0 ✔ fable 0.3.1
#> ✔ ggplot2 3.3.6 ✔ fabletools 0.3.2
#> ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
#> ✖ lubridate::date() masks base::date()
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ tsibble::intersect() masks base::intersect()
#> ✖ tsibble::interval() masks lubridate::interval()
#> ✖ dplyr::lag() masks stats::lag()
#> ✖ tsibble::setdiff() masks base::setdiff()
#> ✖ tsibble::union() masks base::union()
stock <- gafa_stock %>%
filter(Symbol == "AAPL") %>%
tsibble(index = Date, regular = TRUE) %>%
fill_gaps()
stock
#> # A tsibble: 1,825 x 8 [1D]
#> Symbol Date Open High Low Close Adj_Close Volume
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 AAPL 2014-01-02 79.4 79.6 78.9 79.0 67.0 58671200
#> 2 AAPL 2014-01-03 79.0 79.1 77.2 77.3 65.5 98116900
#> 3 <NA> 2014-01-04 NA NA NA NA NA NA
#> 4 <NA> 2014-01-05 NA NA NA NA NA NA
#> 5 AAPL 2014-01-06 76.8 78.1 76.2 77.7 65.9 103152700
#> 6 AAPL 2014-01-07 77.8 78.0 76.8 77.1 65.4 79302300
#> 7 AAPL 2014-01-08 77.0 77.9 77.0 77.6 65.8 64632400
#> 8 AAPL 2014-01-09 78.1 78.1 76.5 76.6 65.0 69787200
#> 9 AAPL 2014-01-10 77.1 77.3 75.9 76.1 64.5 76244000
#> 10 <NA> 2014-01-11 NA NA NA NA NA NA
#> # … with 1,815 more rows
stock %>%
model(ARIMA(Close ~ pdq(d=1)))
#> # A mable: 1 x 1
#> `ARIMA(Close ~ pdq(d = 1))`
#> <model>
#> 1 <ARIMA(0,1,0)>
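In this case, the ACF calculation will use the longest contiguous stretch of non-missing data, which is too short to be meaningful, so there is no point in showing the results of ACF() or gg_tsdisplay(). Also, the automatic selection of the order of differencing in ARIMA() fails because of the missing values, so I set it to 1 manually with pdq(d = 1). The other parts of the ARIMA model work fine in the presence of missing values.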
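Both behaviours can be reproduced in base R on a hypothetical simulated series: a random walk with every sixth and seventh value set to NA to mimic weekends. stats::na.contiguous() extracts the longest run of non-missing values, which is the kind of stretch the paragraph above says the ACF ends up using, and stats::arima() with the differencing order fixed by hand still fits, because in maximum-likelihood mode it handles missing values exactly via a Kalman filter. Treat the link to feasts and fable as an illustration, not a description of their internals.

```r
set.seed(1)
# Hypothetical daily random walk: five "trading days" then two NA "weekend days", for 10 weeks
x <- cumsum(rnorm(70))
weekend <- rep(c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE), times = 10)
x[weekend] <- NA

# Longest run of non-missing levels: a single working week
length(na.contiguous(x))        # 5

# Longest run of non-missing first differences: even shorter
length(na.contiguous(diff(x)))  # 4

# ARIMA(0,1,0) with d fixed by hand still fits despite the NAs
fit <- arima(x, order = c(0, 1, 0))
fit$sigma2
```

With at most four usable differences in a row, any sample autocorrelation beyond a lag or two has essentially nothing to be computed from.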
stock <- stock %>%
tidyr::fill(Close, .direction = "down")
stock
#> # A tsibble: 1,825 x 8 [1D]
#> Symbol Date Open High Low Close Adj_Close Volume
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 AAPL 2014-01-02 79.4 79.6 78.9 79.0 67.0 58671200
#> 2 AAPL 2014-01-03 79.0 79.1 77.2 77.3 65.5 98116900
#> 3 <NA> 2014-01-04 NA NA NA 77.3 NA NA
#> 4 <NA> 2014-01-05 NA NA NA 77.3 NA NA
#> 5 AAPL 2014-01-06 76.8 78.1 76.2 77.7 65.9 103152700
#> 6 AAPL 2014-01-07 77.8 78.0 76.8 77.1 65.4 79302300
#> 7 AAPL 2014-01-08 77.0 77.9 77.0 77.6 65.8 64632400
#> 8 AAPL 2014-01-09 78.1 78.1 76.5 76.6 65.0 69787200
#> 9 AAPL 2014-01-10 77.1 77.3 75.9 76.1 64.5 76244000
#> 10 <NA> 2014-01-11 NA NA NA 76.1 NA NA
#> # … with 1,815 more rows
stock %>%
ACF(difference(Close)) %>%
autoplot()
stock %>%
model(ARIMA(Close))
#> # A mable: 1 x 1
#> `ARIMA(Close)`
#> <model>
#> 1 <ARIMA(0,1,0)>
stock %>%
gg_tsdisplay(Close)
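Option 2 relies on tidyr::fill(.direction = "down"), i.e. last observation carried forward. A minimal base-R sketch of the same idea is below; the helper locf() is hypothetical and not part of tidyr, and the input is the first ten Close values from the fill_gaps() output of option 1.

```r
# Hypothetical helper: last observation carried forward (LOCF),
# the same idea as tidyr::fill(.direction = "down")
locf <- function(x) {
  # position of the most recent non-missing value at each index (0 = none yet)
  last_obs <- cummax(seq_along(x) * !is.na(x))
  # leading NAs have nothing to carry forward, so they stay NA
  last_obs[last_obs == 0] <- NA
  x[last_obs]
}

# First ten AAPL Close values after fill_gaps(); NAs are non-trading days
close <- c(79.0, 77.3, NA, NA, 77.7, 77.1, 77.6, 76.6, 76.1, NA)
filled <- locf(close)
print(filled)

# Differencing the filled series gives exact zeros on the filled days
print(diff(filled))
```

The design consequence is visible in the differences: every weekend contributes exact zero changes, and the whole Friday-to-Monday move is attributed to Monday, which is worth keeping in mind when reading the ACF and the fitted ARIMA model for this option.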
stock <- gafa_stock %>%
filter(Symbol == "AAPL") %>%
tsibble(index = Date, regular = TRUE) %>%
mutate(trading_day = row_number()) %>%
tsibble(index = trading_day)
stock
#> # A tsibble: 1,258 x 9 [1]
#> Symbol Date Open High Low Close Adj_Close Volume trading_day
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 AAPL 2014-01-02 79.4 79.6 78.9 79.0 67.0 58671200 1
#> 2 AAPL 2014-01-03 79.0 79.1 77.2 77.3 65.5 98116900 2
#> 3 AAPL 2014-01-06 76.8 78.1 76.2 77.7 65.9 103152700 3
#> 4 AAPL 2014-01-07 77.8 78.0 76.8 77.1 65.4 79302300 4
#> 5 AAPL 2014-01-08 77.0 77.9 77.0 77.6 65.8 64632400 5
#> 6 AAPL 2014-01-09 78.1 78.1 76.5 76.6 65.0 69787200 6
#> 7 AAPL 2014-01-10 77.1 77.3 75.9 76.1 64.5 76244000 7
#> 8 AAPL 2014-01-13 75.7 77.5 75.7 76.5 64.9 94623200 8
#> 9 AAPL 2014-01-14 76.9 78.1 76.8 78.1 66.1 83140400 9
#> 10 AAPL 2014-01-15 79.1 80.0 78.8 79.6 67.5 97909700 10
#> # … with 1,248 more rows
stock %>%
ACF(difference(Close)) %>%
autoplot()
stock %>%
model(ARIMA(Close))
#> # A mable: 1 x 1
#> `ARIMA(Close)`
#> <model>
#> 1 <ARIMA(2,1,3)>
stock %>%
gg_tsdisplay(Close)
Created on 2022-05-22 by the reprex package (v2.0.1)
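With option 3 the forecast horizon is counted in trading days, so a forecast for trading_day 1,259 carries no calendar date. One rough way to label such forecasts is to generate the next n weekdays after the last observed date. The helper next_weekdays() below is hypothetical and deliberately ignores exchange holidays; a real application would need the exchange's trading calendar instead.

```r
# Hypothetical helper: the next n weekdays after last_date.
# Only Saturdays and Sundays are skipped; exchange holidays are ignored.
next_weekdays <- function(last_date, n) {
  # 2n + 7 calendar days always contain at least n weekdays
  candidates <- last_date + seq_len(2 * n + 7)
  # POSIXlt wday: 0 = Sunday, 6 = Saturday (locale independent, unlike weekdays())
  wday <- as.POSIXlt(candidates)$wday
  head(candidates[!(wday %in% c(0, 6))], n)
}

# Trading-day index as in option 3: consecutive integers, so there are no gaps
dates <- as.Date(c("2014-01-02", "2014-01-03", "2014-01-06", "2014-01-07"))
trading_day <- seq_along(dates)

# Calendar dates for the next three trading-day steps after 2014-01-07
next_weekdays(max(dates), 3)
```

Here the three generated dates (2014-01-08 to 2014-01-10) match the actual trading days in the AAPL data, but around a holiday this approximation would be off by one or more days.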