How can I get all details of the top x models from forecast::auto.arima in R?

My goal is to exactly re-create the top, say, 3 models from the auto.arima function in R. My example uses the following series:

> data <- c(79, 73, 102, 158, 235, 326, 216, 153, 82, 87, 94, 163, 119, 81, 143, 
          179, 182, 247, 248, 105, 143, 64, 45, 163, 122, 95, 47, 117, 345, 454,
          596, 440, 207, 94, 100, 187, 106, 98, 66, 81, 178, 196, 279, 128, 80,
          178, 112, 175, 91, 52, 35, 14, 20, 332, 386, 121, 167, 53, 94, 224,
          76, 68, 42, 61, 125, 273, 431, 1901, 86, 65, 120, 227, 131, 84, 48, 153, 157)

> time_series <- ts(data, freq=12, start=c(2013, 10))

I use the auto.arima call shown below to capture the model information for the best model and to see the top models in the trace text.

> library(forecast)
> auto_mod <- auto.arima(time_series, trace=TRUE)

 ARIMA(2,0,2)(1,0,1)[12] with non-zero mean : 1067.045
 ARIMA(0,0,0)            with non-zero mean : 1057.566
 ARIMA(1,0,0)(1,0,0)[12] with non-zero mean : 1057.504
 ARIMA(0,0,1)(0,0,1)[12] with non-zero mean : 1057.727
 ARIMA(0,0,0)            with zero mean     : 1092.138
 ARIMA(1,0,0)            with non-zero mean : 1055.976
 ARIMA(1,0,0)(0,0,1)[12] with non-zero mean : 1057.629
 ARIMA(1,0,0)(1,0,1)[12] with non-zero mean : 1058.261
 ARIMA(2,0,0)            with non-zero mean : 1058.174
 ARIMA(1,0,1)            with non-zero mean : 1058.187
 ARIMA(0,0,1)            with non-zero mean : 1056.126
 ARIMA(2,0,1)            with non-zero mean : Inf
 ARIMA(1,0,0)            with zero mean     : 1070.847

 Best model: ARIMA(1,0,0)            with non-zero mean 

> summary(auto_mod)
Series: time_series 
ARIMA(1,0,0) with non-zero mean 

Coefficients:
         ar1      mean
      0.2170  176.2688
s.e.  0.1105   32.0040

sigma^2 estimated as 49991:  log likelihood=-524.82
AIC=1055.65   AICc=1055.98   BIC=1062.68

Training set error measures:
                    ME    RMSE      MAE       MPE     MAPE     MASE
Training set 0.3042569 220.664 104.5716 -76.32587 96.76548 1.051865
                    ACF1
Training set 0.006605514

I thought I could use the text output to exactly re-create the models other than the top one, but I have realized that the model descriptions in the text output come close to identifying each model uniquely without quite doing so. I think the label tells you whether there is an intercept or whether there is drift, but not both. I've found one such example below where two different models, one with an intercept and one without, have the same label.

> mod1 <- Arima(time_series, order=c(1,0,0), seasonal=c(1,0,0), include.drift=TRUE,  
+                    include.mean=TRUE)
> summary(mod1)
Series: time_series 
ARIMA(1,0,0)(1,0,0)[12] with drift 

Coefficients:
         ar1    sar1  intercept   drift
      0.1722  0.1976   132.4150  1.2188
s.e.  0.1175  0.2059    68.9234  1.5233

sigma^2 estimated as 50174:  log likelihood=-524.15
AIC=1058.31   AICc=1059.15   BIC=1070.03

Training set error measures:
                      ME    RMSE      MAE       MPE     MAPE     MASE
Training set -0.05272209 218.099 99.62455 -76.12583 94.97947 1.002104
                    ACF1
Training set 0.004756765
> mod2 <- Arima(time_series, order=c(1,0,0), seasonal=c(1,0,0), include.drift=TRUE,  
+                    include.mean=FALSE)
> summary(mod2)
Series: time_series 
ARIMA(1,0,0)(1,0,0)[12] with drift 

Coefficients:
         ar1    sar1   drift
      0.2014  0.2862  3.6940
s.e.  0.1205  0.1960  0.9009

sigma^2 estimated as 51238:  log likelihood=-525.77
AIC=1059.53   AICc=1060.09   BIC=1068.91

Training set error measures:
                   ME     RMSE      MAE       MPE     MAPE     MASE
Training set 19.14777 221.9052 107.1779 -57.65942 96.53528 1.078082
                    ACF1
Training set -0.01092874

So in summary my question is: is there a way to use an auto-fitting process in R to get the top x ARIMA models? I'm open to any solution, but my order of preference would be:

  • A way to pull the model objects for the top x models out of auto.arima
  • A way to pull the information on the top x models as text, or to understand the trace argument better, so that I can re-create the models myself
  • Some other package or functionality

Thanks in advance for any help that can be offered!

The information from the forecast::auto.arima trace uniquely determines the model.

Let's explicitly verify this for the third model (using your sample data):

ARIMA(1,0,0)(1,0,0)[12] with non-zero mean : 1057.504

We use base R's arima to manually fit the same model:

fit3 <- arima(
    time_series, 
    order = c(1, 0, 0), 
    seasonal = list(order = c(1, 0, 0), period = 12L),
    include.mean = TRUE)

When using ML to estimate the model parameters, arima returns the AIC value; the trace of forecast::auto.arima, on the other hand, reports the AIC corrected for small sample size (AICc). Thanks to this post on Cross Validated, we can write a custom function that calculates the AICc from the fitted model.

get_AICc <- function(fit) {
    npar <- length(fit$coef) + 1
    nstar <- length(fit$residuals) - fit$arma[6] - fit$arma[7] * fit$arma[5]
    fit$aic + 2 * npar * (nstar/(nstar - npar - 1) - 1)
}

We confirm that the corrected AIC is the same as the one reported in the trace of forecast::auto.arima:

get_AICc(fit3)
#[1] 1057.504
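
For reference, the correction inside get_AICc is algebraically the usual small-sample AICc term. In the sketch below, k stands for npar and n for nstar (my shorthand, not names used by arima). For fit3, k = 4 (three coefficients plus the innovation variance), n = 77 (the sample series has 77 observations and there is no differencing), and the AIC of this model is 1056.95:

```latex
\begin{align*}
\mathrm{AICc} &= \mathrm{AIC} + 2k\left(\frac{n}{n-k-1} - 1\right)
               = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1} \\
              &\approx 1056.95 + \frac{2 \cdot 4 \cdot 5}{77 - 4 - 1}
               = 1056.95 + \frac{40}{72} \approx 1057.50
\end{align*}
```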

forecast::Arima automatically returns both AIC and AICc values, and its fit results coincide with those from arima (unsurprisingly, as forecast::Arima internally calls arima) and from forecast::auto.arima:


Arima(
    time_series,
    order = c(1, 0, 0), 
    seasonal = c(1, 0, 0),
    include.mean = TRUE)
#Series: time_series 
#ARIMA(1,0,0)(1,0,0)[12] with non-zero mean 
#
#Coefficients:
#         ar1    sar1      mean
#      0.1849  0.1780  179.5156
#s.e.  0.1168  0.2077   36.2321
#
#sigma^2 estimated as 49966:  log likelihood=-524.47
#AIC=1056.95   AICc=1057.5   BIC=1066.32

Compared to base R's arima, forecast::Arima allows for more flexibility in the inclusion of constants: while arima allows for a constant offset through include.mean, forecast::Arima also allows for a time-dependent (deterministic) drift parameter. For more details, see Constants and ARIMA models in R.


I'm not aware of an already-available way to obtain a list of the top n model fits from forecast::auto.arima. The trace only prints intermediate results from the search in parameter space to the console. In the past, I have written my own grid search over SARIMA model parameters to determine an optimal model based on one of the information criteria; however, forecast::auto.arima is more sophisticated and also checks for (i.e. excludes) fits whose roots lie close to the unit circle. Your best bet is one of the following: define your own grid search and have it return a list of the top n models; modify the code of forecast::auto.arima to return such a list; or (probably the least elegant solution) capture and parse the output of auto.arima(..., trace = TRUE) to extract the model parameters and re-fit using arima or forecast::Arima.
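
As a rough illustration of the first option, a minimal grid-search sketch in base R could look like the following. The names top_n_sarima and aicc and all arguments are hypothetical; the sketch only tries models with a mean term, keeps d and D fixed, and does none of the root checks that auto.arima performs, so its ranking need not match the trace.

```r
# AICc as in get_AICc above: AIC plus the small-sample correction
aicc <- function(fit) {
  k <- length(fit$coef) + 1  # +1 for the innovation variance
  n <- length(fit$residuals) - fit$arma[6] - fit$arma[7] * fit$arma[5]
  fit$aic + 2 * k * (k + 1) / (n - k - 1)
}

# Hypothetical helper: fit every (p, q, P, Q) combination up to the given
# maxima and return the n fits with the smallest AICc.
top_n_sarima <- function(series, n = 3, max_p = 2, max_q = 2,
                         max_P = 1, max_Q = 1, d = 0, D = 0) {
  grid <- expand.grid(p = 0:max_p, q = 0:max_q, P = 0:max_P, Q = 0:max_Q)
  fits <- lapply(seq_len(nrow(grid)), function(i) {
    g <- grid[i, ]
    tryCatch(
      stats::arima(series,
                   order = c(g$p, d, g$q),
                   seasonal = list(order = c(g$P, D, g$Q),
                                   period = frequency(series)),
                   include.mean = TRUE),
      error = function(e) NULL)  # skip combinations that fail to fit
  })
  ok <- !vapply(fits, is.null, logical(1))
  grid <- grid[ok, , drop = FALSE]
  fits <- fits[ok]
  grid$AICc <- vapply(fits, aicc, numeric(1))
  best <- head(order(grid$AICc), n)
  list(table = grid[best, ], models = fits[best])
}
```

With the sample data this would be called as top_n_sarima(time_series, n = 3); the models element then holds the fitted arima objects and table the corresponding orders and AICc values.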


Update 1

I did say that capturing the output of auto.arima was probably the least elegant method, but this is what I ended up doing; perhaps it is helpful to you as a starting point:

Let's define a function that parses an ARIMA model string as generated by auto.arima:

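library(tidyverse)  # provides str_detect(), enframe(), separate(), mutate(), %>%

parse_SARIMA_string <- function(x) {
    # TRUE for lines labelled "with non-zero mean"
    include.mean <- str_detect(x, "with non-zero mean")
    x %>% 
        str_remove_all(":.+$") %>%        # drop the trailing ": <AICc>" part
        str_replace_all("\\D", " ") %>%   # keep only the digits, separated by spaces
        trimws() %>%
        enframe() %>%
        separate(
            value, 
            c("p", "d", "q", "P", "D", "Q", "period"),
            sep = "\\s+",
            fill = "right") %>%           # non-seasonal models get NA for P, D, Q, period
        mutate(include.mean = include.mean)
}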

Then we can use capture.output to capture the trace of forecast::auto.arima and parse the model strings:

library(tidyverse)
capture.output(auto.arima(time_series, trace = TRUE)) %>%
    str_subset("^ ARIMA") %>%
    parse_SARIMA_string()
## A tibble: 13 x 9
#    name p     d     q     P     D     Q     period include.mean
#   <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>  <lgl>       
# 1     1 2     0     2     1     0     1     12     TRUE        
# 2     2 0     0     0     NA    NA    NA    NA     TRUE        
# 3     3 1     0     0     1     0     0     12     TRUE        
# 4     4 0     0     1     0     0     1     12     TRUE        
# 5     5 0     0     0     NA    NA    NA    NA     FALSE       
# 6     6 1     0     0     NA    NA    NA    NA     TRUE        
# 7     7 1     0     0     0     0     1     12     TRUE        
# 8     8 1     0     0     1     0     1     12     TRUE        
# 9     9 2     0     0     NA    NA    NA    NA     TRUE        
#10    10 1     0     1     NA    NA    NA    NA     TRUE        
#11    11 0     0     1     NA    NA    NA    NA     TRUE        
#12    12 2     0     1     NA    NA    NA    NA     TRUE        
#13    13 1     0     0     NA    NA    NA    NA     FALSE 

Now that you have a data.frame / tibble with the model parameters, it's straightforward to use them in arima or forecast::Arima.
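
To get from the trace to actual model objects for the top n candidates, something along these lines might work. This is a base R sketch under stated assumptions: refit_top_models is a made-up name, it ranks by the value printed after the colon in each trace line, and it treats "with drift" as a linear time trend regressor plus a mean. If auto.arima was run with its approximation switched on, the printed values are approximate, so the ranking may differ from a full refit.

```r
# Hypothetical helper (not part of forecast): rank auto.arima trace lines by
# their reported criterion and refit the n best with stats::arima().
refit_top_models <- function(trace_lines, series, n = 3) {
  # Candidate lines look like
  #   " ARIMA(1,0,0)(1,0,0)[12] with non-zero mean : 1057.504"
  # Lines without a colon, or not starting with "ARIMA(", are ignored.
  lines <- grep("^\\s*ARIMA\\(.*:", trace_lines, value = TRUE)
  label <- trimws(sub(":[^:]*$", "", lines))             # text before the last colon
  ic    <- as.numeric(trimws(sub("^.*:", "", lines)))    # value after it; "Inf" parses too
  nums  <- regmatches(label, gregexpr("[0-9]+", label))  # p, d, q [, P, D, Q, period]
  has_mean  <- grepl("with non-zero mean", label, fixed = TRUE)
  has_drift <- grepl("with drift", label, fixed = TRUE)

  best <- head(order(ic), n)
  best <- best[is.finite(ic[best])]  # drop models the search rejected (Inf)
  fits <- lapply(best, function(i) {
    o <- as.integer(nums[[i]])
    seasonal <- if (length(o) >= 7) {
      list(order = o[4:6], period = o[7])
    } else {
      list(order = c(0L, 0L, 0L), period = NA)
    }
    # Assumption: drift is modelled as a linear trend regressor
    xreg <- if (has_drift[i]) cbind(drift = seq_along(series)) else NULL
    stats::arima(series, order = o[1:3], seasonal = seasonal,
                 xreg = xreg, include.mean = has_mean[i] || has_drift[i])
  })
  names(fits) <- label[best]
  fits
}
```

For the sample data, the call would be refit_top_models(capture.output(auto.arima(time_series, trace = TRUE)), time_series, n = 3), which returns a named list of fitted arima objects ordered from best to worst.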


Update 2

Regarding the identifiability of a SARIMA model based on the auto.arima model string, let's take a look at some specific models.

With differencing

  • Model 1 is an ARMA model with a constant

    [image: equation]

  • Model 2 is an ARIMA model with drift

    [image: equation]

    We can expand the model to get

    [image: equation]

  • Model 3 is an ARIMA model with drift and a constant

    [image: equation]

    When we expand model 3, we notice that the constant term cancels due to the differencing, and we end up with the same model as model 2.

So d=1 models with drift and with drift+constant describe the same process. In this case, reporting a model as e.g. "ARIMA(1,1,0) with drift" refers to the same model as "ARIMA(1,1,0) with drift and non-zero mean".
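
The cancellation can be written out for the ARIMA(1,1,0) case. The notation here is my own assumption, not taken from the original equations: B is the backshift operator, φ the AR coefficient, β the drift, c the constant, and ε_t white noise, with the drift and constant entering as a regression on time with ARIMA errors.

```latex
\begin{align*}
\text{Model 2:}\quad & (1 - \phi B)\,\bigl[(1 - B)\,y_t - \beta\bigr] = \varepsilon_t \\
\text{Model 3:}\quad & (1 - \phi B)(1 - B)\,\bigl(y_t - c - \beta t\bigr) = \varepsilon_t \\
\text{Since}\quad & (1 - B)(c + \beta t) = c + \beta t - c - \beta (t - 1) = \beta, \\
\text{Model 3 becomes}\quad & (1 - \phi B)\,\bigl[(1 - B)\,y_t - \beta\bigr] = \varepsilon_t
\end{align*}
```

The last line is identical to model 2: the constant c drops out after differencing, and only the drift β survives.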

No differencing

For d=0, using include.drift = TRUE by default results in both a drift and a constant offset parameter; see e.g. Constants and ARIMA models in R. If you specify include.drift = TRUE and then manually force include.mean = FALSE, you get a different model (as you show in mod1 and mod2). From what I understand, this will never happen in forecast::auto.arima, where include.drift = TRUE implicitly always comes with include.mean = TRUE. So a SARIMA string "ARIMA(1,0,0)(1,0,0)[12] with drift" from forecast::auto.arima would translate to a SARIMA(1,0,0)(1,0,0)[12] model with drift and offset.
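
To make the d=0 case concrete, here is a small sketch with base R's arima, assuming (as I understand forecast::Arima to do) that drift is simply a linear time trend passed as a regressor. The helper name fit_with_drift and its arguments are made up for illustration.

```r
# Hypothetical helper: fit an ARMA model with a linear trend ("drift")
# regressor, with or without an additional constant offset.
fit_with_drift <- function(series, order = c(1, 0, 0), include_mean = TRUE) {
  trend <- cbind(drift = seq_along(series))  # 1, 2, ..., n as a named column
  stats::arima(series, order = order, xreg = trend,
               include.mean = include_mean)
}
```

Calling names(coef(fit_with_drift(time_series, include_mean = TRUE))) should list an intercept next to the drift term, whereas include_mean = FALSE leaves only the drift. These are two different models even though both would be labelled "with drift".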
