How can I get all details of the top x models from forecast::auto.arima in R?
My goal is to exactly re-create the top (say) 3 models from the auto.arima function in R. My example uses the following series:
> data <- c(79, 73, 102, 158, 235, 326, 216, 153, 82, 87, 94, 163, 119, 81, 143,
179, 182, 247, 248, 105, 143, 64, 45, 163, 122, 95, 47, 117, 345, 454,
596, 440, 207, 94, 100, 187, 106, 98, 66, 81, 178, 196, 279, 128, 80,
178, 112, 175, 91, 52, 35, 14, 20, 332, 386, 121, 167, 53, 94, 224,
76, 68, 42, 61, 125, 273, 431, 1901, 86, 65, 120, 227, 131, 84, 48, 153, 157)
> time_series <- ts(data, freq=12, start=c(2013, 10))
I use the auto.arima function shown below to capture the model information for the best model and to see the top models as text.
> library(forecast)
> auto_mod <- auto.arima(time_series, trace=TRUE)
ARIMA(2,0,2)(1,0,1)[12] with non-zero mean : 1067.045
ARIMA(0,0,0) with non-zero mean : 1057.566
ARIMA(1,0,0)(1,0,0)[12] with non-zero mean : 1057.504
ARIMA(0,0,1)(0,0,1)[12] with non-zero mean : 1057.727
ARIMA(0,0,0) with zero mean : 1092.138
ARIMA(1,0,0) with non-zero mean : 1055.976
ARIMA(1,0,0)(0,0,1)[12] with non-zero mean : 1057.629
ARIMA(1,0,0)(1,0,1)[12] with non-zero mean : 1058.261
ARIMA(2,0,0) with non-zero mean : 1058.174
ARIMA(1,0,1) with non-zero mean : 1058.187
ARIMA(0,0,1) with non-zero mean : 1056.126
ARIMA(2,0,1) with non-zero mean : Inf
ARIMA(1,0,0) with zero mean : 1070.847
Best model: ARIMA(1,0,0) with non-zero mean
> summary(auto_mod)
Series: time_series
ARIMA(1,0,0) with non-zero mean
Coefficients:
ar1 mean
0.2170 176.2688
s.e. 0.1105 32.0040
sigma^2 estimated as 49991: log likelihood=-524.82
AIC=1055.65 AICc=1055.98 BIC=1062.68
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set 0.3042569 220.664 104.5716 -76.32587 96.76548 1.051865
ACF1
Training set 0.006605514
I thought I could use the text output to exactly re-create the models other than the best one, but I have realized that the model descriptions in the text output come close to identifying each model uniquely without quite doing so. I think the output tells you whether there is an intercept or whether there is drift, but not both. I've found one such example below, where two different models, one with an intercept and one without, have the same label.
> mod1 <- Arima(time_series, order=c(1,0,0), seasonal=c(1,0,0), include.drift=TRUE,
+ include.mean=TRUE)
> summary(mod1)
Series: time_series
ARIMA(1,0,0)(1,0,0)[12] with drift
Coefficients:
ar1 sar1 intercept drift
0.1722 0.1976 132.4150 1.2188
s.e. 0.1175 0.2059 68.9234 1.5233
sigma^2 estimated as 50174: log likelihood=-524.15
AIC=1058.31 AICc=1059.15 BIC=1070.03
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set -0.05272209 218.099 99.62455 -76.12583 94.97947 1.002104
ACF1
Training set 0.004756765
> mod2 <- Arima(time_series, order=c(1,0,0), seasonal=c(1,0,0), include.drift=TRUE,
+ include.mean=FALSE)
> summary(mod2)
Series: time_series
ARIMA(1,0,0)(1,0,0)[12] with drift
Coefficients:
ar1 sar1 drift
0.2014 0.2862 3.6940
s.e. 0.1205 0.1960 0.9009
sigma^2 estimated as 51238: log likelihood=-525.77
AIC=1059.53 AICc=1060.09 BIC=1068.91
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set 19.14777 221.9052 107.1779 -57.65942 96.53528 1.078082
ACF1
Training set -0.01092874
So in summary my question is: Is there a way to use an auto-fitting process in R to get the top x ARIMA models? I'm open to any solution, but my order of preference for a solution would be:
A way to get the model objects of the top x models from auto.arima
Thanks in advance for any help that can be offered!
The information from the forecast::auto.arima trace uniquely determines the model.
Let's explicitly verify this for the third model (using your sample data):
ARIMA(1,0,0)(1,0,0)[12] with non-zero mean : 1057.504
We use base R's arima to manually fit the same model:
fit3 <- arima(
time_series,
order = c(1, 0, 0),
seasonal = list(order = c(1, 0, 0), period = 12L),
include.mean = TRUE)
When using ML to estimate model parameters, arima returns the AIC value; the trace of forecast::auto.arima, on the other hand, returns the AIC corrected for small sample size (AICc). Thanks to this post on Cross Validated, we can write a custom function that calculates the AICc based on the model parameters.
get_AICc <- function(fit) {
npar <- length(fit$coef) + 1
nstar <- length(fit$residuals) - fit$arma[6] - fit$arma[7] * fit$arma[5]
fit$aic + 2 * npar * (nstar/(nstar - npar - 1) - 1)
}
We confirm that the corrected AIC is the same as the one reported in the trace of forecast::auto.arima:
get_AICc(fit3)
#[1] 1057.504
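For reference, the last line of get_AICc is the usual small-sample correction in a slightly different form. Writing k for npar (the number of estimated coefficients plus one for the innovation variance) and n^* for nstar (the number of observations left after differencing), it simplifies as follows:

```latex
\mathrm{AICc}
  = \mathrm{AIC} + 2k\left(\frac{n^*}{n^* - k - 1} - 1\right)
  = \mathrm{AIC} + \frac{2k(k+1)}{n^* - k - 1}
```

For fit3, assuming the 77 observations of the sample series, k = 4 and n^* = 77, so the correction is 40/72, roughly 0.556, which takes an AIC of about 1056.95 to the 1057.50 shown in the trace.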
forecast::Arima automatically returns both AIC and AICc values, and the fit results coincide with those from arima (unsurprisingly, as forecast::Arima internally calls arima) and forecast::auto.arima:
Arima(
time_series,
order = c(1, 0, 0),
seasonal = c(1, 0, 0),
include.mean = TRUE)
#Series: time_series
#ARIMA(1,0,0)(1,0,0)[12] with non-zero mean
#
#Coefficients:
# ar1 sar1 mean
#0.1849 0.1780 179.5156
#s.e. 0.1168 0.2077 36.2321
#
#sigma^2 estimated as 49966: log likelihood=-524.47
#AIC=1056.95 AICc=1057.5 BIC=1066.32
In comparison to base R's arima, forecast::Arima allows for more flexibility on the inclusion of constants: while arima allows for a constant offset through include.mean, forecast::Arima also allows for a time-dependent (deterministic) drift parameter. For more details, see Constants and ARIMA models in R.
I'm not aware of an already-available way to obtain a list of top n model fits from forecast::auto.arima. The trace only prints intermediate results from the search in parameter space to the console. In the past, I have written my own grid search over SARIMA model parameters to determine an optimal model based on one of the information criteria; however, forecast::auto.arima is more sophisticated and also checks for (i.e. excludes) model fits with roots close to the unit circle. Your best bet is to either define your own grid search and have it return a list of the top n models; or modify the code of forecast::auto.arima to return such a list; or (probably the least elegant solution) capture and parse the output of auto.arima(..., trace = TRUE) to extract model parameters and re-fit using arima or forecast::Arima.
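The first option, a hand-rolled grid search, could look roughly like the sketch below. It is a minimal illustration only: top_n_arima and aicc are hypothetical names, the grid covers non-seasonal models only, and none of the root checks that forecast::auto.arima performs are included.

```r
# Corrected AIC; same formula as get_AICc() in this answer, repeated here
# so the sketch is self-contained.
aicc <- function(fit) {
  npar <- length(fit$coef) + 1
  nstar <- length(fit$residuals) - fit$arma[6] - fit$arma[7] * fit$arma[5]
  fit$aic + 2 * npar * (nstar / (nstar - npar - 1) - 1)
}

# Hypothetical helper: fit every non-seasonal ARIMA(p, d, q) on a small grid
# and return the n fits with the lowest AICc.
top_n_arima <- function(y, n = 3, max_p = 2, max_q = 2, d = 0) {
  grid <- expand.grid(p = 0:max_p, q = 0:max_q, include.mean = c(TRUE, FALSE))
  fits <- lapply(seq_len(nrow(grid)), function(i) {
    # Some orders fail to fit; record those as NULL instead of stopping.
    tryCatch(
      arima(y, order = c(grid$p[i], d, grid$q[i]),
            include.mean = grid$include.mean[i]),
      error = function(e) NULL
    )
  })
  grid$AICc <- vapply(
    fits,
    function(f) if (is.null(f)) NA_real_ else aicc(f),
    numeric(1)
  )
  keep <- head(order(grid$AICc, na.last = NA), n)  # na.last = NA drops failed fits
  list(table = grid[keep, ], models = fits[keep])
}

# Usage on a simulated AR(1) series (not the data from the question)
set.seed(1)
y <- arima.sim(list(ar = 0.5), n = 100)
res <- top_n_arima(y, n = 3)
res$table
```

The returned list keeps the fitted model objects next to their orders, so the top n fits can be inspected or used for forecasting directly. Note that for d > 0 base arima ignores include.mean, so the grid would then contain duplicate models.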
I did say that capturing the output of auto.arima was probably the least elegant method, but this is what I ended up doing; perhaps this is helpful to you as a starting point:
Let's define a function that parses an ARIMA model string as generated by auto.arima:
parse_SARIMA_string <- function(x) {
include.mean <- str_detect(x, "with non-zero mean")
x %>%
str_remove_all(":.+$") %>%
str_replace_all("\\D", " ") %>%
trimws() %>%
enframe() %>%
separate(
value,
c("p", "d", "q", "P", "D", "Q", "period"),
sep = "\\s+",
fill = "right") %>%
mutate(include.mean = include.mean)
}
Then we can use capture.output to capture the trace of forecast::auto.arima and parse the model strings:
library(tidyverse)
capture.output(auto.arima(time_series, trace = TRUE)) %>%
str_subset("^ ARIMA") %>%
parse_SARIMA_string()
## A tibble: 13 x 9
# name p d q P D Q period include.mean
# <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <lgl>
# 1 1 2 0 2 1 0 1 12 TRUE
# 2 2 0 0 0 NA NA NA NA TRUE
# 3 3 1 0 0 1 0 0 12 TRUE
# 4 4 0 0 1 0 0 1 12 TRUE
# 5 5 0 0 0 NA NA NA NA FALSE
# 6 6 1 0 0 NA NA NA NA TRUE
# 7 7 1 0 0 0 0 1 12 TRUE
# 8 8 1 0 0 1 0 1 12 TRUE
# 9 9 2 0 0 NA NA NA NA TRUE
#10 10 1 0 1 NA NA NA NA TRUE
#11 11 0 0 1 NA NA NA NA TRUE
#12 12 2 0 1 NA NA NA NA TRUE
#13 13 1 0 0 NA NA NA NA FALSE
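The parser above drops the criterion value printed after the colon. To rank the candidates and pick the top x, that value can be kept as well. The following is a minimal base R sketch, assuming the trace lines have the "label : value" layout shown in the question; the variable names are illustrative.

```r
# A few trace lines copied from the question; in practice these would come
# from capture.output(auto.arima(time_series, trace = TRUE)).
trace_lines <- c(
  " ARIMA(2,0,2)(1,0,1)[12] with non-zero mean : 1067.045",
  " ARIMA(1,0,0)(1,0,0)[12] with non-zero mean : 1057.504",
  " ARIMA(1,0,0) with non-zero mean : 1055.976",
  " ARIMA(0,0,1) with non-zero mean : 1056.126",
  " ARIMA(2,0,1) with non-zero mean : Inf"
)

# Split each line at the last colon: model label on the left, AICc on the right.
ranked <- data.frame(
  model = trimws(sub(":[^:]*$", "", trace_lines)),
  AICc  = as.numeric(trimws(sub("^.*:", "", trace_lines))),
  stringsAsFactors = FALSE
)
ranked <- ranked[is.finite(ranked$AICc), ]   # drop candidates reported as Inf
top3 <- head(ranked[order(ranked$AICc), ], 3)
top3
```

The model labels in top3 can then be passed to parse_SARIMA_string to recover the orders of just the best few candidates.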
Now that you have a data.frame / tibble with the model parameters, it's straightforward to use them in arima or forecast::Arima.
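As a rough sketch of that last step, one row of the parsed table can be turned into a call to base R's arima. The helper name refit_from_row is hypothetical, and the sketch only handles the include.mean flag that the parser records; a model labelled "with drift" would need forecast::Arima with include.drift instead.

```r
# Hypothetical helper: refit one row of the parsed table with base R's arima().
# `row` is a one-row data.frame/tibble with character columns p, d, q, P, D, Q,
# period and a logical include.mean, as produced by parse_SARIMA_string().
refit_from_row <- function(y, row, ...) {
  # Missing seasonal terms (NA) mean "no seasonal part", i.e. order 0.
  to_int <- function(x) if (is.na(x)) 0L else as.integer(x)
  # A period of NA makes arima() fall back to frequency(y).
  period <- if (is.na(row$period)) NA else as.integer(row$period)
  arima(
    y,
    order = c(to_int(row$p), to_int(row$d), to_int(row$q)),
    seasonal = list(
      order = c(to_int(row$P), to_int(row$D), to_int(row$Q)),
      period = period
    ),
    include.mean = row$include.mean,
    ...  # further arguments are passed on to arima()
  )
}

# Usage with a hand-written row that mimics row 3 of the table above
row3 <- data.frame(p = "1", d = "0", q = "0", P = "1", D = "0", Q = "0",
                   period = "12", include.mean = TRUE,
                   stringsAsFactors = FALSE)
set.seed(42)
y <- ts(rnorm(120, mean = 100, sd = 10), frequency = 12)  # simulated data
fit <- refit_from_row(y, row3, method = "ML")
fit$arma  # p, q, P, Q, period, d, D
```

To refit every candidate, the helper can be applied row by row, for example with lapply over seq_len(nrow(tbl)) and tbl[i, ], where tbl is the parsed table.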
Regarding the identifiability of a SARIMA model based on the auto.arima model string, let's take a look at some specific models.
Model 1 is an ARMA model with a constant
Model 2 is an ARIMA model with drift
We can expand the model to get
Model 3 is an ARIMA model with drift and a constant
When we expand model 3 we notice that the constant term cancels due to the differencing, and we end up with the same model as model 2.
So d=1 models with drift and with drift+constant describe the same process. In this case, reporting a model as e.g. "ARIMA(1,1,0) with drift" refers to the same model as "ARIMA(1,1,0) with drift and non-zero mean".
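The three models and the cancellation can be sketched in equations as follows. The notation is an assumption (it follows the regression-with-ARIMA-errors form described in Constants and ARIMA models in R): a is the constant, b the drift coefficient, B the backshift operator, and n_t the ARIMA error driven by white noise ε_t.

```latex
\begin{align*}
\text{Model 1 } (d=0):\quad & y_t = a + n_t,
  \qquad \phi(B)\,n_t = \theta(B)\,\varepsilon_t \\
\text{Model 2 } (d=1):\quad & y_t = b\,t + n_t,
  \qquad \phi(B)(1-B)\,n_t = \theta(B)\,\varepsilon_t \\
\text{Model 2, expanded}:\quad & (1-B)\,y_t = b + (1-B)\,n_t \\
\text{Model 3 } (d=1):\quad & y_t = a + b\,t + n_t,
  \qquad \phi(B)(1-B)\,n_t = \theta(B)\,\varepsilon_t \\
\text{Model 3, expanded}:\quad & (1-B)\,y_t
  = (a - a) + b\,\bigl(t - (t-1)\bigr) + (1-B)\,n_t
  = b + (1-B)\,n_t
\end{align*}
```

The expanded forms of models 2 and 3 are identical: the constant a is removed by the first difference, while the linear trend b t turns into the constant b.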
For d=0, using include.drift = TRUE by default results in both a drift and a constant offset parameter; see e.g. Constants and ARIMA models in R. If you specify include.drift = TRUE and then manually force include.mean = FALSE, you get a different model (as you show in mod1 and mod2). From what I understand, this will never happen in forecast::auto.arima, where include.drift = TRUE implicitly always has include.mean = TRUE. So a SARIMA string ARIMA(1,0,0)(1,0,0)[12] with drift from forecast::auto.arima would translate to a SARIMA(1,0,0)(1,0,0)[12] model with drift and offset.
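Both cases can be checked with a small experiment. The sketch below uses base R's arima with an explicit time index as regressor to stand in for the drift term; as far as I know this mirrors how forecast::Arima handles include.drift, but treat that equivalence as an assumption. The data are simulated, not the series from the question.

```r
# Simulated series with a linear trend (not the data from the question)
set.seed(1)
n <- 100
y <- ts(10 + 0.5 * seq_len(n) + arima.sim(list(ar = 0.3), n = n))
drift <- cbind(drift = seq_len(n))  # time index as the only regressor

# d = 0: drift with and without a constant are different models
d0_both  <- arima(y, order = c(1, 0, 0), xreg = drift,
                  include.mean = TRUE, method = "ML")
d0_drift <- arima(y, order = c(1, 0, 0), xreg = drift,
                  include.mean = FALSE, method = "ML")
names(coef(d0_both))   # "ar1" "intercept" "drift"
names(coef(d0_drift))  # "ar1" "drift"

# d = 1: arima() ignores include.mean, so both calls fit the same model
d1_both  <- arima(y, order = c(1, 1, 0), xreg = drift,
                  include.mean = TRUE, method = "ML")
d1_drift <- arima(y, order = c(1, 1, 0), xreg = drift,
                  include.mean = FALSE, method = "ML")
all.equal(coef(d1_both), coef(d1_drift))
```

With d = 0 the two fits differ in their coefficient sets, whereas with d = 1 the include.mean flag has no effect, which matches the argument that the constant cancels under differencing.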