How to use statsmodels' ARMA to predict with exogenous variables?

I am trying to use statsmodels to fit an AR(MA) process with exogenous variables. For that, I generated a realization of an AR(0) process with a delayed exogenous variable, and I am trying to recover the parameters I would expect from it. I am able to fit the process correctly, but I am unable to use the predict method.

The following code is an MCVE of what I want to achieve. It is heavily commented so that you can easily follow it. The last statement is an assertion that fails, and I would like to make it pass. I suspect that the culprit is how I am calling the .predict method.

import numpy as np
import statsmodels.tsa.api

def _transform_x(x, lag):
    """
    Converts a set of time series into a matrix of delayed signals.
    For x.shape[0] == 1, it is equivalent to calling `statsmodels.tsa.api.lagmat(x_i, lag)`.

    For x.shape[0] == 1, row `j` corresponds to time `t = j`, and column `i` (0-based) holds the signal at `t - (i + 1)`.
    It assumes that no past signal => no signal: missing past values are filled with zeros.

    For example, for lag=3, the matrix would be:
    [0   , 0   , 0] (-> y[0])
    [x[0], 0   , 0] (-> y[1])
    [x[1], x[0], 0] (-> y[2])

    The parameter fitted to column 0, a1, is the influence of `x[t - 1]` on `y[t]`.
    The parameter fitted to column 1, a2, is the influence of `x[t - 2]` on `y[t]`.
    It assumes that we only measure x[t] when we measure y[t], which is why that column does not appear.

    For x.shape[0] > 1, it returns a concatenation of the matrices for each signal.
    """
    for x_i in x:
        assert len(x_i) >= lag
        assert len(x_i.shape) == 1, 'Each of the elements must be a time-series (1D)'
    return np.concatenate(tuple(statsmodels.tsa.api.lagmat(x_i, lag) for x_i in x), axis=1)

# build the realization of the process y[t] = 1*x[t-2] + noise, where x[t] is iid from N(1,1)
t = np.arange(0, 1000, 1)

# the exogenous variable
x1 = 1 + np.random.normal(size=t.shape)

# this shifts x by 2 (puts the last element in the beginning, we set the beginning to 0)
y = np.roll(x1, 2) + np.random.normal(scale=0.01, size=t.shape)
y[0] = y[1] = 0

x = np.array([x1])  # x.shape[0] => each exogenous variable; x.shape[1] => each time point

# fit it with AR(2) + exogenous(2)
lag = 2

result = statsmodels.tsa.api.ARMA(y, (lag, 0), exog=_transform_x(x, lag)).fit(disp=False)

# this gives the expected. Specifically, `x2 = 0.9952` and all others are indistinguishable from 0.
# (x2 here means the highest delay, 2).

# predict 1 element out-of-sample. Because the process is y[t] = x[0, t - 2] + noise,
# the prediction should be equal to `x[0, -2]`
y_pred = result.predict(len(y), len(y), exog=_transform_x(x[:, -3:], lag))[0]

# this fails!
np.testing.assert_almost_equal(y_pred, x[0, -2], decimal=2)

There are two problems, as far as I can see:

exog=_transform_x(x[:, -3:], lag) in predict has an initial-value problem: its first rows contain zeros instead of the actual lags.

Indexing: the prediction for y[-1] should be x[-3], i.e. two lags behind. If we want to forecast the next observation, then we need an extended exog x array corresponding to the forecast period.
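
Both problems can be seen without statsmodels. The following is a small numpy-only sketch. `lag_matrix` is a hypothetical stand-in that assumes the same layout as the `_transform_x` output shown further down (column 0 is lag 1, column 1 is lag 2, zeros where no past value exists), and the numbers are made up for illustration.

```python
import numpy as np

def lag_matrix(series, lag):
    """Hypothetical helper: column i holds series[t - (i + 1)], zero where no past value exists."""
    series = np.asarray(series, dtype=float)
    assert len(series) >= lag
    out = np.zeros((len(series), lag))
    for i in range(lag):
        # shift the series down by (i + 1) rows; the top rows stay zero
        out[i + 1:, i] = series[:len(series) - (i + 1)]
    return out

x_demo = np.array([10.0, 20.0, 30.0, 40.0, 50.0])  # made-up values
lag = 2

full = lag_matrix(x_demo, lag)
truncated = lag_matrix(x_demo[-3:], lag)

# Problem 1: lagging only the last 3 points pads the first rows with zeros.
print(truncated)

# Problem 2: the last row describes the last in-sample time point
# (lags x[-2], x[-3]); the first out-of-sample point needs x[-1], x[-2].
last_in_sample_row = full[-1]
first_forecast_row = np.array([x_demo[-1], x_demo[-2]])
print(last_in_sample_row, first_forecast_row)
```

Only `first_forecast_row` has the lags that belong to the first out-of-sample step, and it does not appear in either lag matrix.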

If I change this, then the assert passes for me for y[-1]:

>>> y_pred = result.predict(len(y)-1, len(y)-1, exog=_transform_x(x[:, -10:], lag)[-1])
>>> y_pred
array([ 0.9308579])
>>> result.fittedvalues[-1]

>>> x[0, -3]

>>> np.testing.assert_almost_equal(y_pred, x[0, -3], decimal=2)

The above is for predicting the last observation. To forecast the first out-of-sample observation, we need the last and the second-to-last x, which cannot be obtained through the _transform_x function. For this example, I just provide them in a list.

>>> y_pred = result.predict(len(y), len(y), exog=[[x[0, -1], x[0, -2]]])
>>> y_pred
array([ 1.35420494])
>>> x[0, -2]
>>> np.testing.assert_almost_equal(y_pred, x[0, -2], decimal=2)
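
The hand-written list can also be generated. This is only a sketch: `next_exog_row` is a hypothetical helper, not part of statsmodels, and it assumes the column order used above (lag 1 first, then lag 2, one block per exogenous series).

```python
import numpy as np

def next_exog_row(x, lag):
    """Hypothetical helper: exog row for the first out-of-sample step.

    x has shape (n_series, n_obs). For each series the row holds
    x_i[-1], x_i[-2], ..., x_i[-lag] (lag 1 first), series after series.
    """
    x = np.asarray(x, dtype=float)
    assert x.ndim == 2 and x.shape[1] >= lag
    # reverse each series and keep its `lag` most recent values
    per_series = [x_i[::-1][:lag] for x_i in x]
    return np.concatenate(per_series)[np.newaxis, :]

x_demo = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])  # made-up values, two exogenous series
row = next_exog_row(x_demo, lag=2)
print(row)  # [[3. 2. 6. 5.]]
```

With the single series from the question, next_exog_row(x, lag) should give the same values as [[x[0, -1], x[0, -2]]], so it could presumably be passed as exog in the predict call above.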

More generally, to forecast over a longer horizon, we need an array of future explanatory variables:

>>> xx = np.concatenate((x, np.ones((x.shape[0], 10))), axis=1)
>>> result.predict(len(y), len(y)+9, exog=_transform_x(xx[:, -(10+lag):], lag)[-10:])
array([ 1.35420494,  0.81332158,  1.00030139,  1.00030334,  1.000303  ,
        1.00030299,  1.00030299,  1.00030299,  1.00030299,  1.00030299])

I have chosen the indexing so that the exog for predict contains the last two observations in the first row:

>>> _transform_x(xx[:, -(10+lag):], lag)[lag:]
array([[ 0.81304498,  1.35387043],
       [ 1.        ,  0.81304498],
       [ 1.        ,  1.        ],
       [ 1.        ,  1.        ],
       [ 1.        ,  1.        ],
       [ 1.        ,  1.        ]])
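
The same indexing can be written out explicitly, which may make it easier to check. The sketch below is numpy-only and handles a single exogenous series. `future_exog` is a hypothetical helper, and the future values are assumptions supplied by the caller (the example above used ones).

```python
import numpy as np

def future_exog(x_past, x_future, lag):
    """Hypothetical helper: lagged exog rows for len(x_future) forecast steps.

    Row h holds the values at lags 1..lag relative to forecast time
    len(x_past) + h, taken from the past series extended by x_future.
    """
    x_past = np.asarray(x_past, dtype=float)
    x_future = np.asarray(x_future, dtype=float)
    assert len(x_past) >= lag
    extended = np.concatenate((x_past, x_future))
    n = len(x_past)
    rows = [[extended[n + h - k] for k in range(1, lag + 1)]
            for h in range(len(x_future))]
    return np.array(rows)

x_past = [1.0, 2.0, 3.0, 4.0]   # made-up observed values
x_future = [9.0, 9.0, 9.0]      # assumed future values
exog_future = future_exog(x_past, x_future, lag=2)
print(exog_future)
```

The first row contains the last two observed values (4 and 3), the second row mixes one assumed future value with the last observation, and from the third row on only assumed future values appear. This is the same pattern as in the array shown above.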
