How to use statsmodels' ARMA to predict with exogenous variables?

Question

I am trying to use statsmodels to fit an AR(MA) process with exogenous variables. For that, I generated a realization of an AR(0) process with a delayed exogenous variable, and I am trying to recover what I would expect from it. I am able to fit the process correctly, but I am not able to use the predict method.

The following code is an MCVE of what I want to achieve. It is heavily commented so that you can follow it easily. The last statement is an assertion that fails, and I would like to make it pass. I suspect that the culprit is how I am calling .predict.

import numpy as np
import statsmodels.tsa.api


def _transform_x(x, lag):
    """
    Converts a set of time series into a matrix of delayed signals.
    For x.shape[0] == 1, it is equivalent to calling `statsmodels.tsa.api.lagmat(x_i, lag)`.

    For x.shape[0] == 1, row `t` corresponds to time `t` and column `i` (0-based) holds the signal at `t - i - 1`.
    It assumes that no past signal => no signal: missing history is filled with zeros.

    For example, for lag=3, the matrix would be:
    ```
    [0   , 0   , 0] (-> y[0])
    [x[0], 0   , 0] (-> y[1])
    [x[1], x[0], 0] (-> y[2])
    ```

    I.e.
    The parameter fitted to the first column, a1, is the influence of `x[t - 1]` on `y[t]`.
    The parameter fitted to the second column, a2, is the influence of `x[t - 2]` on `y[t]`.
    It assumes that we only measure x[t] when we measure y[t], which is why that column does not appear.

    For x.shape[0] > 1, it returns a concatenation of each of the matrixes for each signal.
    """
    for x_i in x:
        assert len(x_i) >= lag
        assert len(x_i.shape) == 1, 'Each of the elements must be a time-series (1D)'
    return np.concatenate(tuple(statsmodels.tsa.api.lagmat(x_i, lag) for x_i in x), axis=1)


# build the realization of the process y[t] = 1*x[t-2] + noise, where x[t] is iid from N(1,1)
t = np.arange(0, 1000, 1)
np.random.seed(1)

# the exogenous variable
x1 = 1 + np.random.normal(size=t.shape)

# shift x by 2; np.roll wraps the last two elements around to the front, so y[0] and y[1] are reset to 0 below
y = np.roll(x1, 2) + np.random.normal(scale=0.01, size=t.shape)
y[0] = y[1] = 0

x = np.array([x1])  # x.shape[0] => each exogenous variable; x.shape[1] => each time point

# fit it with AR(2) + exogenous(2)
lag = 2

result = statsmodels.tsa.api.ARMA(y, (lag, 0), exog=_transform_x(x, lag)).fit(disp=False)

# this gives the expected result: `x2 = 0.9952` and all other coefficients are indistinguishable from 0.
# (x2 here means the highest delay, 2).
print(result.summary())

# predict 1 element out-of-sample. Because the process is y[t] = x[0, t - 2] + noise,
# the prediction should be equal to `x[0, -2]`
y_pred = result.predict(len(y), len(y), exog=_transform_x(x[:, -3:], lag))[0]

# this fails!
np.testing.assert_almost_equal(y_pred, x[0, -2], decimal=2)

Answer 1

As far as I can see, there are two problems:

exog=_transform_x(x[:, -3:], lag) in predict has an initial value problem: its first rows contain padding zeros instead of the actual lags.

Indexing: the prediction for y[-1] should be x[-3], i.e. two lags behind. If we want to forecast the next observation, then we need an extended exog array that covers the forecast period.
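
Both points can be checked without fitting a model. The sketch below uses NumPy only. `lag_matrix` is a hypothetical stand-in written for this illustration, not a statsmodels function; it assumes the layout the rest of this answer relies on (first column is `x[t-1]`, second column is `x[t-2]`, missing history filled with zeros). The random draws also differ from the question's, since a different generator is used.

```python
import numpy as np


def lag_matrix(series, lag):
    """Hypothetical helper: row t is [series[t-1], ..., series[t-lag]], zero-padded."""
    series = np.asarray(series, dtype=float)
    out = np.zeros((len(series), lag))
    for k in range(1, lag + 1):
        out[k:, k - 1] = series[:-k]  # column k-1 holds the series delayed by k
    return out


rng = np.random.default_rng(1)  # not the same draws as the question's np.random.seed(1)
x = 1 + rng.normal(size=1000)
y = np.roll(x, 2)               # noise-free version of y[t] = x[t-2]
y[:2] = 0

# Problem 1: a 3-element slice has no history, so its first row is all padding zeros.
short = lag_matrix(x[-3:], 2)
print(short[0])

# Only the last row of the slice carries real lags, and they belong to the
# last in-sample point, not to the first out-of-sample point.
full = lag_matrix(x, 2)
print(np.allclose(short[-1], full[-1]))

# Problem 2: y[-1] depends on x[-3]; the first out-of-sample point depends on x[-2].
print(y[-1] == x[-3])
next_row = np.array([[x[-1], x[-2]]])  # exog row for time len(y): [x[T-1], x[T-2]]
```

The last line shows the shape of the row that a one-step-ahead forecast would need: one row, with the most recent value first.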

If I change this, then the assert passes for me for y[-1]:

>>> y_pred = result.predict(len(y)-1, len(y)-1, exog=_transform_x(x[:, -10:], lag)[-1])
>>> y_pred
array([ 0.9308579])
>>> result.fittedvalues[-1]
0.93085789893991366

>>> x[0, -3]
0.93037546054487086

>>> np.testing.assert_almost_equal(y_pred, x[0, -3], decimal=2)
>>>

The above is for predicting the last observation. To forecast the first out-of-sample observation, we need the last and the second-to-last x, which cannot be obtained through the _transform_x function. For this example, I just provide them in a list.

>>> y_pred = result.predict(len(y), len(y), exog=[[x[0, -1], x[0, -2]]])
>>> y_pred
array([ 1.35420494])
>>> x[0, -2]
1.3538704268828403
>>> np.testing.assert_almost_equal(y_pred, x[0, -2], decimal=2)
>>>

More generally, to forecast over a longer horizon, we need an array of future explanatory variables:

>>> xx = np.concatenate((x, np.ones((x.shape[0], 10))), axis=1)
>>> result.predict(len(y), len(y)+9, exog=_transform_x(xx[:, -(10+lag):], lag)[-10:])
array([ 1.35420494,  0.81332158,  1.00030139,  1.00030334,  1.000303  ,
        1.00030299,  1.00030299,  1.00030299,  1.00030299,  1.00030299])

I have chosen the indexing so that the exog for predict contains the last two observations in the first row.

>>> _transform_x(xx[:, -(10+lag):], lag)[lag:]
array([[ 0.81304498,  1.35387043],
       [ 1.        ,  0.81304498],
       [ 1.        ,  1.        ],
       ..., 
       [ 1.        ,  1.        ],
       [ 1.        ,  1.        ],
       [ 1.        ,  1.        ]])
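
The same exog array can also be built directly, without slicing a zero-padded lag matrix. This is a minimal NumPy-only sketch: `future_exog` is a hypothetical helper, the two history values are the last two observations printed above, and the constant future value of 1.0 mirrors the `np.ones` placeholder. It assumes the column order shown in the output (first column `x[t-1]`, second column `x[t-2]`).

```python
import numpy as np


def future_exog(history, future, lag):
    """Hypothetical helper: exog rows for len(future) out-of-sample steps.

    Row h holds [x[T+h-1], ..., x[T+h-lag]], where T = len(history) and
    values at time T or later are taken from `future`.
    """
    history = np.asarray(history, dtype=float)
    future = np.asarray(future, dtype=float)
    if len(history) < lag:
        raise ValueError("need at least `lag` historical observations")
    # extended[i] corresponds to time T - lag + i
    extended = np.concatenate((history[-lag:], future))
    rows = [extended[h:h + lag][::-1] for h in range(len(future))]
    return np.array(rows).reshape(len(future), lag)


# Last two observed x values from the output above, oldest first.
history = [1.35387043, 0.81304498]
exog = future_exog(history, np.ones(10), lag=2)
print(exog[:3])
```

An array built this way has one row per forecast step, so it could be passed as `exog` to a call like `result.predict(len(y), len(y)+9, exog=...)` shown above.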

Answer 1: accepted, 2018-02-08 15:38:54
