How to use statsmodels' ARMA to predict with exogenous variables?
I am trying to use statsmodels to fit an AR(MA) process with exogenous variables. To that end, I generated a realization of an AR(0) process with a lagged exogenous variable, and I am trying to recover from it what I expect. I am able to fit the process correctly, but I cannot use the `predict` method.

The code below is an MCVE of what I am trying to achieve. It is heavily commented so that you can follow it easily. The last statement is an assertion that fails, and I would like to make it pass. I suspect the culprit is how I am calling the `.predict` function.
```python
import numpy as np
import statsmodels.tsa.api


def _transform_x(x, lag):
    """
    Converts a set of time series into a matrix of delayed signals.

    For x.shape[0] == 1, it is equivalent to calling
    `statsmodels.tsa.api.lagmat(x_i, lag)`: each `row_t` is a time `t`,
    and `column_i` is the signal at `t - i`.
    It assumes that no past signal => no signal: the first rows are padded
    with zeros. For example, for lag=3 the matrix is:

        [0,    0,    0] (-> y[0])
        [x[0], 0,    0] (-> y[1])
        [x[1], x[0], 0] (-> y[2])

    I.e. the parameter fitted to column 1, a1, is the influence of `x[t - 1]`
    on `y[t]`; the parameter fitted to column 2, a2, is the influence of
    `x[t - 2]` on `y[t]`.
    It assumes that we only measure x[t] when we measure y[t], which is why
    a column for `x[t]` itself does not appear.
    For x.shape[0] > 1, it returns the concatenation of the matrices of each
    signal.
    """
    for x_i in x:
        assert len(x_i) >= lag
        assert len(x_i.shape) == 1, 'Each of the elements must be a time series (1D)'
    return np.concatenate(tuple(statsmodels.tsa.api.lagmat(x_i, lag) for x_i in x), axis=1)


# build a realization of the process y[t] = 1*x[t-2] + noise, where x[t] is iid from N(1,1)
t = np.arange(0, 1000, 1)
np.random.seed(1)

# the exogenous variable
x1 = 1 + np.random.normal(size=t.shape)

# np.roll shifts x by 2 (it wraps the last two elements around to the beginning,
# which is why we zero out the first two entries of y)
y = np.roll(x1, 2) + np.random.normal(scale=0.01, size=t.shape)
y[0] = y[1] = 0

x = np.array([x1])  # x.shape[0] => each exogenous variable; x.shape[1] => each time point

# fit it with AR(2) + exogenous(2)
lag = 2
result = statsmodels.tsa.api.ARMA(y, (lag, 0), exog=_transform_x(x, lag)).fit(disp=False)

# this gives the expected result. Specifically, `x2 = 0.9952` and all the others are
# indistinguishable from 0 (x2 here means the highest lag, 2).
print(result.summary())

# predict 1 element out-of-sample. Because the process is y[t] = x[0, t - 2] + noise,
# the prediction should equal `x[0, -2]`
y_pred = result.predict(len(y), len(y), exog=_transform_x(x[:, -3:], lag))[0]

# this fails!
np.testing.assert_almost_equal(y_pred, x[0, -2], decimal=2)
```
There are two problems, as far as I can see:

`exog=_transform_x(x[:, -3:], lag)` in `predict` has an initial-value problem and includes zeros instead of the lags.

Indexing: the prediction for y[-1] should be x[-3], i.e. two lags behind. If we want to predict the next observation, then we need an extended exog x array that corresponds to the forecast period.
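To see the first problem concretely, here is a quick sketch (reusing `x`, `lag` and `_transform_x` from the question): slicing `x` down to its last three points before calling `_transform_x` makes `lagmat` pad with zeros again, so the rows handed to `predict` contain zeros where the true lags should be:

```python
# _transform_x(x[:, -3:], lag) re-runs lagmat on a 3-point slice, so the
# left-padding zeros reappear; schematically, for lag=2 it returns
#     [[0,     0    ],    <- both lags replaced by padding zeros
#      [x[-3], 0    ],    <- the lag-2 value is still a padding zero
#      [x[-2], x[-3]]]
# instead of rows built from the true history x[-4], x[-5], ...
print(_transform_x(x[:, -3:], lag))
```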
If I change this, then the assert for y[-1] passes for me:
```
>>> y_pred = result.predict(len(y)-1, len(y)-1, exog=_transform_x(x[:, -10:], lag)[-1])
>>> y_pred
array([ 0.9308579])
>>> result.fittedvalues[-1]
0.93085789893991366
>>> x[0, -3]
0.93037546054487086
>>> np.testing.assert_almost_equal(y_pred, x[0, -3], decimal=2)
```
The above predicts the last observation. To forecast the first out-of-sample observation, we need the last and the second-to-last x, which cannot be obtained through the `_transform_x` function. For the example, I just provide them in a list:
```
>>> y_pred = result.predict(len(y), len(y), exog=[[x[0, -1], x[0, -2]]])
>>> y_pred
array([ 1.35420494])
>>> x[0, -2]
1.3538704268828403
>>> np.testing.assert_almost_equal(y_pred, x[0, -2], decimal=2)
```
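This can be wrapped up for any lag; a minimal sketch (the helper name `_next_exog_row` is mine, not part of the original code), assuming the same lag-1-first column ordering that `lagmat` produces:

```python
import numpy as np

def _next_exog_row(x, lag):
    # exog row for predicting y[T], the first out-of-sample observation:
    # lagmat puts lag 1 in the first column, so the row is
    # [x[T-1], x[T-2], ..., x[T-lag]]
    return np.array([[x[0, -i] for i in range(1, lag + 1)]])

# for lag=2 this builds [[x[0, -1], x[0, -2]]], exactly the list used above:
# result.predict(len(y), len(y), exog=_next_exog_row(x, lag))
```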
More generally, to forecast over a longer horizon, we need an array of future explanatory variables:
```
>>> xx = np.concatenate((x, np.ones((x.shape[0], 10))), axis=1)
>>> result.predict(len(y), len(y)+9, exog=_transform_x(xx[:, -(10+lag):], lag)[-10:])
array([ 1.35420494,  0.81332158,  1.00030139,  1.00030334,  1.000303  ,
        1.00030299,  1.00030299,  1.00030299,  1.00030299,  1.00030299])
```
I chose the indexing so that the exog for `predict` contains the last two observations in its first row:
```
>>> _transform_x(xx[:, -(10+lag):], lag)[lag:]
array([[ 0.81304498,  1.35387043],
       [ 1.        ,  0.81304498],
       [ 1.        ,  1.        ],
       ...,
       [ 1.        ,  1.        ],
       [ 1.        ,  1.        ]])
```
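The same recipe can be packaged into a helper; a sketch (the name `_forecast_exog` is mine) that reuses `_transform_x` from the question and mirrors the slicing above, with the horizon 10 replaced by a parameter:

```python
import numpy as np

def _forecast_exog(x, x_future, lag):
    # x_future holds the assumed future exogenous values x[T] .. x[T+h-1],
    # one column per forecast step (the very last one is never actually used,
    # just as the 10th appended "1" above is never used)
    x_future = np.atleast_2d(x_future)
    h = x_future.shape[1]
    xx = np.concatenate((x, x_future), axis=1)
    # same expression as in the transcript above, with 10 replaced by h
    return _transform_x(xx[:, -(h + lag):], lag)[-h:]

# reproduces the 10-step forecast above:
# result.predict(len(y), len(y) + 9,
#                exog=_forecast_exog(x, np.ones((x.shape[0], 10)), lag))
```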