
How to use statsmodels' ARMA to predict with exogenous variables?

I am trying to use statsmodels to fit an AR(MA) process with exogenous variables. To that end, I generate a realization of an AR(0) process with a delayed exogenous variable and try to recover what I expect from it. I am able to fit the process correctly, but I cannot use the `predict` method correctly.

The code below is an MCVE of what I want to achieve. It is heavily commented so that you can follow it easily. The last statement is an assertion that currently fails, and I would like to make it pass. I suspect the culprit is how I am calling `.predict`.

import numpy as np
import statsmodels.tsa.api


def _transform_x(x, lag):
    """
    Converts a set of time series into a matrix of delayed signals.
    For x.shape[0] == 1, it is equivalent to call `statsmodels.tsa.api.lagmat(x_i, lag)`.

    For x.shape[0] == 1, each `row_j` is each time `t`, `column_i` is the signal at `t - i`,
    It assumes that no past signal => no signal: each row is left-padded with zeros.

    For example, for lag=3, the matrix would be:
    ```
    [0, 0   , 0   ] (-> y[0])
    [0, 0   , x[0]] (-> y[1])
    [0, x[0], x[1]] (-> y[2])
    ```

    I.e.
    The parameter fitted to column 2, a2, is the influence of `x[t - 1]` on `y[t]`.
    The parameter fitted to column 1, a1, is the influence of `x[t - 2]` on `y[t]`.
    It assumes that we only measure x[t] when we measure y[t], the reason why that column does not appear.

    For x.shape[0] > 1, it returns a concatenation of each of the matrixes for each signal.
    """
    for x_i in x:
        assert len(x_i) >= lag
        assert len(x_i.shape) == 1, 'Each of the elements must be a time-series (1D)'
    return np.concatenate(tuple(statsmodels.tsa.api.lagmat(x_i, lag) for x_i in x), axis=1)


# build the realization of the process y[t] = 1*x[t-2] + noise, where x[t] is iid from N(1,1)
t = np.arange(0, 1000, 1)
np.random.seed(1)

# the exogenous variable
x1 = 1 + np.random.normal(size=t.shape)

# shift x1 forward by 2 (np.roll wraps the last two elements to the front, so we zero them out below)
y = np.roll(x1, 2) + np.random.normal(scale=0.01, size=t.shape)
y[0] = y[1] = 0

x = np.array([x1])  # x.shape[0] => each exogenous variable; x.shape[1] => each time point

# fit it with AR(2) + exogenous(2)
lag = 2

result = statsmodels.tsa.api.ARMA(y, (lag, 0), exog=_transform_x(x, lag)).fit(disp=False)

# this gives the expected result. Specifically, `x2 = 0.9952` and all others are indistinguishable from 0.
# (x2 here means the highest delay, 2).
print(result.summary())

# predict 1 element out-of-sample. Because the process is y[t] = x[0, t - 2] + noise,
# the prediction should be equal to `x[0, -2]`
y_pred = result.predict(len(y), len(y), exog=_transform_x(x[:, -3:], lag))[0]

# this fails!
np.testing.assert_almost_equal(y_pred, x[0, -2], decimal=2)
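As an aside, the left-zero-padded lag matrix that `_transform_x` expects from `lagmat` can be reproduced with plain NumPy. This is a sketch under my assumption (consistent with the matrices printed further down in the answer) that column 0 holds the signal delayed by 1 step, column 1 by 2 steps, and so on, with leading zeros:

```python
import numpy as np

def lagmat_like(x, lag):
    """Left-zero-padded lag matrix: row t is [x[t-1], x[t-2], ..., x[t-lag]],
    with zeros wherever t - i < 0 (assumed equivalent to lagmat's default output)."""
    n = len(x)
    out = np.zeros((n, lag))
    for i in range(1, lag + 1):
        out[i:, i - 1] = x[:n - i]  # column i-1 = x delayed by i steps
    return out

# rows: [0, 0], [1, 0], [2, 1], [3, 2]
print(lagmat_like(np.array([1.0, 2.0, 3.0, 4.0]), 2))
```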

There are two issues, as far as I can see:

`exog=_transform_x(x[:, -3:], lag)` in predict has an initial-values problem and includes zeros instead of the lagged values.

Indexing: the prediction for y[-1] should use x[-3], i.e. two lags back. If we want to predict the next observation, then we need an exog x array extended to cover the forecast period.

If I change this, the assertion passes for y[-1]:

>>> y_pred = result.predict(len(y)-1, len(y)-1, exog=_transform_x(x[:, -10:], lag)[-1])
>>> y_pred
array([ 0.9308579])
>>> result.fittedvalues[-1]
0.93085789893991366
>>> x[0, -3]
0.93037546054487086
>>> np.testing.assert_almost_equal(y_pred, x[0, -3], decimal=2)

The above predicts the last (in-sample) observation. To predict the first out-of-sample observation, we need the last and second-to-last x, which cannot be obtained through the `_transform_x` function. For that case, I just provide them in a list:

>>> y_pred = result.predict(len(y), len(y), exog=[[x[0, -1], x[0, -2]]])
>>> y_pred
array([ 1.35420494])
>>> x[0, -2]
1.3538704268828403
>>> np.testing.assert_almost_equal(y_pred, x[0, -2], decimal=2)
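The hand-built list above generalizes: for a forecast at time `n`, the exog row must be `[x[n-1], x[n-2], ..., x[n-lag]]`, lag 1 first. A small helper of my own (hypothetical name, pure NumPy) that builds the `h` rows for forecast steps `n, ..., n+h-1`, assuming any needed future x values have already been appended to the series:

```python
import numpy as np

def future_exog(x, n, lag, h):
    # Hypothetical helper: row k is the exog for forecasting time n + k,
    # i.e. [x[n+k-1], x[n+k-2], ..., x[n+k-lag]] (lag 1 in column 0).
    # x must already contain assumed future values up to index n + h - 2.
    return np.array([[x[n + k - i] for i in range(1, lag + 1)] for k in range(h)])

# one-step-ahead: equivalent to the [[x[0, -1], x[0, -2]]] list built by hand above
x1 = np.arange(10.0)
print(future_exog(x1, n=10, lag=2, h=1))  # single row [x[9], x[8]]
```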

More generally, to forecast further ahead we need an array of future explanatory variables:

>>> xx = np.concatenate((x, np.ones((x.shape[0], 10))), axis=1)
>>> result.predict(len(y), len(y)+9, exog=_transform_x(xx[:, -(10+lag):], lag)[-10:])
array([ 1.35420494,  0.81332158,  1.00030139,  1.00030334,  1.000303  ,
        1.00030299,  1.00030299,  1.00030299,  1.00030299,  1.00030299])

I chose the indices so that the exog passed to predict contains the last two observations in its first row:

>>> _transform_x(xx[:, -(10+lag):], lag)[lag:]
array([[ 0.81304498,  1.35387043],
       [ 1.        ,  0.81304498],
       [ 1.        ,  1.        ],
       ..., 
       [ 1.        ,  1.        ],
       [ 1.        ,  1.        ],
       [ 1.        ,  1.        ]])
