How to use statsmodels' ARMA to predict with exogenous variables?
I am trying to use statsmodels to fit an AR(MA) process with exogenous variables. To that end, I generated a realization of an AR(0) process with a lagged exogenous variable, and I am trying to recover from it what I expect. I am able to fit the process correctly, but I cannot use the `predict` method.

The code below is an MCVE of what I am trying to achieve. It is heavily commented so that you can follow it easily. The last statement is an assertion that fails, and I would like to make it pass. I suspect the culprit is how I am calling the `.predict` function.
```python
import numpy as np
import statsmodels.tsa.api


def _transform_x(x, lag):
    """
    Converts a set of time series into a matrix of delayed signals.

    For x.shape[0] == 1, it is equivalent to calling
    `statsmodels.tsa.api.lagmat(x_i, lag)`: each `row_t` is a time `t`,
    and `column_i` is the signal at `t - i`.
    It assumes that no past signal => no signal: the first rows are padded
    with zeros. For example, for lag=3 the matrix is:

        [0,    0,    0] (-> y[0])
        [x[0], 0,    0] (-> y[1])
        [x[1], x[0], 0] (-> y[2])

    I.e. the parameter fitted to column 1, a1, is the influence of `x[t - 1]`
    on `y[t]`; the parameter fitted to column 2, a2, is the influence of
    `x[t - 2]` on `y[t]`.
    It assumes that we only measure x[t] when we measure y[t], which is why
    a column for `x[t]` itself does not appear.
    For x.shape[0] > 1, it returns the concatenation of the matrices of each
    signal.
    """
    for x_i in x:
        assert len(x_i) >= lag
        assert len(x_i.shape) == 1, 'Each of the elements must be a time series (1D)'
    return np.concatenate(tuple(statsmodels.tsa.api.lagmat(x_i, lag) for x_i in x), axis=1)


# build a realization of the process y[t] = 1*x[t-2] + noise, where x[t] is iid from N(1,1)
t = np.arange(0, 1000, 1)
np.random.seed(1)

# the exogenous variable
x1 = 1 + np.random.normal(size=t.shape)

# np.roll shifts x by 2 (it wraps the last two elements around to the beginning,
# which is why we zero out the first two entries of y)
y = np.roll(x1, 2) + np.random.normal(scale=0.01, size=t.shape)
y[0] = y[1] = 0

x = np.array([x1])  # x.shape[0] => each exogenous variable; x.shape[1] => each time point

# fit it with AR(2) + exogenous(2)
lag = 2
result = statsmodels.tsa.api.ARMA(y, (lag, 0), exog=_transform_x(x, lag)).fit(disp=False)

# this gives the expected result. Specifically, `x2 = 0.9952` and all the others are
# indistinguishable from 0 (x2 here means the highest lag, 2).
print(result.summary())

# predict 1 element out-of-sample. Because the process is y[t] = x[0, t - 2] + noise,
# the prediction should equal `x[0, -2]`
y_pred = result.predict(len(y), len(y), exog=_transform_x(x[:, -3:], lag))[0]

# this fails!
np.testing.assert_almost_equal(y_pred, x[0, -2], decimal=2)
```
There are two problems, as far as I can see:

`exog=_transform_x(x[:, -3:], lag)` in `predict` has an initial-value problem and includes zeros instead of the lags.

Indexing: the prediction for y[-1] should be x[-3], i.e. two lags behind. If we want to predict the next observation, then we need an extended exog x array that corresponds to the forecast period.
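To see the first problem concretely, here is a quick sketch (reusing `x`, `lag` and `_transform_x` from the question): slicing `x` down to its last three points before calling `_transform_x` makes `lagmat` pad with zeros again, so the rows handed to `predict` contain zeros where the true lags should be:

```python
# _transform_x(x[:, -3:], lag) re-runs lagmat on a 3-point slice, so the
# left-padding zeros reappear; schematically, for lag=2 it returns
#     [[0,     0    ],    <- both lags replaced by padding zeros
#      [x[-3], 0    ],    <- the lag-2 value is still a padding zero
#      [x[-2], x[-3]]]
# instead of rows built from the true history x[-4], x[-5], ...
print(_transform_x(x[:, -3:], lag))
```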
If I change this, then the assert for y[-1] passes for me:
```
>>> y_pred = result.predict(len(y)-1, len(y)-1, exog=_transform_x(x[:, -10:], lag)[-1])
>>> y_pred
array([ 0.9308579])
>>> result.fittedvalues[-1]
0.93085789893991366
>>> x[0, -3]
0.93037546054487086
>>> np.testing.assert_almost_equal(y_pred, x[0, -3], decimal=2)
```
The above predicts the last observation. To forecast the first out-of-sample observation, we need the last and the second-to-last x, which cannot be obtained through the `_transform_x` function. For the example, I just provide them in a list:
```
>>> y_pred = result.predict(len(y), len(y), exog=[[x[0, -1], x[0, -2]]])
>>> y_pred
array([ 1.35420494])
>>> x[0, -2]
1.3538704268828403
>>> np.testing.assert_almost_equal(y_pred, x[0, -2], decimal=2)
```
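This can be wrapped up for any lag; a minimal sketch (the helper name `_next_exog_row` is mine, not part of the original code), assuming the same lag-1-first column ordering that `lagmat` produces:

```python
import numpy as np

def _next_exog_row(x, lag):
    # exog row for predicting y[T], the first out-of-sample observation:
    # lagmat puts lag 1 in the first column, so the row is
    # [x[T-1], x[T-2], ..., x[T-lag]]
    return np.array([[x[0, -i] for i in range(1, lag + 1)]])

# for lag=2 this builds [[x[0, -1], x[0, -2]]], exactly the list used above:
# result.predict(len(y), len(y), exog=_next_exog_row(x, lag))
```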
More generally, to forecast over a longer horizon, we need an array of future explanatory variables:
```
>>> xx = np.concatenate((x, np.ones((x.shape[0], 10))), axis=1)
>>> result.predict(len(y), len(y)+9, exog=_transform_x(xx[:, -(10+lag):], lag)[-10:])
array([ 1.35420494,  0.81332158,  1.00030139,  1.00030334,  1.000303  ,
        1.00030299,  1.00030299,  1.00030299,  1.00030299,  1.00030299])
```
I chose the indexing so that the exog for `predict` contains the last two observations in its first row:
```
>>> _transform_x(xx[:, -(10+lag):], lag)[lag:]
array([[ 0.81304498,  1.35387043],
       [ 1.        ,  0.81304498],
       [ 1.        ,  1.        ],
       ...,
       [ 1.        ,  1.        ],
       [ 1.        ,  1.        ]])
```
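The same recipe can be packaged into a helper; a sketch (the name `_forecast_exog` is mine) that reuses `_transform_x` from the question and mirrors the slicing above, with the horizon 10 replaced by a parameter:

```python
import numpy as np

def _forecast_exog(x, x_future, lag):
    # x_future holds the assumed future exogenous values x[T] .. x[T+h-1],
    # one column per forecast step (the very last one is never actually used,
    # just as the 10th appended "1" above is never used)
    x_future = np.atleast_2d(x_future)
    h = x_future.shape[1]
    xx = np.concatenate((x, x_future), axis=1)
    # same expression as in the transcript above, with 10 replaced by h
    return _transform_x(xx[:, -(h + lag):], lag)[-h:]

# reproduces the 10-step forecast above:
# result.predict(len(y), len(y) + 9,
#                exog=_forecast_exog(x, np.ones((x.shape[0], 10)), lag))
```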