[英]invert stationarity for ARIMA model
how do I invert the stationarity and reapply the dates to the data for plotting? 如何反转平稳性并将日期重新应用于数据进行绘图?
srcs: srcs:
I am trying to invert stationarity and get a plot of prediction, particularly for two columns called ' app_1', and ' app_2, (the orange and red lines below). 我正在尝试反转平稳性并获得预测图,尤其是对于名为“ app_1”和“ app_2”的两列(下面的橙色和红色线)。
The data I am drawing from looks like this: 我从中提取的数据如下所示:
print(u1.info())
u1.head()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 15011 entries, 2017-08-28 11:00:00 to 2018-01-31 19:30:00
Freq: 15T
Data columns (total 10 columns):
app_1 15011 non-null float64
app_2 15011 non-null float64
user 15011 non-null object
bar 15011 non-null float64
grocers 15011 non-null float64
home 15011 non-null float64
lunch 15011 non-null float64
park 15011 non-null float64
relatives 15011 non-null float64
work 15011 non-null float64
dtypes: float64(9), object(1)
memory usage: 1.3+ MB
app_1 app_2 user bar grocers home lunch park relatives work
date
2017-08-28 11:00:00 0.010000 0.0 user_1 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2017-08-28 11:15:00 0.010125 0.0 user_1 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2017-08-28 11:30:00 0.010250 0.0 user_1 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2017-08-28 11:45:00 0.010375 0.0 user_1 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2017-08-28 12:00:00 0.010500 0.0 user_1 0.0 0.0 0.0 0.0 0.0 0.0 0.0
the location column represent a location the user is at at a given time -- after the first "significant location change" event, one and only one column will be a 1 at a time. location列代表用户在给定时间的位置-在第一次“重大位置更改”事件之后,一次且只有一个列一次为1。
I am analyzing this with VARIMAX -- using statsmodels VARMAX version of AR.: 我正在使用VARIMAX进行分析-使用statsmodels VARMAX版本的AR:
from statsmodels.tsa.statespace.varmax import VARMAX
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
from random import random
#...
columns = [ ' app_1', ' app_2', ' bar', ' grocers', ' home', ' lunch', ' work', ' park', ' relatives' ]
series = u1[columns]
# from: https://machinelearningmastery.com/make-predictions-time-series-forecasting-python/
# create a difference transform of the dataset
def difference(dataset):
diff = list()
for i in range(1, len(dataset)):
value = dataset[i] - dataset[i - 1]
diff.append(value)
return np.array(diff)
# Make a prediction give regression coefficients and lag obs
def predict(coef, history):
yhat = coef[0]
for i in range(1, len(coef)):
yhat += coef[i] * history[-i]
return yhat
X = pd.DataFrame()
for column in columns:
X[column] = difference(series[column].values)
size = (4*24)*54 # hoping
train, test = X[0:size], X[size:size+(14*4*24)]
train = train.loc[:, (train != train.iloc[0]).any()] # https://stackoverflow.com/questions/20209600/panda-dataframe-remove-constant-column
test = test.loc[:, (test != test.iloc[0]).any()] # https://stackoverflow.com/questions/20209600/panda-dataframe-remove-constant-column
#print(train.var(), X.info())
# train autoregression
model = VARMAX(train)
model_fit = model.fit(method='powell', disp=False)
#print(model_fit.mle_retvals)
##window = model_fit.k_ar
coef = model_fit.params
# walk forward over time steps in test
history = [train.iloc[i] for i in range(len(train))]
predictions = list()
for t in range(len(test)):
yhat = predict(coef, history)
obs = test.iloc[t]
predictions.append(yhat)
history.append(obs)
print(mean_squared_error(test, predictions))
0.5594208989876831
That mean_squared_error from scikitlearn is not horrifying (its roughly the middle of the three samples shown in the documentation, in fact). 来自scikitlearn的那个mean_squared_error并不令人恐惧(实际上,它大约是文档中显示的三个样本的中间)。 That _could mean that the data is predictive.
_可能意味着数据是可预测的。 I'd like to see that in a plot.
我想在情节中看到它。
# plot
plt.plot(test)
plt.plot(predictions, color='red')
plt.show()
So, part of what is going on here is that the data is seasonal, so it had to have stationarity applied to it. 因此,这里发生的部分原因是数据是季节性的,因此必须对其应用平稳性。 Now the lines are all vertical, instead of temporal.
现在,这些线都是垂直的,而不是时间的。
But another thing that concerns me is the scale of the red data. 但是让我担心的另一件事是红色数据的规模 。 That's a lot of red .
太多了 。 Anyway, how do I invert the stationarity and reapply the dates to the data for plotting?
无论如何,我该如何反转平稳性并将日期重新应用于数据进行绘图? It obviously should not look like that.
它显然不应该那样。 :)
:)
这样做的方法首先是将其制作为数据框:
predDf = pd.DataFrame(predictions)
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.