
Invert stationarity for ARIMA model

How do I invert the stationarity and reapply the dates to the data for plotting?


I am trying to invert stationarity and get a plot of the predictions, particularly for the two columns called ' app_1' and ' app_2' (the orange and red lines below).

The data I am drawing from looks like this:

[Image: dataset]

u1.info()  # info() prints its summary itself and returns None
u1.head()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 15011 entries, 2017-08-28 11:00:00 to 2018-01-31 19:30:00
Freq: 15T
Data columns (total 10 columns):
 app_1        15011 non-null float64
 app_2        15011 non-null float64
user          15011 non-null object
 bar          15011 non-null float64
 grocers      15011 non-null float64
 home         15011 non-null float64
 lunch        15011 non-null float64
 park         15011 non-null float64
 relatives    15011 non-null float64
 work         15011 non-null float64
dtypes: float64(9), object(1)
memory usage: 1.3+ MB

                        app_1  app_2    user  bar  grocers  home  lunch  park  relatives  work
date
2017-08-28 11:00:00  0.010000    0.0  user_1  0.0      0.0   0.0    0.0   0.0        0.0   0.0
2017-08-28 11:15:00  0.010125    0.0  user_1  0.0      0.0   0.0    0.0   0.0        0.0   0.0
2017-08-28 11:30:00  0.010250    0.0  user_1  0.0      0.0   0.0    0.0   0.0        0.0   0.0
2017-08-28 11:45:00  0.010375    0.0  user_1  0.0      0.0   0.0    0.0   0.0        0.0   0.0
2017-08-28 12:00:00  0.010500    0.0  user_1  0.0      0.0   0.0    0.0   0.0        0.0   0.0

The location columns (bar, grocers, home, lunch, park, relatives, work) represent where the user is at a given time: after the first "significant location change" event, exactly one of them is 1 at any time.
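That constraint is easy to check, and it has a consequence worth keeping in mind before fitting a VAR. A minimal sketch, using a tiny made-up frame `demo` in place of `u1` and assuming the seven location columns listed in `u1.info()` (with their leading spaces): if exactly one location is 1 on every row, the columns sum to 1, so after the first event their first differences sum to 0. The differenced location columns are then linearly dependent, which can make the error covariance of a VAR fitted to all of them singular or nearly so.

```python
import pandas as pd

# Hypothetical stand-in for u1: three 15-minute rows after the first
# "significant location change" event (names keep the leading space).
location_cols = [" bar", " grocers", " home", " lunch", " park", " relatives", " work"]
idx = pd.date_range("2017-08-28 12:00", periods=3, freq="15min")
demo = pd.DataFrame(0.0, index=idx, columns=location_cols)
demo.loc[idx[0], " home"] = 1.0
demo.loc[idx[1], " home"] = 1.0
demo.loc[idx[2], " work"] = 1.0

# Exactly one location column is 1 on every row.
one_hot_ok = bool((demo.sum(axis=1) == 1).all())

# Consequence: the first differences of the location columns sum to zero on
# every row, so the differenced columns are linearly dependent.
diff_sums = demo.diff().iloc[1:].sum(axis=1)

print(one_hot_ok)  # True
print(diff_sums.tolist())
```

In practice this suggests dropping one location column (or modelling locations separately) before differencing and fitting.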

I am analyzing this with VARMAX, using the statsmodels VARMAX model as a vector autoregression:

from statsmodels.tsa.statespace.varmax import VARMAX
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np

%matplotlib inline

import matplotlib
import matplotlib.pyplot as plt

from random import random
#...

columns = [ ' app_1', ' app_2', ' bar', ' grocers', ' home', ' lunch', ' work', ' park', ' relatives' ]
series = u1[columns]

# from: https://machinelearningmastery.com/make-predictions-time-series-forecasting-python/
# create a difference transform of the dataset
def difference(dataset):
    diff = list()
    for i in range(1, len(dataset)):
        value = dataset[i] - dataset[i - 1]
        diff.append(value)
    return np.array(diff)

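# Make a prediction given regression coefficients and lag obs
# (from a univariate AR tutorial; VARMAX params also contain per-equation
# intercepts, cross-variable AR terms and error-covariance parameters)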
def predict(coef, history):
    yhat = coef[0]
    for i in range(1, len(coef)):
        yhat += coef[i] * history[-i]
    return yhat

# difference() returns a bare NumPy array, so X gets a RangeIndex and the dates are dropped
X = pd.DataFrame()
for column in columns:
    X[column] = difference(series[column].values)

size = (4*24)*54  # 54 days of 15-minute samples (4 per hour * 24 hours)
train, test = X[0:size], X[size:size+(14*4*24)]  # the following 14 days as test

train = train.loc[:, (train != train.iloc[0]).any()] # https://stackoverflow.com/questions/20209600/panda-dataframe-remove-constant-column
test = test[train.columns]  # use exactly the columns the model was trained on

#print(train.var(), X.info())

# train autoregression
model = VARMAX(train)
model_fit = model.fit(method='powell', disp=False)
#print(model_fit.mle_retvals)

##window = model_fit.k_ar
coef = model_fit.params

# walk forward over time steps in test
history = [train.iloc[i] for i in range(len(train))]
predictions = list()
for t in range(len(test)):
    yhat = predict(coef, history)
    obs = test.iloc[t]
    predictions.append(yhat)
    history.append(obs) 

print(mean_squared_error(test, predictions))

0.5594208989876831

That mean_squared_error from scikit-learn is not horrifying (it's roughly in the middle of the three examples shown in the documentation, in fact). That _could_ mean that the data is predictive. I'd like to see that in a plot.
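MSE is expressed in the squared units of whatever was scored (here, the differenced columns), so comparing it with the example values in the scikit-learn documentation says little about this data. Comparing against a naive baseline on the same test set is more informative. A minimal sketch with toy values, assuming `test` and `predictions` are aligned frames of differences: on differenced data, "no change" (predicting 0 for every difference) is a natural baseline, and per-column errors show which series drives the total. scikit-learn's `mean_squared_error(test, predictions, multioutput='raw_values')` returns the same per-column numbers; plain NumPy is used here only to keep the example self-contained.

```python
import numpy as np
import pandas as pd

# Toy differenced test data and model predictions (hypothetical values).
test = pd.DataFrame({" app_1": [0.1, -0.2, 0.3], " app_2": [0.0, 0.5, -0.5]})
predictions = pd.DataFrame({" app_1": [0.0, -0.1, 0.2], " app_2": [0.1, 0.4, -0.3]})

def mse_per_column(y_true, y_pred):
    """Mean squared error for each column, like multioutput='raw_values'."""
    return ((np.asarray(y_true) - np.asarray(y_pred)) ** 2).mean(axis=0)

model_mse = mse_per_column(test, predictions)
# "No change" baseline: every predicted difference is zero.
baseline_mse = mse_per_column(test, np.zeros_like(test.to_numpy()))

# Relative improvement over the baseline: positive means better than "no change".
skill = 1 - model_mse / baseline_mse
print(dict(zip(test.columns, skill.round(3))))
```

A column whose test differences are all zero would make the baseline MSE zero, so such columns need to be excluded before taking the ratio.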

# plot
plt.plot(test)
plt.plot(predictions, color='red')
plt.show()

[Image: prediction plot]

Part of what is going on here is that the data is seasonal, so I had to make it stationary (by differencing) first. Now the lines are all vertical instead of running along the time axis.
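Differencing is undone with a cumulative sum anchored at a known level: if d_t = x_t - x_(t-1), then x_t = x_0 + d_1 + ... + d_t. A minimal sketch with a made-up one-column series (the values, dates and the names `levels`, `diffs`, `restored` are hypothetical) showing that `DataFrame.diff()` keeps the timestamps, unlike the `difference()` helper, which returns a bare NumPy array, and that `cumsum()` plus the first level recovers the original data.

```python
import pandas as pd

# Toy level series on a 15-minute grid (hypothetical values).
idx = pd.date_range("2017-08-28 11:00", periods=5, freq="15min")
levels = pd.DataFrame({" app_1": [1.0, 3.0, 2.0, 5.0, 4.0]}, index=idx)

# Forward transform: first differences, dates preserved.
diffs = levels.diff().iloc[1:]

# Inverse transform: accumulate the differences and add the first level
# (the Series levels.iloc[0] is aligned on the column names).
restored = diffs.cumsum() + levels.iloc[0]
restored = pd.concat([levels.iloc[[0]], restored])

print(restored[" app_1"].tolist())  # [1.0, 3.0, 2.0, 5.0, 4.0]
```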

Another thing that concerns me is the scale of the red data. That's a lot of red. Anyway, how do I invert the stationarity and reapply the dates to the data for plotting? It obviously should not look like that. :)

The way to do this is first to make it into a DataFrame:

predDf = pd.DataFrame(predictions)
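Continuing from there, a minimal sketch of the remaining steps with toy numbers, assuming `predictions` holds one-step-ahead predicted differences from the walk-forward loop and that `series` is the original, undifferenced frame (the values, `size`, `test_dates`, `previous_levels` and `predicted_levels` below are hypothetical). Because `difference()` drops the first row, row i of `X` corresponds to row i + 1 of `series`, so the test rows map to `series.index[size + 1 : size + 1 + len(test)]`. For a one-step-ahead prediction, undoing the differencing means adding each predicted difference to the actual level one step earlier.

```python
import pandas as pd

# Hypothetical stand-in for `series`: original (undifferenced) levels.
idx = pd.date_range("2017-08-28 11:00", periods=6, freq="15min")
series = pd.DataFrame({" app_1": [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]}, index=idx)

# difference() drops the first row, so row i of X matches row i + 1 of series.
size = 3                        # length of the training part of X
predictions = [[3.5], [5.0]]    # hypothetical one-step predicted differences

# 1) Make a DataFrame and put the dates back.
test_dates = series.index[size + 1 : size + 1 + len(predictions)]
predDf = pd.DataFrame(predictions, index=test_dates, columns=series.columns)

# 2) Invert the differencing: predicted level = actual level one step
#    earlier plus the predicted difference.
previous_levels = series.shift(1).loc[test_dates]
predicted_levels = previous_levels + predDf

print(predicted_levels[" app_1"].tolist())  # [10.5, 16.0]
```

To plot both on a date axis, `ax = series.loc[test_dates].plot()` followed by `predicted_levels.plot(ax=ax)` draws the actual and predicted levels on the same axes. For a multi-step forecast, where no actual values are available in between, accumulate with `predDf.cumsum()` and add the last known level instead.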
