invert stationarity for ARIMA model
How do I invert the stationarity and reapply the dates to the data for plotting?
I'm trying to invert the stationarity and get a forecast plot, especially for the two columns named "app_1" and "app_2" (the orange and red lines below).
u1.info()
u1.head()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 15011 entries, 2017-08-28 11:00:00 to 2018-01-31 19:30:00
Freq: 15T
Data columns (total 10 columns):
app_1        15011 non-null float64
app_2        15011 non-null float64
user         15011 non-null object
bar          15011 non-null float64
grocers      15011 non-null float64
home         15011 non-null float64
lunch        15011 non-null float64
park         15011 non-null float64
relatives    15011 non-null float64
work         15011 non-null float64
dtypes: float64(9), object(1)
memory usage: 1.3+ MB

                        app_1  app_2    user  bar  grocers  home  lunch  park  relatives  work
date
2017-08-28 11:00:00  0.010000    0.0  user_1  0.0      0.0   0.0    0.0   0.0        0.0   0.0
2017-08-28 11:15:00  0.010125    0.0  user_1  0.0      0.0   0.0    0.0   0.0        0.0   0.0
2017-08-28 11:30:00  0.010250    0.0  user_1  0.0      0.0   0.0    0.0   0.0        0.0   0.0
2017-08-28 11:45:00  0.010375    0.0  user_1  0.0      0.0   0.0    0.0   0.0        0.0   0.0
2017-08-28 12:00:00  0.010500    0.0  user_1  0.0      0.0   0.0    0.0   0.0        0.0   0.0
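The location columns (bar, grocers, home, lunch, park, relatives, work) represent the user's location at a given time: after the first "significant location change" event, exactly one of them is 1 at any one time.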
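That invariant can be checked directly before modelling. Below is a minimal sketch, assuming the location columns hold only 0.0 and 1.0 (as the `head()` output suggests); the helper name `location_invariant_holds` and the tiny demo frame are hypothetical, not the question's data.

```python
import numpy as np
import pandas as pd

# Location indicator columns, as listed in the u1.info() output.
LOCATION_COLUMNS = ["bar", "grocers", "home", "lunch", "park", "relatives", "work"]


def location_invariant_holds(df: pd.DataFrame) -> bool:
    """Hypothetical helper: True if every location value is 0 or 1 and, from the
    first row with any location set onward, exactly one location is 1 per row."""
    loc = df[LOCATION_COLUMNS]
    if not loc.isin([0.0, 1.0]).all().all():
        return False
    row_sums = loc.sum(axis=1).to_numpy()
    active = np.flatnonzero(row_sums > 0)
    if active.size == 0:
        return True  # no significant location change recorded yet
    first = active[0]
    return bool((row_sums[first:] == 1).all())


# Synthetic demo: all zeros, then "home", then "work".
demo = pd.DataFrame(0.0, index=range(3), columns=LOCATION_COLUMNS)
demo.loc[1, "home"] = 1.0
demo.loc[2, "work"] = 1.0
print(location_invariant_holds(demo))  # True

demo.loc[2, "home"] = 1.0  # now two locations are set in row 2
print(location_invariant_holds(demo))  # False
```

On the real data this would be called as `location_invariant_holds(u1)`.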
I'm using VARMAX for the analysis, i.e. the statsmodels VARMAX version of AR:
from statsmodels.tsa.statespace.varmax import VARMAX
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np
# Jupyter/IPython magic: render plots inline in the notebook
%matplotlib inline
import matplotlib.pyplot as plt
#...
# the numeric columns to model (the object column 'user' is left out)
columns = ['app_1', 'app_2', 'bar', 'grocers', 'home', 'lunch', 'work', 'park', 'relatives']
series = u1[columns]

# from: https://machinelearningmastery.com/make-predictions-time-series-forecasting-python/
# create a lag-1 difference transform of the dataset: diff[i] = dataset[i + 1] - dataset[i]
def difference(dataset):
    diff = list()
    for i in range(1, len(dataset)):
        value = dataset[i] - dataset[i - 1]
        diff.append(value)
    return np.array(diff)

# Make a prediction given regression coefficients and lagged observations.
# Written for a univariate AR model: coef[0] is the intercept, coef[i] the weight on lag i.
def predict(coef, history):
    yhat = coef[0]
    for i in range(1, len(coef)):
        yhat += coef[i] * history[-i]
    return yhat

X = pd.DataFrame()
for column in columns:
    # built from NumPy arrays, so X has a 0..n-1 RangeIndex instead of the dates
    X[column] = difference(series[column].values)
size = (4*24)*54  # hoping: 54 days of 15-minute steps (96 per day) for training
train, test = X[0:size], X[size:size+(14*4*24)]  # test on the following 14 days
# drop constant columns: https://stackoverflow.com/questions/20209600/panda-dataframe-remove-constant-column
# (train and test are filtered separately, so their columns can differ)
train = train.loc[:, (train != train.iloc[0]).any()]
test = test.loc[:, (test != test.iloc[0]).any()]
#print(train.var(), X.info())
# train autoregression
model = VARMAX(train)
model_fit = model.fit(method='powell', disp=False)
#print(model_fit.mle_retvals)
##window = model_fit.k_ar
# positional array of all fitted parameters (intercepts, lag coefficients, error-covariance terms)
coef = np.asarray(model_fit.params)
# walk forward over time steps in test
history = [train.iloc[i] for i in range(len(train))]
predictions = list()
for t in range(len(test)):
    yhat = predict(coef, history)
    obs = test.iloc[t]
    predictions.append(yhat)
    history.append(obs)
print(mean_squared_error(test, predictions))
0.5594208989876831
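One likely contributor to the oversized red line: `predict()` comes from a univariate AR tutorial and assumes `coef` is `[intercept, lag-1 weight, lag-2 weight, ...]`. For a statsmodels `VARMAX` fit, `model_fit.params` is instead one flat vector holding every equation's intercept, the lag coefficient matrices and the error-covariance terms, so `predict()` ends up summing many past rows with unrelated weights. With the default `order=(1, 0)` and constant trend, a one-step VAR(1) prediction has the form y_hat[t] = c + A1 @ y[t-1]. The sketch below shows only that form, in plain NumPy with made-up numbers; `var1_one_step` is a hypothetical helper, and extracting `c` and `A1` from `params` is deliberately left out. The fitted results object's own `model_fit.forecast(steps=...)` avoids the manual step altogether.

```python
import numpy as np


def var1_one_step(intercept, ar1, y_prev) -> np.ndarray:
    """One-step VAR(1) prediction: y_hat[t] = c + A1 @ y[t-1].

    intercept: shape (k,), ar1: shape (k, k), y_prev: shape (k,).
    Each equation (row of A1) uses the previous value of every series.
    """
    c = np.asarray(intercept, dtype=float)
    a1 = np.asarray(ar1, dtype=float)
    return c + a1 @ np.asarray(y_prev, dtype=float)


# Made-up 2-series example (think differenced app_1 and app_2); not fitted values.
c = np.array([0.0, 0.5])
A1 = np.array([[0.5, 0.0],
               [0.25, 0.5]])
y_prev = np.array([1.0, 2.0])

print(var1_one_step(c, A1, y_prev).tolist())  # [0.5, 1.75]
```

The point of the matrix form is that the number of weights per step is k × k for k series, not "one weight per past row", which is what the univariate helper assumes.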
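The mean_squared_error from scikit-learn isn't horrendous (in fact, it's roughly in the middle of the three examples shown in the documentation). It could mean the data is predictable. I'd like to see that in a plot.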
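MSE is measured in squared units of the data, so 0.559 cannot be meaningfully compared with the values in the scikit-learn documentation, which come from different data. A scale-free alternative is to compare against a trivial forecast: on lag-1 differenced data the "no change" (random-walk) forecast predicts a difference of 0, and the MSE skill score 1 - MSE_model / MSE_baseline is positive only when the model beats it. A minimal NumPy sketch; `mse` and `skill_vs_no_change` are hypothetical helper names and the arrays are made up.

```python
import numpy as np


def mse(y_true, y_pred) -> float:
    """Mean of squared errors over all cells (equals sklearn's mean_squared_error
    with its default uniform averaging over columns)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def skill_vs_no_change(diff_true, diff_pred) -> float:
    """MSE skill score against the no-change forecast, which predicts a
    difference of 0 at every step. 1 is perfect, 0 is no better than the baseline."""
    diff_true = np.asarray(diff_true, dtype=float)
    baseline = mse(diff_true, np.zeros_like(diff_true))
    if baseline == 0:
        raise ValueError("baseline MSE is 0; skill score is undefined")
    return 1.0 - mse(diff_true, diff_pred) / baseline


diff_true = np.array([[2.0, 0.0], [0.0, 2.0]])
diff_pred = np.array([[1.0, 0.0], [0.0, 1.0]])

print(mse(diff_true, diff_pred))                            # 0.5
print(mse(10 * diff_true, 10 * diff_pred))                  # 50.0 (MSE scales with the data)
print(skill_vs_no_change(diff_true, diff_pred))             # 0.75
print(skill_vs_no_change(10 * diff_true, 10 * diff_pred))   # 0.75 (the skill score does not)
```

With the question's variables, `skill_vs_no_change(test, predictions)` would apply directly, provided `test` and the predictions have matching columns.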
# plot
plt.plot(test)
plt.plot(predictions, color='red')
plt.show()
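So part of what's happening here is that the data is seasonal, so it had to be made stationary first. As a result, the lines are now all vertical instead of being laid out over time.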
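But the other thing that worries me is the scale of the red data. It's far too large. Either way, how do I invert the stationarity and reapply the dates to the data for plotting? It obviously shouldn't look like that. :)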
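Because `difference()` produces lag-1 differences (`X.iloc[i]` is `series.iloc[i + 1] - series.iloc[i]`), inverting it means adding predicted differences back onto levels. Which levels depends on the kind of forecast. The walk-forward loop appends each true observation to `history`, so its predictions are one-step-ahead and should be added to the actual previous level; a multi-step forecast instead accumulates its differences from the last known level with a cumulative sum. A minimal sketch; the helper names are hypothetical.

```python
import numpy as np


def difference(values) -> np.ndarray:
    """Lag-1 difference, equivalent to the question's difference():
    d[i] = x[i + 1] - x[i]."""
    values = np.asarray(values, dtype=float)
    return values[1:] - values[:-1]


def invert_one_step(prev_levels, diff_pred) -> np.ndarray:
    """One-step-ahead forecasts: x_hat[t] = x[t - 1] (actual) + d_hat[t]."""
    return np.asarray(prev_levels, dtype=float) + np.asarray(diff_pred, dtype=float)


def invert_cumulative(last_level, diff_pred) -> np.ndarray:
    """Multi-step forecast from a single known level:
    x_hat[t] = last_level + d_hat[1] + ... + d_hat[t]. Works column-wise on 2-D input."""
    steps = np.cumsum(np.asarray(diff_pred, dtype=float), axis=0)
    return np.asarray(last_level, dtype=float) + steps


x = np.array([1.0, 2.0, 4.0, 7.0])
d = difference(x)  # d == [1.0, 2.0, 3.0]

# With the true differences both inversions recover the original levels.
print(invert_one_step(x[:-1], d).tolist())      # [2.0, 4.0, 7.0]
print(invert_cumulative(x[0], d).tolist())      # [2.0, 4.0, 7.0]

# With imperfect predicted differences they diverge: errors accumulate in the cumulative version.
d_hat = np.array([1.0, 1.0, 1.0])
print(invert_one_step(x[:-1], d_hat).tolist())  # [2.0, 3.0, 5.0]
print(invert_cumulative(x[0], d_hat).tolist())  # [2.0, 3.0, 4.0]
```

For the question's arrays, the one-step case would pair the predictions for `test` with `prev_levels = series.iloc[size:size + len(test)]` (restricted to `test`'s columns); the reconstructed values then belong to the timestamps `series.index[size + 1:size + 1 + len(test)]`.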
The way to do this is to first turn the predictions into a DataFrame:
predDf = pd.DataFrame(predictions)
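`pd.DataFrame(predictions)` gives one row per step, with columns taken from the Series labels but a plain 0..n-1 index, because `X` was built from NumPy arrays and never carried the dates. Since `X.iloc[i]` is the change ending at `series.index[i + 1]`, the test rows map to `series.index[size + 1:size + 1 + len(test)]`. The sketch below uses small synthetic stand-ins for `u1`, `X`, `test` and `predictions`; all values are made up.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for u1[["app_1", "app_2"]]: 15-minute data with a DatetimeIndex.
idx = pd.date_range("2017-08-28 11:00", periods=8, freq="15min")
series = pd.DataFrame({"app_1": np.arange(8, dtype=float),
                       "app_2": np.arange(8, dtype=float) * 2.0}, index=idx)

# Differencing the raw arrays, as the question does, leaves X with a RangeIndex.
X = pd.DataFrame({c: np.diff(series[c].to_numpy()) for c in series.columns})

size = 4                 # hypothetical split point
test = X.iloc[size:]     # X rows 4..6

# Made-up per-step predictions shaped like the question's list of Series.
predictions = [pd.Series({"app_1": 1.0, "app_2": 2.0}) for _ in range(len(test))]

predDf = pd.DataFrame(predictions)
predDf = predDf[test.columns]                                  # same column order as test
predDf.index = series.index[size + 1 : size + 1 + len(test)]  # X row i ends at series row i + 1

# Give the actual test differences the same timestamps for plotting side by side.
test_dated = test.copy()
test_dated.index = predDf.index

print(predDf.index[0])  # 2017-08-28 12:15:00
```

With dates on the index, `predDf.plot()` or `plt.plot(predDf.index, predDf['app_1'])` draws against time rather than row number, and the same index can be assigned to levels reconstructed from the differences.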