Prediction of time series with minute interval using ARIMA for Python
I am a beginner in machine learning for time series and need to develop a project in which my data is sampled at one-minute intervals. Could someone help me create this algorithm?
Data set: each value represents one minute of collection (9:00, 9:01, ...). Each collection lasts 10 minutes and was performed in 2 months, that is, 10 values for January and 10 values for February.
Objective: I would like my result to be a forecast of the next 10 minutes for the month of March, for example:
2020-03-01 9:00:00
2020-03-01 9:01:00
2020-03-01 9:02:00
2020-03-01 9:03:00
Training: the training must use January and February as the reference for forecasting, taking into account that this is a time series.
Seasonal:
Forecast:
Current problem: the forecast seems to be failing, and the data does not seem to be valid as a time series, because, as the seasonality image shows, the data set appears as a straight line. In the forecast figure, the forecast is the green line and the original data is the blue line. However, the date axis runs until 2020-11-01 when it should stop at 2020-03-01, and the original data forms a rectangle in the graph.
script.py
# -*- coding: utf-8 -*-
try:
    import pandas as pd
    import numpy as np
    import pmdarima as pm
    #%matplotlib inline
    import matplotlib.pyplot as plt
    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
    from statsmodels.tsa.arima_model import ARIMA
    from statsmodels.tsa.seasonal import seasonal_decompose
    from dateutil.parser import parse
except ImportError as e:
    print("[FAILED] {}".format(e))

class operationsArima():
    @staticmethod
    def ForecastingWithArima():
        try:
            # Import
            data = pd.read_csv('minute.csv', parse_dates=['date'], index_col='date')
            # Plot
            fig, axes = plt.subplots(2, 1, figsize=(10,5), dpi=100, sharex=True)
            # Usual Differencing
            axes[0].plot(data[:], label='Original Series')
            axes[0].plot(data[:].diff(1), label='Usual Differencing')
            axes[0].set_title('Usual Differencing')
            axes[0].legend(loc='upper left', fontsize=10)
            print("[OK] Generated axes")
            # Seasonal
            axes[1].plot(data[:], label='Original Series')
            axes[1].plot(data[:].diff(11), label='Seasonal Differencing', color='green')
            axes[1].set_title('Seasonal Differencing')
            plt.legend(loc='upper left', fontsize=10)
            plt.suptitle('Drug Sales', fontsize=16)
            plt.show()
            # Seasonal - fit stepwise auto-ARIMA
            smodel = pm.auto_arima(data, start_p=1, start_q=1,
                                   test='adf',
                                   max_p=3, max_q=3, m=11,
                                   start_P=0, seasonal=True,
                                   d=None, D=1, trace=True,
                                   error_action='ignore',
                                   suppress_warnings=True,
                                   stepwise=True)
            smodel.summary()
            print(smodel.summary())
            print("[OK] Generated model")
            # Forecast
            n_periods = 11
            fitted, confint = smodel.predict(n_periods=n_periods, return_conf_int=True)
            index_of_fc = pd.date_range(data.index[-1], periods = n_periods, freq='MS')
            # make series for plotting purpose
            fitted_series = pd.Series(fitted, index=index_of_fc)
            lower_series = pd.Series(confint[:, 0], index=index_of_fc)
            upper_series = pd.Series(confint[:, 1], index=index_of_fc)
            print("[OK] Generated series")
            # Plot
            plt.plot(data)
            plt.plot(fitted_series, color='darkgreen')
            plt.fill_between(lower_series.index,
                             lower_series,
                             upper_series,
                             color='k', alpha=.15)
            plt.title("ARIMA - Final Forecast - Drug Sales")
            plt.show()
            print("[SUCESS] Generated forecast")
        except Exception as e:
            print("[FAILED] Caused by: {}".format(e))

if __name__ == "__main__":
    flow = operationsArima()
    flow.ForecastingWithArima() # Init script
Summary:
                                SARIMAX Results
================================================================================
Dep. Variable:                      y   No. Observations:                   22
Model:           SARIMAX(0, 1, 0, 11)   Log Likelihood                     nan
Date:                Mon, 13 Apr 2020   AIC                                nan
Time:                        21:19:10   BIC                                nan
Sample:                             0   HQIC                               nan
                                 - 22
Covariance Type:                  opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
intercept           0   5.33e-13          0      1.000   -1.05e-12    1.05e-12
sigma2          1e-10   5.81e-10      0.172      0.863   -1.04e-09    1.24e-09
===================================================================================
Ljung-Box (Q):                    nan   Jarque-Bera (JB):                   nan
Prob(Q):                          nan   Prob(JB):                           nan
Heteroskedasticity (H):           nan   Skew:                               nan
Prob(H) (two-sided):              nan   Kurtosis:                           nan
===================================================================================
I see a couple of problems here. Because you have two short series sampled at 1-minute frequency and separated by a month, the straight segment in the blue line that you mention is expected: the plot simply joins the last January point to the first February point across the gap. In addition, the green line looks like the original data itself, which means that the model's forecast is exactly the same as your original data.
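The month-long gap can be made visible, and the two collections separated, with a few lines of pandas. This is only a sketch under assumptions: the values are randomly generated stand-ins for `minute.csv`, and the names `session_id`, `minute_in_session` and `sessions` are illustrative, not part of the original script.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two 10-minute collections one month apart (values are made up).
jan = pd.date_range("2020-01-01 09:00", periods=10, freq="min")
feb = pd.date_range("2020-02-01 09:00", periods=10, freq="min")
rng = np.random.default_rng(0)
data = pd.Series(rng.normal(100.0, 5.0, size=20), index=jan.append(feb), name="value")

# Time elapsed between consecutive samples: one minute inside a collection,
# roughly a month at the boundary between the two collections.
gaps = data.index.to_series().diff()
largest_gap = gaps.max()

# Start a new session wherever the gap exceeds the sampling interval.
session_id = (gaps > pd.Timedelta(minutes=1)).cumsum().to_numpy()

# Position of each sample inside its session (0..9), usable as a shared x-axis.
minute_in_session = data.groupby(session_id).cumcount().to_numpy()

# One column per session, one row per minute of the session.
sessions = pd.DataFrame(
    {"session": session_id, "minute": minute_in_session, "value": data.to_numpy()}
).pivot(index="minute", columns="session", values="value")

print(largest_gap)
print(sessions)
```

Plotting the columns of `sessions` against the `minute` index draws each collection as its own short curve, so no line is drawn across the month between them.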
Finally, I don't think it's a good idea to join two separate time series into one...
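If the two collections are kept separate, one very simple reference point is a per-minute average of January and February, stamped with March timestamps at one-minute steps. The sketch below is a naive baseline under stated assumptions, not ARIMA and not a method proposed in this thread: the values are made up, and `baseline`, `march_index` and `forecast` are illustrative names. It also shows the minute-frequency index; the script in the question builds its forecast index with `freq='MS'` (month start), which steps one month per forecast point and is consistent with the date axis running many months ahead.

```python
import numpy as np
import pandas as pd

# Hypothetical values for the two collections (made up for illustration).
rng = np.random.default_rng(1)
january = pd.Series(
    rng.normal(100.0, 5.0, size=10),
    index=pd.date_range("2020-01-01 09:00", periods=10, freq="min"),
)
february = pd.Series(
    rng.normal(102.0, 5.0, size=10),
    index=pd.date_range("2020-02-01 09:00", periods=10, freq="min"),
)

# Naive baseline (not ARIMA): average the two collections minute by minute.
baseline = (january.to_numpy() + february.to_numpy()) / 2.0

# Forecast timestamps: ten consecutive minutes on 2020-03-01.
# freq="min" steps by one minute; freq="MS" would step by month start instead.
march_index = pd.date_range("2020-03-01 09:00", periods=len(baseline), freq="min")
forecast = pd.Series(baseline, index=march_index, name="forecast")

print(forecast)
```

With only two observations per minute, any model has very little to learn from, so a baseline like this is mainly useful as something to compare a fitted model against.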