Prediction of time series with minute interval using ARIMA for Python

I am a beginner in machine learning for time series. I need to develop a project where my data is sampled at one-minute intervals. Could someone help me create this algorithm?

Data set: Each value represents one minute of collection (9:00, 9:01, ...). Each collection lasts 10 minutes and was performed in two months, that is, 10 values for January and 10 values for February.

[image]

Complete data

Objective: I would like my result to be a forecast of the next 10 minutes for the month of March, for example:

2020-03-01 9:00:00
2020-03-01 9:01:00
2020-03-01 9:02:00
2020-03-01 9:03:00

Training: The training data must contain January and February as the reference for forecasting, taking into account that it is a time series.

Seasonal:

[image]

Forecast:

Current problem: the current forecast seems to be failing. The input data does not seem to be valid as a time series because, as can be seen in the seasonality image, the data set is drawn as a straight line. In the figure below, the forecast is the green line and the original data is the blue line. However, the date axis runs until 2020-11-01 when it should only run until 2020-03-01, and the original data forms a rectangle in the graph.

[image]

script.py

# -*- coding: utf-8 -*-

try:
    import pandas as pd
    import numpy as np
    import pmdarima as pm
    #%matplotlib inline
    import matplotlib.pyplot as plt
    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
    from statsmodels.tsa.arima_model import ARIMA
    from statsmodels.tsa.seasonal import seasonal_decompose
    from dateutil.parser import parse
except ImportError as e:
    print("[FAILED] {}".format(e))

class operationsArima():

    @staticmethod
    def ForecastingWithArima():

        try:

            # Import
            data = pd.read_csv('minute.csv', parse_dates=['date'], index_col='date')

            # Plot
            fig, axes = plt.subplots(2, 1, figsize=(10,5), dpi=100, sharex=True)

            # Usual Differencing
            axes[0].plot(data[:], label='Original Series')
            axes[0].plot(data[:].diff(1), label='Usual Differencing')
            axes[0].set_title('Usual Differencing')
            axes[0].legend(loc='upper left', fontsize=10)
            print("[OK] Generated axes")

            # Seasonal
            axes[1].plot(data[:], label='Original Series')
            axes[1].plot(data[:].diff(11), label='Seasonal Differencing', color='green')
            axes[1].set_title('Seasonal Differencing')
            plt.legend(loc='upper left', fontsize=10)
            plt.suptitle('Drug Sales', fontsize=16)
            plt.show()

            # Seasonal - fit stepwise auto-ARIMA
            smodel = pm.auto_arima(data, start_p=1, start_q=1,
                                    test='adf',
                                    max_p=3, max_q=3, m=11,
                                    start_P=0, seasonal=True,
                                    d=None, D=1, trace=True,
                                    error_action='ignore',
                                    suppress_warnings=True,
                                    stepwise=True)

            smodel.summary()
            print(smodel.summary())
            print("[OK] Generated model")

            # Forecast
            n_periods = 11
            fitted, confint = smodel.predict(n_periods=n_periods, return_conf_int=True)
            index_of_fc = pd.date_range(data.index[-1], periods = n_periods, freq='MS')

            # make series for plotting purpose
            fitted_series = pd.Series(fitted, index=index_of_fc)
            lower_series = pd.Series(confint[:, 0], index=index_of_fc)
            upper_series = pd.Series(confint[:, 1], index=index_of_fc)
            print("[OK] Generated series")

            # Plot
            plt.plot(data)
            plt.plot(fitted_series, color='darkgreen')
            plt.fill_between(lower_series.index,
                            lower_series,
                            upper_series,
                            color='k', alpha=.15)

            plt.title("ARIMA - Final Forecast - Drug Sales")
            plt.show()
            print("[SUCCESS] Generated forecast")

        except Exception as e:

            print("[FAILED] Caused by: {}".format(e))

if __name__ == "__main__":
    flow = operationsArima()
    flow.ForecastingWithArima() # Init script

Summary:

                                SARIMAX Results                                 
================================================================================
Dep. Variable:                        y   No. Observations:                   22
Model:             SARIMAX(0, 1, 0, 11)   Log Likelihood                     nan
Date:                  Mon, 13 Apr 2020   AIC                                nan
Time:                          21:19:10   BIC                                nan
Sample:                               0   HQIC                               nan
                                   - 22                                         
Covariance Type:                    opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
intercept           0   5.33e-13          0      1.000   -1.05e-12    1.05e-12
sigma2          1e-10   5.81e-10      0.172      0.863   -1.04e-09    1.24e-09
===================================================================================
Ljung-Box (Q):                         nan   Jarque-Bera (JB):                  nan
Prob(Q):                               nan   Prob(JB):                          nan
Heteroskedasticity (H):                nan   Skew:                              nan
Prob(H) (two-sided):                   nan   Kurtosis:                          nan
===================================================================================

I see a couple of problems here. Because you have two short series sampled at 1-minute frequency and separated by a month, it is normal to see the straight line in the blue curve that you mention: the plot simply connects the last January point to the first February point across the gap. In addition, the green line looks like the original data itself, which means that the model's forecast is exactly the same as your original data.
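To make the size of that gap concrete, here is a minimal sketch. The timestamps are assumptions, not the real contents of minute.csv: two sessions of 11 one-minute samples (to match the 22 observations in the summary), starting at 9:00 on 2020-01-01 and 2020-02-01.

```python
import pandas as pd

# Hypothetical timestamps standing in for the index of minute.csv:
# two sessions of 11 one-minute samples, one month apart.
jan = pd.date_range("2020-01-01 09:00", periods=11, freq="min")
feb = pd.date_range("2020-02-01 09:00", periods=11, freq="min")
index = jan.append(feb)

# Time elapsed between consecutive observations.
gaps = pd.Series(index).diff().dropna()
typical_gap = gaps.min()  # spacing inside a session
largest_gap = gaps.max()  # spacing between the two sessions

print("typical gap:", typical_gap)
print("largest gap:", largest_gap)
print("ratio:", largest_gap / typical_gap)
```

On a real time axis each session is only a few minutes wide while the gap between them is about a month, so each session collapses into a near-vertical stroke and the line joining them looks straight.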

Finally, I don't think it's a good idea to stick together two separate time series...
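One possible way to keep the sessions apart, sketched under assumptions: the values and dates below are placeholders for minute.csv, and the column name `value` is hypothetical. The sketch splits the data into one series per collection day, aligns the sessions by minute offset, and builds the forecast timestamps for March in one-minute steps. Note that the `freq='MS'` used in the script's `pd.date_range` call means "month start", so it spaces the forecast points one month apart, whereas `freq='min'` spaces them one minute apart.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for minute.csv (values are placeholders).
jan = pd.date_range("2020-01-01 09:00", periods=11, freq="min")
feb = pd.date_range("2020-02-01 09:00", periods=11, freq="min")
data = pd.DataFrame({"value": np.arange(22, dtype=float)}, index=jan.append(feb))
data.index.name = "date"

# One series per collection day instead of a single concatenated series.
sessions = {
    day: group["value"]
    for day, group in data.groupby(data.index.normalize())
}

# Align the sessions by minute offset: rows are minutes 0..10 within a
# session, columns are the collection days.
aligned = pd.DataFrame(
    {str(day.date()): series.to_numpy() for day, series in sessions.items()}
)
print(aligned)

# Forecast timestamps for the March session, in one-minute steps.
n_periods = 11
forecast_index = pd.date_range("2020-03-01 09:00", periods=n_periods, freq="min")
print(forecast_index)
```

With only two sessions there is very little for any model to learn from, so this only addresses the layout of the data and the forecast axis, not the quality of the forecast itself.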
