
How to determine the appropriate forecasting techniques for non-stationary univariate time series data?

I am new to this field and have no background in time-series analysis or machine learning, so I am posting this question here.

I have three time series:

temp_anomaly is annual temperature anomaly data from 1959 to 2021.

co2_annual is annual CO2 data from 1959 to 2021.

temperature is hourly temperature data for an arbitrary location, covering 2021-01-01 01:00 to 2021-01-06 00:00 (120 hours).

temp_anomaly is as follows:

    Temperature Anomaly
Year    
1959    0.03
1960    -0.03
1961    0.06
1962    0.03
1963    0.05
... ...
2017    0.92
2018    0.84
2019    0.97
2020    1.01
2021    0.84

co2_annual is as follows:

CO2
Year    
1959    315.98
1960    316.91
1961    317.64
1962    318.45
1963    318.99
... ...
2017    406.76
2018    408.72
2019    411.66
2020    414.24
2021    416.45

And temperature is as follows:

Temperature
Datetime    
2021-01-01 01:00:00 19.19
2021-01-01 02:00:00 18.54
2021-01-01 03:00:00 17.94
2021-01-01 04:00:00 17.35
2021-01-01 05:00:00 16.80
... ...
2021-01-05 20:00:00 24.53
2021-01-05 21:00:00 23.68
2021-01-05 22:00:00 22.83
2021-01-05 23:00:00 21.99
2021-01-06 00:00:00 21.15

temp_anomaly.to_dict() gives the following:

{'Temperature Anomaly': {1959: 0.03,
  1960: -0.03,
  1961: 0.06,
  1962: 0.03,
  1963: 0.05,
  1964: -0.2,
  1965: -0.11,
  1966: -0.06,
  1967: -0.02,
  1968: -0.08,
  1969: 0.05,
  1970: 0.02,
  1971: -0.08,
  1972: 0.01,
  1973: 0.16,
  1974: -0.07,
  1975: -0.01,
  1976: -0.1,
  1977: 0.18,
  1978: 0.07,
  1979: 0.16,
  1980: 0.26,
  1981: 0.32,
  1982: 0.14,
  1983: 0.31,
  1984: 0.16,
  1985: 0.12,
  1986: 0.18,
  1987: 0.32,
  1988: 0.39,
  1989: 0.27,
  1990: 0.45,
  1991: 0.4,
  1992: 0.22,
  1993: 0.23,
  1994: 0.31,
  1995: 0.44,
  1996: 0.33,
  1997: 0.46,
  1998: 0.61,
  1999: 0.38,
  2000: 0.39,
  2001: 0.53,
  2002: 0.62,
  2003: 0.62,
  2004: 0.53,
  2005: 0.67,
  2006: 0.63,
  2007: 0.66,
  2008: 0.54,
  2009: 0.65,
  2010: 0.72,
  2011: 0.61,
  2012: 0.64,
  2013: 0.67,
  2014: 0.74,
  2015: 0.89,
  2016: 1.01,
  2017: 0.92,
  2018: 0.84,
  2019: 0.97,
  2020: 1.01,
  2021: 0.84}}

co2_annual.to_dict() gives the following:

{'CO2': {1959: 315.98,
  1960: 316.91,
  1961: 317.64,
  1962: 318.45,
  1963: 318.99,
  1964: 319.62,
  1965: 320.04,
  1966: 321.37,
  1967: 322.18,
  1968: 323.05,
  1969: 324.62,
  1970: 325.68,
  1971: 326.32,
  1972: 327.46,
  1973: 329.68,
  1974: 330.19,
  1975: 331.12,
  1976: 332.03,
  1977: 333.84,
  1978: 335.41,
  1979: 336.84,
  1980: 338.76,
  1981: 340.12,
  1982: 341.48,
  1983: 343.15,
  1984: 344.85,
  1985: 346.35,
  1986: 347.61,
  1987: 349.31,
  1988: 351.69,
  1989: 353.2,
  1990: 354.45,
  1991: 355.7,
  1992: 356.54,
  1993: 357.21,
  1994: 358.96,
  1995: 360.97,
  1996: 362.74,
  1997: 363.88,
  1998: 366.84,
  1999: 368.54,
  2000: 369.71,
  2001: 371.32,
  2002: 373.45,
  2003: 375.98,
  2004: 377.7,
  2005: 379.98,
  2006: 382.09,
  2007: 384.02,
  2008: 385.83,
  2009: 387.64,
  2010: 390.1,
  2011: 391.85,
  2012: 394.06,
  2013: 396.74,
  2014: 398.81,
  2015: 401.01,
  2016: 404.41,
  2017: 406.76,
  2018: 408.72,
  2019: 411.66,
  2020: 414.24,
  2021: 416.45}}

And temperature.to_dict() is as follows:

{'Temperature': {Timestamp('2021-01-01 01:00:00'): 19.19,
  Timestamp('2021-01-01 02:00:00'): 18.54,
  Timestamp('2021-01-01 03:00:00'): 17.94,
  Timestamp('2021-01-01 04:00:00'): 17.35,
  Timestamp('2021-01-01 05:00:00'): 16.8,
  Timestamp('2021-01-01 06:00:00'): 16.98,
  Timestamp('2021-01-01 07:00:00'): 19.19,
  Timestamp('2021-01-01 08:00:00'): 22.06,
  Timestamp('2021-01-01 09:00:00'): 26.63,
  Timestamp('2021-01-01 10:00:00'): 31.21,
  Timestamp('2021-01-01 11:00:00'): 33.39,
  Timestamp('2021-01-01 12:00:00'): 34.54,
  Timestamp('2021-01-01 13:00:00'): 35.08,
  Timestamp('2021-01-01 14:00:00'): 35.1,
  Timestamp('2021-01-01 15:00:00'): 34.62,
  Timestamp('2021-01-01 16:00:00'): 32.73,
  Timestamp('2021-01-01 17:00:00'): 29.28,
  Timestamp('2021-01-01 18:00:00'): 26.87,
  Timestamp('2021-01-01 19:00:00'): 25.38,
  Timestamp('2021-01-01 20:00:00'): 24.29,
  Timestamp('2021-01-01 21:00:00'): 23.32,
  Timestamp('2021-01-01 22:00:00'): 22.44,
  Timestamp('2021-01-01 23:00:00'): 21.58,
  Timestamp('2021-01-02 00:00:00'): 20.8,
  Timestamp('2021-01-02 01:00:00'): 20.04,
  Timestamp('2021-01-02 02:00:00'): 19.31,
  Timestamp('2021-01-02 03:00:00'): 18.62,
  Timestamp('2021-01-02 04:00:00'): 17.99,
  Timestamp('2021-01-02 05:00:00'): 17.43,
  Timestamp('2021-01-02 06:00:00'): 17.67,
  Timestamp('2021-01-02 07:00:00'): 20.1,
  Timestamp('2021-01-02 08:00:00'): 23.03,
  Timestamp('2021-01-02 09:00:00'): 27.71,
  Timestamp('2021-01-02 10:00:00'): 32.69,
  Timestamp('2021-01-02 11:00:00'): 34.76,
  Timestamp('2021-01-02 12:00:00'): 35.8,
  Timestamp('2021-01-02 13:00:00'): 36.28,
  Timestamp('2021-01-02 14:00:00'): 36.29,
  Timestamp('2021-01-02 15:00:00'): 35.77,
  Timestamp('2021-01-02 16:00:00'): 33.52,
  Timestamp('2021-01-02 17:00:00'): 29.22,
  Timestamp('2021-01-02 18:00:00'): 27.54,
  Timestamp('2021-01-02 19:00:00'): 26.52,
  Timestamp('2021-01-02 20:00:00'): 25.41,
  Timestamp('2021-01-02 21:00:00'): 24.28,
  Timestamp('2021-01-02 22:00:00'): 23.27,
  Timestamp('2021-01-02 23:00:00'): 22.4,
  Timestamp('2021-01-03 00:00:00'): 21.65,
  Timestamp('2021-01-03 01:00:00'): 20.96,
  Timestamp('2021-01-03 02:00:00'): 20.31,
  Timestamp('2021-01-03 03:00:00'): 19.66,
  Timestamp('2021-01-03 04:00:00'): 19.02,
  Timestamp('2021-01-03 05:00:00'): 18.39,
  Timestamp('2021-01-03 06:00:00'): 18.39,
  Timestamp('2021-01-03 07:00:00'): 20.37,
  Timestamp('2021-01-03 08:00:00'): 23.57,
  Timestamp('2021-01-03 09:00:00'): 28.55,
  Timestamp('2021-01-03 10:00:00'): 32.82,
  Timestamp('2021-01-03 11:00:00'): 34.8,
  Timestamp('2021-01-03 12:00:00'): 35.96,
  Timestamp('2021-01-03 13:00:00'): 36.46,
  Timestamp('2021-01-03 14:00:00'): 36.35,
  Timestamp('2021-01-03 15:00:00'): 35.65,
  Timestamp('2021-01-03 16:00:00'): 32.99,
  Timestamp('2021-01-03 17:00:00'): 28.96,
  Timestamp('2021-01-03 18:00:00'): 27.33,
  Timestamp('2021-01-03 19:00:00'): 26.08,
  Timestamp('2021-01-03 20:00:00'): 25.08,
  Timestamp('2021-01-03 21:00:00'): 24.21,
  Timestamp('2021-01-03 22:00:00'): 23.29,
  Timestamp('2021-01-03 23:00:00'): 22.31,
  Timestamp('2021-01-04 00:00:00'): 21.45,
  Timestamp('2021-01-04 01:00:00'): 20.76,
  Timestamp('2021-01-04 02:00:00'): 20.16,
  Timestamp('2021-01-04 03:00:00'): 19.55,
  Timestamp('2021-01-04 04:00:00'): 18.83,
  Timestamp('2021-01-04 05:00:00'): 18.19,
  Timestamp('2021-01-04 06:00:00'): 18.3,
  Timestamp('2021-01-04 07:00:00'): 20.44,
  Timestamp('2021-01-04 08:00:00'): 23.28,
  Timestamp('2021-01-04 09:00:00'): 28.12,
  Timestamp('2021-01-04 10:00:00'): 33.13,
  Timestamp('2021-01-04 11:00:00'): 35.01,
  Timestamp('2021-01-04 12:00:00'): 36.01,
  Timestamp('2021-01-04 13:00:00'): 36.39,
  Timestamp('2021-01-04 14:00:00'): 36.2,
  Timestamp('2021-01-04 15:00:00'): 35.43,
  Timestamp('2021-01-04 16:00:00'): 32.58,
  Timestamp('2021-01-04 17:00:00'): 28.07,
  Timestamp('2021-01-04 18:00:00'): 26.42,
  Timestamp('2021-01-04 19:00:00'): 25.34,
  Timestamp('2021-01-04 20:00:00'): 24.34,
  Timestamp('2021-01-04 21:00:00'): 23.37,
  Timestamp('2021-01-04 22:00:00'): 22.43,
  Timestamp('2021-01-04 23:00:00'): 21.53,
  Timestamp('2021-01-05 00:00:00'): 20.65,
  Timestamp('2021-01-05 01:00:00'): 19.77,
  Timestamp('2021-01-05 02:00:00'): 18.9,
  Timestamp('2021-01-05 03:00:00'): 18.12,
  Timestamp('2021-01-05 04:00:00'): 17.43,
  Timestamp('2021-01-05 05:00:00'): 16.79,
  Timestamp('2021-01-05 06:00:00'): 16.96,
  Timestamp('2021-01-05 07:00:00'): 19.51,
  Timestamp('2021-01-05 08:00:00'): 22.61,
  Timestamp('2021-01-05 09:00:00'): 27.27,
  Timestamp('2021-01-05 10:00:00'): 31.78,
  Timestamp('2021-01-05 11:00:00'): 34.93,
  Timestamp('2021-01-05 12:00:00'): 36.12,
  Timestamp('2021-01-05 13:00:00'): 36.58,
  Timestamp('2021-01-05 14:00:00'): 36.44,
  Timestamp('2021-01-05 15:00:00'): 35.7,
  Timestamp('2021-01-05 16:00:00'): 32.33,
  Timestamp('2021-01-05 17:00:00'): 28.05,
  Timestamp('2021-01-05 18:00:00'): 26.45,
  Timestamp('2021-01-05 19:00:00'): 25.42,
  Timestamp('2021-01-05 20:00:00'): 24.53,
  Timestamp('2021-01-05 21:00:00'): 23.68,
  Timestamp('2021-01-05 22:00:00'): 22.83,
  Timestamp('2021-01-05 23:00:00'): 21.99,
  Timestamp('2021-01-06 00:00:00'): 21.15}}
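A minimal sketch for rebuilding the DataFrames from the to_dict() output above. For brevity it uses only the first three entries of each dict; the names anomaly_dict and hourly_dict are illustrative, not from the original code. Setting an explicit hourly frequency is optional, but it makes gaps visible and keeps statsmodels from guessing the frequency.

```python
import pandas as pd

# Illustrative excerpt: only the first three entries of each posted to_dict() output.
anomaly_dict = {"Temperature Anomaly": {1959: 0.03, 1960: -0.03, 1961: 0.06}}
hourly_dict = {
    "Temperature": {
        pd.Timestamp("2021-01-01 01:00:00"): 19.19,
        pd.Timestamp("2021-01-01 02:00:00"): 18.54,
        pd.Timestamp("2021-01-01 03:00:00"): 17.94,
    }
}

# Outer keys become columns, inner keys become the index.
temp_anomaly = pd.DataFrame(anomaly_dict).rename_axis("Year")
temperature = pd.DataFrame(hourly_dict).rename_axis("Datetime")

# Attach an explicit hourly frequency; asfreq() inserts NaN rows for any missing hours.
temperature = temperature.asfreq("h")

print(temp_anomaly)
print(temperature.index.freq)
```

Passing the full dicts instead of the excerpts reproduces the complete series.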

These data look as follows when plotted together:

[Figure: temp_anomaly, co2_annual and temperature plotted together]

I tested all three time series for stationarity using the Augmented Dickey-Fuller (ADF) test:

```python
from statsmodels.tsa.stattools import adfuller


def test_stationarity(timeseries):
    """Augmented Dickey-Fuller test. H0: the series has a unit root (is non-stationary)."""
    result = adfuller(timeseries)

    adf_statistic = result[0]
    p_value = result[1]
    print("ADF Statistic:", adf_statistic)
    print("p-value:", p_value)

    if p_value > 0.05:
        print("Fail to reject the null hypothesis: the series is likely non-stationary, "
              "i.e. it has time-dependent features.")
    else:
        print("Reject the null hypothesis: the series is likely stationary, "
              "i.e. it does not have time-dependent features.")
```

The test failed to reject the null hypothesis for all three series, which suggests that none of them is stationary: they have time-dependent features such as trends and seasonality.

I'd like to forecast temp_anomaly and co2_annual from 2022 to 2050, and forecast temperature for two more days (48 hours).
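Translating these goals into forecast horizons (numbers of steps ahead) is a useful first step: 2022 to 2050 inclusive is 29 annual steps, and two days of hourly data is 48 steps. A small sketch that builds the future index for each case, assuming the last observations are 2021 and 2021-01-06 00:00 as in the data above:

```python
import pandas as pd

last_year = 2021
last_hour = pd.Timestamp("2021-01-06 00:00:00")

# Annual target: every year from 2022 through 2050 inclusive.
future_years = pd.RangeIndex(last_year + 1, 2051, name="Year")
annual_horizon = len(future_years)

# Hourly target: 48 timestamps starting one hour after the last observation.
future_hours = pd.date_range(last_hour + pd.Timedelta(hours=1), periods=48, freq="h")
hourly_horizon = len(future_hours)

print(annual_horizon, hourly_horizon)     # 29 48
print(future_hours[0], future_hours[-1])  # 2021-01-06 01:00:00 2021-01-08 00:00:00
```

A 29-step horizon on 63 annual points is a long extrapolation, so forecast uncertainty will grow considerably toward 2050.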

I have learned that there are many forecasting techniques, such as exponential smoothing, moving averages, ARIMA, SARIMAX, LSTM and Prophet, which has made this even more confusing for me.

Which forecasting technique is most appropriate here, i.e. likely to give the lowest forecast error? Can the appropriate technique or model be chosen in advance from the characteristics of the series, or can it only be determined afterwards by evaluating the forecasts?

Also, what steps are needed to forecast these three time series with the appropriate technique(s)?
