
Time series forecast R

I have been working on a project and finally I was able to get the data to create my time series. However, when it comes to forecasting I'm not getting the best results. I have tried multiple methods I've found, but the performance is still poor.

Any idea on how to forecast this time series? I would really appreciate the help!

Here is the data:

https://docs.google.com/spreadsheets/d/1Vf8rkReur_XV75ne1DjtdHI9TLzX3o-5/edit?usp=drivesdk&ouid=106978581381695763375&rtpof=true&sd=true

I tried the ARIMA, ETS and snaive methods to check the first results, but for some reason I keep getting a straight line as the forecast, and tbats does not seem to be working properly either.

[image: forecast plots for ARIMA/ETS/snaive]

I tried the prophet method and the results were a lot better: I got a MAPE of 34.2, which is not great, but it is better than the others. Still, I'm wondering whether there is a better way to forecast this time series and get a better result.

[image: prophet forecast plot]

Unfortunately you did not share the code you are working with, so I cannot reproduce your scenario as a starting point. The context might also be important, as I will explain:

Validation on held-out data is a good starting point; after all, you want to prove the model's ability to generalise. But with time series you should generate a new model, and from it new predictions, as soon as a new data point is recorded (commonly, though not necessarily, the latest records are the most relevant for prediction in this domain). Therefore the choice of how much data to hold out, and of how many points to validate predictions on, can be critical for model selection. Furthermore, the choice of validation metric should be context based, which is a minor topic of its own.
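As a sketch of that idea, the forecast package provides tsCV() for rolling-origin evaluation: the model is re-fitted at every origin, so each forecast only uses data recorded before it. The ETS model and the simulated series below are placeholders, not your data:

```r
library(forecast)

# Simulated monthly series standing in for the real data
set.seed(42)
y <- ts(rnorm(120, mean = 10), frequency = 12)

# tsCV() refits the model at every origin and returns the
# h-step-ahead forecast errors (one column per horizon)
errs <- tsCV(y, forecastfunction = function(x, h) forecast(ets(x), h = h), h = 12)

# RMSE per horizon: shows how accuracy degrades as h grows
rmse_by_h <- sqrt(colMeans(errs^2, na.rm = TRUE))
```

This gives one error estimate per horizon instead of a single train/test split, which makes the model comparison far less dependent on where you happened to cut the series.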

A far broader topic of its own is the use of one (or possibly more than one) explanatory variable, also called a regressor. This can be as simple as a boolean dummy variable (indicating the presence of an event), a continuous number, or even a one-hot-encoded factor. Unexplained outliers can influence models drastically (some more than others), and explanatory variables can help to control for them. The topic is complex and mostly presupposes that you know the regressors over the prediction horizon, or can predict them better than the time series itself. Even if an explanatory variable is unknown and not well predictable, it might still be of use (for example, just to predict the next value of the time series).
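A minimal, hypothetical sketch of a regressor with auto.arima(): the December event dummy below is invented for illustration; with your data you would encode whatever event you actually know about. Note that the regressor must also be supplied for the forecast horizon:

```r
library(forecast)

# Simulated monthly series; assume a known event every December
set.seed(1)
y <- ts(rnorm(60, mean = 10), frequency = 12)
event <- matrix(as.numeric(cycle(y) == 12), ncol = 1)  # 1 in December, else 0

# auto.arima() estimates the regression coefficient alongside the ARMA errors
fit <- auto.arima(y, xreg = event)

# Forecasting requires the regressor over the horizon as well
# (here: 12 future months, event only in the final December)
future_event <- matrix(c(rep(0, 11), 1), ncol = 1)
fc <- forecast(fit, xreg = future_event)
```

The same pattern works for continuous regressors or several columns in the xreg matrix.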

To exemplify the importance of the setup and of outlier handling, I wrote a very basic function that takes 3 inputs:

  • remove is the number of observations to hold out

  • clean.ts is a boolean indicating whether the training data should be cleaned

  • predict is the number of observations to be predicted and validated on

The function generates 2 outputs:

  • accuracy

  • plot

A new model type I included is nnetar from the forecast package, a neural network autoregression (NNAR). The function has many parameters, such as the number of nodes, which heavily influence the result (the example uses the automatic definition and 50 nodes for comparison).

library(tidyverse)
library(forecast)
library(TSstudio)
library(openxlsx)

# put in the path to your excel file
df <- openxlsx::read.xlsx(".../dummydata.xlsx") %>%
    dplyr::mutate(Date = as.Date(Date, origin = "1899-12-30")) %>%
    dplyr::arrange(Date) 

tseries <- ts(df$TurOver, frequency = 12)

myf <- function(remove, clean.ts, predict){
    split_tseries <- TSstudio::ts_split(ts.obj = tseries, sample.out = remove)
    tseries_training <- split_tseries$train
    if (clean.ts == TRUE) tseries_training <- forecast::tsclean(tseries_training)
    tseries_testing <- split_tseries$test
    marima <- forecast::auto.arima(tseries_training, stepwise = FALSE) 
    farima <- forecast::forecast(marima, h = predict)
    mtbats <- forecast::tbats(tseries_training)
    ftbats <- forecast::forecast(mtbats, h = predict)
    # neural nets need a seed because of random starting weights; this applies to nnetar too
    set.seed(481)
    # neural net (lambda ensures you get positive values... there is much more to the topic)
    mnnetar <- forecast::nnetar(tseries_training, lambda=0)
    fnnetar <- forecast::forecast(mnnetar, h = predict)
    mnnetar50 <- forecast::nnetar(tseries_training, size = 50, lambda=0)
    fnnetar50 <- forecast::forecast(mnnetar50, h = predict)

    acc <- list(arima = accuracy(farima$mean, tseries_testing), 
                tbats = accuracy(ftbats$mean, tseries_testing),
                nnetar = accuracy(fnnetar$mean, tseries_testing),
                nnetar50 = accuracy(fnnetar50$mean, tseries_testing))

    plt <- forecast::autoplot(tseries, lwd = 2) + 
        autolayer(farima, series = "arima", PI = FALSE, lwd = 1) +
        autolayer(ftbats, series = "tbats", PI = FALSE, lwd = 1) +
        autolayer(fnnetar, series = "nnetar", PI = FALSE, lwd = 1) +
        autolayer(fnnetar50, series = "nnetar50", PI = FALSE, lwd = 1)

    return(list(acc = acc, plt = plt))
}

myf(24, FALSE, predict = 24)[["acc"]]

$arima
                   ME        RMSE         MAE       MPE     MAPE      ACF1 Theil's U
Test set -0.001343845 0.002536706 0.002100797 -60.96293 70.11816 0.6435248  2.889715

$tbats
                    ME        RMSE         MAE       MPE     MAPE      ACF1 Theil's U
Test set -0.0007685619 0.002302062 0.001640931 -42.62639 55.90108 0.5009757  3.660085

$nnetar
                   ME        RMSE         MAE       MPE    MAPE      ACF1 Theil's U
Test set 0.0002695596 0.002003039 0.001599084 -20.56516 46.9986 0.5977031  1.825066

$nnetar50
                   ME        RMSE         MAE       MPE     MAPE      ACF1 Theil's U
Test set 0.0006839859 0.001962521 0.001513359 -6.548584 37.92716 0.5386324  1.449772

res <- myf(24, TRUE, predict = 24)

res[["acc"]]

$arima
                  ME        RMSE        MAE      MPE     MAPE      ACF1 Theil's U
Test set 0.002957762 0.004058595 0.00340108 37.37937 67.39746 0.7941113  1.897743

$tbats
                    ME        RMSE        MAE       MPE     MAPE      ACF1 Theil's U
Test set -0.0003070341 0.001822889 0.00139368 -29.57777 45.81817 0.6598989  2.271175

$nnetar
               ME        RMSE         MAE       MPE     MAPE      ACF1 Theil's U
Test set -0.002677447 0.003356156 0.002929986 -79.59701 83.24975 0.4806293  2.961381

$nnetar50
                ME        RMSE         MAE       MPE     MAPE      ACF1 Theil's U
Test set -5.590709e-05 0.005615723 0.004071816 -18.23728 80.33139 0.5137529  2.847373

res[["plt"]]

[image: comparison plot of the four forecasts]

Looking at the results, compare and play with the function parameters to get a feeling for what happens when you increase the training data, reduce the number of periods to predict, or clean the time series, and whether the validation metric serves its purpose for the application. The prophet algorithm is designed for daily data if I am not mistaken; you could include it in the code (at least the accuracy part) to see whether it just "accidentally" outperformed the others.
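If you do want prophet in the comparison, a rough sketch could look like the following. The data frame here is simulated in place of yours; prophet expects columns named ds (date) and y, and the MAPE is computed on the same style of 24-month holdout as the accuracy() output above:

```r
library(prophet)

# Simulated monthly data standing in for the real series
set.seed(7)
dates <- seq(as.Date("2015-01-01"), by = "month", length.out = 96)
dd <- data.frame(ds = dates, y = rnorm(96, mean = 10))

train <- dd[1:72, ]
test  <- dd[73:96, ]

m <- prophet(train)
future <- make_future_dataframe(m, periods = 24, freq = "month")
pred <- predict(m, future)

# MAPE over the 24 held-out months, comparable to the tables above
mape <- mean(abs((test$y - tail(pred$yhat, 24)) / test$y)) * 100
```

With that in hand you can put prophet's MAPE side by side with the ARIMA, tbats and nnetar numbers on an identical holdout.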

There is much, much more to the topic of time series prediction and how to improve forecasts. A good starting point is the open book by the author of the forecast package: https://otexts.com/fpp2/

Looking at your time series, this link might also help: Time Series Breakout/Change/Disturbance Detection in R: strucchange, changepoint, BreakoutDetection, bfast, and more
