
Understanding fbprophet cross_validation

I was able to run cross validation to assess the model's accuracy, but I am having trouble understanding the output.

I have 687 rows. I want to train the model on all my data to get the best prediction possible and measure the accuracy of that model. As I understand it, fbprophet doesn't require the data to be split into training and test sets.

For the cross validation I set the initial to 500 days (the number of days the model gets to learn from before making predictions?), the horizon to 20 days (the number of days to be forecast after each cutoff), and the period to 10 days (because I read that this should be half of the horizon).

from fbprophet.diagnostics import cross_validation
df_cv = cross_validation(m, initial='500 days', period='10 days', horizon='20 days')
df_cv.head()

[screenshot: Output cross_validation]

from fbprophet.diagnostics import performance_metrics
df_p = performance_metrics(df_cv)
df_p

[screenshot: Output performance metrics]

I have 17 cutoffs with 20 predictions each. What is represented in the second output? It has only 19 rows (from horizon 2 days to horizon 20 days).

I am also having trouble with the performance metrics. What exactly is the coverage, and what are these values? The mean absolute error shouldn't be the same for all 17 forecasts, right?

As described in the Prophet documentation, cross validation takes three parameters:

  • initial – training period length (training set size for the model)
  • period – spacing between cutoff dates. Each cutoff truncates the historical data: in every cross-validation iteration the model is fit using only the data up to that cutoff. You can treat the period as the step size by which the training window advances
  • horizon – forecasting period length

By default, the initial training period is set to three times the horizon, and cutoffs are made every half a horizon. The initial period should be long enough to capture all of the components of the model, in particular seasonalities.
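These defaults can be written out with pandas Timedelta arithmetic (a minimal sketch; the variable names are illustrative, not Prophet's API):

```python
import pandas as pd

# Defaults described above: initial = 3 * horizon, period = 0.5 * horizon.
horizon = pd.Timedelta("20 days")
initial = 3 * horizon    # 60 days of training data before the first cutoff
period = 0.5 * horizon   # a new cutoff every 10 days
```

So calling cross_validation(m, horizon='20 days') alone would use a 60-day initial window and a 10-day period; the question keeps the default-style period but overrides initial to 500 days.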

With df_cv = cross_validation(m, initial='500 days', period='10 days', horizon='20 days'), cross validation performs the following steps:

  1. fit the model on the initial period (days 1-500)
  2. forecast over the horizon (days 501-520)
  3. move the cutoff forward by the period length (10 days)
  4. fit the model on all data up to the new cutoff (days 1-510; Prophet fits on every observation before the cutoff, so the training window grows by one period each iteration rather than staying a fixed 500 days)
  5. forecast over the horizon (days 511-530)

... and so on until the end of the timeseries is reached. In your case that gives 17 iterations.
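The iteration count can be checked with a small sketch. This is not Prophet's actual implementation (which derives cutoffs from the timestamps, working backwards from the end of the series); it just assumes daily data with no gaps and walks the cutoffs forward the same way the steps above do, with illustrative names:

```python
# Hypothetical helper: list the cutoff days for daily data with no gaps.
def list_cutoffs(n_rows, initial, period, horizon):
    cutoffs = []
    cutoff = initial
    # A cutoff is usable only if a full horizon of actuals follows it.
    while cutoff + horizon <= n_rows:
        cutoffs.append(cutoff)
        cutoff += period
    return cutoffs

cutoffs = list_cutoffs(n_rows=687, initial=500, period=10, horizon=20)
print(len(cutoffs))  # 17 cutoffs, matching the question
```

With 687 rows, the last usable cutoff is day 660 (660 + 20 ≤ 687), so the cutoffs run 500, 510, ..., 660.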

In the attached Output cross_validation you can see:

  • ds – date for which forecast was made
  • yhat – forecasted value
  • yhat_lower & yhat_upper – the uncertainty interval
  • y – actual value
  • cutoff – the date where cutoff was made

In the Output performance metrics you see the metric values for forecasts made X days into the future, averaged across all 17 cutoffs — that is why there is a single MAE per horizon rather than one per forecast. For example, the first row (horizon = 2 days) shows the metrics for forecasts made 2 days ahead. Note also that performance_metrics aggregates each row over a rolling window covering 10% of the rows of df_cv by default (rolling_window=0.1); with 17 cutoffs × 20 days = 340 rows, each window spans 34 rows, so the 1-day and 2-day horizons fall into the first window, labeled horizon 2 days — hence your 19 rows instead of 20.
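The per-horizon averaging can be illustrated with a toy frame (a sketch of the idea, not Prophet's exact implementation): each cross-validation row is keyed by its horizon, ds - cutoff, and the errors are then averaged across all cutoffs that share a horizon.

```python
import pandas as pd

# Toy cross-validation output: two cutoffs, two forecast days each.
df_cv = pd.DataFrame({
    "ds":     pd.to_datetime(["2020-01-03", "2020-01-04", "2020-01-13", "2020-01-14"]),
    "cutoff": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-11", "2020-01-11"]),
    "y":      [10.0, 12.0, 11.0, 13.0],
    "yhat":   [11.0, 10.0, 12.0, 12.0],
})

# Horizon of each forecast = how far past its cutoff the date lies.
df_cv["horizon"] = df_cv["ds"] - df_cv["cutoff"]

# One MAE per horizon, averaged over both cutoffs.
mae_by_horizon = (df_cv["y"] - df_cv["yhat"]).abs().groupby(df_cv["horizon"]).mean()
# horizon 2 days -> 1.0, horizon 3 days -> 1.5
```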

Coverage is the fraction of actual values y that fall inside the uncertainty interval [yhat_lower, yhat_upper], which reflects uncertainty in the trend and observation noise.
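Coverage can be reproduced directly from the cross-validation frame (a sketch with made-up numbers):

```python
import pandas as pd

# Toy slice of a cross-validation frame.
df_cv = pd.DataFrame({
    "y":          [10.0, 12.0, 11.0, 13.0],
    "yhat_lower": [ 9.0, 12.5, 10.0, 12.0],
    "yhat_upper": [11.0, 13.5, 12.0, 14.0],
})

# True where the actual value lies inside the uncertainty interval.
inside = (df_cv["y"] >= df_cv["yhat_lower"]) & (df_cv["y"] <= df_cv["yhat_upper"])
coverage = inside.mean()  # 0.75: three of the four actuals fall inside
```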
