
How to forecast the future sales for an entire database in R with the associated error?

I am doing my first internship at a local company, and they gave me the task of predicting the demand for a certain stock of products for the next three semesters (18 months). First I did it in Excel with the moving average (MA) method, but the results were not great, so now I'm trying in R and I'm stuck on the following problem. My data is an Excel file with 15,000 columns: the first row is the name of the product, and the next 48 rows are the monthly sales (numbers) for each product, from January 2017 to December 2020.

I want to apply the auto.arima function and other forecasting methods/functions from R, and I need to do it for all 15,000 products at once, for each method. I know how to do it for one column, but I am not very skilled in R, so I am having trouble programming it for all 15,000 columns at once. I will also need a way to show the associated error for each method directly in Excel, because I would like to choose the method with the smallest error among Moving Average, Holt-Winters, etc. Currently I have the code below, which gives me a table with the forecast for one individual product (one column) using the auto.arima method.

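library(readxl)    # read_excel()
library(forecast)  # auto.arima(), forecast()

data <- read_excel("aceiteX.xlsx")
# Monthly series for the first product, starting January 2017
Y <- ts(data[[1]], start = c(2017, 1), frequency = 12)
modelo_arima <- auto.arima(Y, d = 1, D = 1, stepwise = FALSE, approximation = FALSE, trace = TRUE)
fcast <- forecast(modelo_arima, h = 18, level = 95)  # 18 months: January 2021 to June 2022
pronostico <- as.data.frame(fcast)
write.table(pronostico, file = "C:\\Users\\bro\\Documentos\\aceiteX.csv", sep = ",")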

My idea for the next step was to use the lapply function to do it for all products. It apparently worked, but the result wasn't what I needed. I want a table that shows the monthly forecast for each product from January 2021 to June 2022. Then, maybe in the last row or in a separate column, it could show the associated error for the method, because I would like to try many methods (auto.arima, Holt-Winters, etc.) and choose the best one.

If I can get a table like that, the rest of the job could easily be finished in Excel. Any advice, tip or secret function would be truly appreciated. So my question, in short, is how to apply a forecasting function to many columns at once, and how to write the associated error of the method for each product to an Excel document. Thank you so much!

Well, I believe your problem here is not a coding problem, but rather a general question of how to structure the solution.

It doesn't matter whether you use lapply, a for loop or anything else. What matters is that you need a way to access and identify the data for every product (think of every product as a unique time series). Once you have that, you can split each time series into training data and test data. Only then do you generate forecasts from each training set, and finally you compare those forecasts with the test data to get the error of each method.
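The steps above can be sketched roughly as follows. This is only a minimal illustration under several assumptions: the `sales` data frame is synthetic stand-in data (with the real file it would come from `read_excel`), the last 12 months are held out as the test set, RMSE is used as the error measure, and `ets` stands in for the Holt-Winters family. The names `methods`, `forecast_one`, `forecast_table` and `error_table` and the output file names are hypothetical.

```r
library(forecast)

# Synthetic stand-in for the real workbook: 3 products x 48 monthly sales.
# With the real data this would be e.g. readxl::read_excel("aceiteX.xlsx").
set.seed(1)
months <- 1:48
sales <- data.frame(
  prod_A = 100 + 10 * sin(2 * pi * months / 12) + rnorm(48, sd = 3),
  prod_B = 50 + 0.5 * months + rnorm(48, sd = 2),
  prod_C = 200 + 20 * cos(2 * pi * months / 12) + rnorm(48, sd = 5)
)

# Candidate methods: each takes a ts and returns a model forecast() understands.
methods <- list(
  arima = function(y) auto.arima(y),
  ets   = function(y) ets(y)
)

h_test <- 12  # hold out the last 12 months (2020) to measure the error
h_out  <- 18  # final horizon: January 2021 to June 2022

forecast_one <- function(x) {
  y     <- ts(x, start = c(2017, 1), frequency = 12)
  train <- window(y, end = c(2019, 12))
  test  <- window(y, start = c(2020, 1))

  # Out-of-sample RMSE of every method on the held-out year
  rmse <- sapply(methods, function(fit) {
    fc <- forecast(fit(train), h = h_test)
    sqrt(mean((as.numeric(test) - as.numeric(fc$mean))^2))
  })

  # Refit the method with the lowest test error on all 48 months, then forecast
  best <- names(which.min(rmse))
  fc   <- forecast(methods[[best]](y), h = h_out)
  list(method = best, rmse = unname(rmse[best]), mean = as.numeric(fc$mean))
}

results <- lapply(sales, forecast_one)

# One column per product, one row per forecast month
forecast_table <- as.data.frame(lapply(results, function(r) r$mean))
forecast_table <- cbind(
  month = format(seq(as.Date("2021-01-01"), by = "month", length.out = h_out), "%Y-%m"),
  forecast_table
)

# One row per product: the chosen method and its test error
error_table <- data.frame(
  product = names(results),
  method  = vapply(results, function(r) r$method, character(1)),
  rmse    = vapply(results, function(r) r$rmse, numeric(1)),
  row.names = NULL
)

# CSV files open directly in Excel
write.csv(forecast_table, "forecasts.csv", row.names = FALSE)
write.csv(error_table, "errors.csv", row.names = FALSE)
```

With 15,000 products the run time matters: the sketch keeps the default `auto.arima` settings, since `stepwise = FALSE, approximation = FALSE` searches far more models per series. Other methods can be compared by adding another function to the `methods` list.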
