
Batch forecasting: using apply() instead of a for loop gives different point forecasts

So far I have been using this method from Professor Hyndman when I had multiple time series to forecast. But with a large number of series it is fairly slow.

Now I am trying to use the apply() function as follows:

library(forecast)

fc_func <- function(y){
  forecast(auto.arima(y),h=12)$mean
}

retail <- read.csv("https://robjhyndman.com/data/ausretail.csv",header=FALSE)
retail <- ts(retail[,-1],f=12,s=1982+3/12)

frc <- apply(retail, 2, fc_func)

It seems to work, but when I use a for loop as follows:

ns <- ncol(retail)
h <- 12
fcast <- matrix(NA,nrow=h,ncol=ns)
for(i in 1:ns){
  fcast[,i] <- forecast(auto.arima(retail[,i]),h=h)$mean
}

I get different point forecasts. What is the reason?

Edit: I fixed it by changing fc_func. It now returns the same results as the for loop, but it is also as slow as the for loop:

fc_func <- function(x){
  # apply() passes each column as a plain numeric vector, so rebuild the ts first
  y <- ts(x, frequency=12, start=1982+3/12)
  forecast(auto.arima(y),h=12)$mean
}

retail <- read.csv("https://robjhyndman.com/data/ausretail.csv",header=FALSE)
retail <- ts(retail[,-1],f=12,s=1982+3/12)

  frc<- apply(retail,2 ,fc_func)

For debugging I've added some print statements to the function passed to apply(). The interesting one is class(y):

library(forecast)

fc_func <- function(y){
  print(length(y))
  print(class(y))
  #print(y)
  forecast(auto.arima(y),h=12)$mean
}

retail <- read.csv("https://robjhyndman.com/data/ausretail.csv",header=FALSE)
retail <- ts(retail[,-1],f=12,s=1982+3/12)

retail2 = retail

#retail = retail2[1:333,1:42]

frc<- apply(retail,2 ,fc_func)

Every y arrives in fc_func as a plain numeric vector:

> frc<- apply(retail,2 ,fc_func)
[1] 333
[1] "numeric"
[1] 333
[1] "numeric"
[1] 333
[1] "numeric"
[1] 333
[1] "numeric"
[1] 333

This is different in the for-loop:

ns <- ncol(retail)
h <- 12
fcast1 <- matrix(NA,nrow=h,ncol=ns)
for(i in 1:ns){
  print(length(retail[,i]))
  print(class(retail[,i]))
  #print(retail[,i])
  fcast1[,i] <- forecast(auto.arima(retail[,i]),h=h)$mean
}

Here the columns are passed to auto.arima as ts objects:

> for(i in 1:ns){
+   print(length(retail[,i]))
+   print(class(retail[,i]))
+   #print(retail[,i])
+   fcast1[,i] <- forecast(auto.arima(retail[,i]),h=h)$mean
+ }
[1] 333
[1] "ts"
[1] 333
[1] "ts"
[1] 333
[1] "ts"
[1] 333

I guess this causes the differences, because when I reduce retail to a simple matrix with

retail = retail[1:NROW(retail), 1:NCOL(retail)] 

and run the for loop again, I get exactly the same results as in the apply version:

all.equal(frc, fcast1)

So I guess you have to convert the variables back to ts inside fc_func before sending them to the forecast function.
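To make the class difference concrete without downloading the retail data, here is a minimal sketch on a synthetic monthly series (the object `x` and its values are my own stand-in, not the original data). It only uses base R. As far as I understand, auto.arima() considers seasonal models only when the series has a frequency above 1, so losing the frequency inside apply() is the likely reason the point forecasts differ.

```r
# Synthetic stand-in for `retail`: 3 monthly series starting April 1982 (an assumption).
x <- ts(matrix(seq_len(72), ncol = 3), frequency = 12, start = c(1982, 4))

# Inside apply() each column is a bare numeric vector, so frequency() falls back to 1.
freq_in_apply <- apply(x, 2, function(y) frequency(y))

# Column indexing keeps the ts attributes, so the monthly frequency survives.
freq_in_loop <- vapply(seq_len(ncol(x)), function(i) frequency(x[, i]), numeric(1))

print(freq_in_apply)
print(freq_in_loop)
```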

As a workaround (and because I had no idea how to transform y into the desired ts object), you could use an sapply version:

fc_func2 <- function(y){

  forecast(auto.arima(retail[,y]),h=12)$mean
}

frc2 <- sapply(1:NCOL(retail), fc_func2)

It should give the desired values, but I'm not sure whether it is any faster than the loop version.
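One way to check the speed question is to time both versions on the same input. The sketch below is only an illustration under assumptions: it uses synthetic data and a fixed AR(1) fit from stats::arima as a stand-in for auto.arima (so it runs without the forecast package), and `fit_one` is a hypothetical helper name. The timings will vary by machine, so no expected numbers are shown.

```r
# Synthetic stand-in for `retail`: 6 monthly series of 120 observations (an assumption).
set.seed(42)
x <- ts(matrix(rnorm(120 * 6), ncol = 6), frequency = 12, start = c(1982, 4))
h <- 12

# Stand-in model: a fixed AR(1) via stats::arima, NOT forecast::auto.arima.
fit_one <- function(y, h) {
  as.numeric(predict(arima(y, order = c(1, 0, 0)), n.ahead = h)$pred)
}

# Loop version: fill a preallocated matrix column by column.
loop_time <- system.time({
  fc_loop <- matrix(NA_real_, nrow = h, ncol = ncol(x))
  for (i in seq_len(ncol(x))) fc_loop[, i] <- fit_one(x[, i], h)
})

# sapply version: iterate over column indices so each column stays a ts.
sapply_time <- system.time({
  fc_sapply <- sapply(seq_len(ncol(x)), function(i) fit_one(x[, i], h))
})

# Both versions should produce the same forecasts.
same_result <- isTRUE(all.equal(fc_loop, fc_sapply))
print(c(loop = loop_time[["elapsed"]], sapply = sapply_time[["elapsed"]]))
```

Since both versions fit one model per column, most of the time is spent in the model fitting itself, so I would not expect a large gap between them.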

The issue is that apply() changes the class of the time series object, retail. As the most rudimentary member of the apply family, apply() is best used on simple matrix objects. It coerces its input with as.matrix() when called, which is why using apply() on data frames is often discouraged.

Per the ?apply docs:

If X is not an array but an object of a class with a non-null dim value (such as a data frame), apply attempts to coerce it to an array via as.matrix if it is two-dimensional (e.g., a data frame) or via as.array.

So apply does not preserve the class of its input before passing each column to fc_func:

class(retail)
# [1] "mts"    "ts"     "matrix" 

One can see this with sapply, which runs just as slowly as the for loop and, once dimnames are removed, returns exactly the same result as the for loop:

# LOOP VERSION
ns <- ncol(retail)
h <- 12
fcast1 <- matrix(NA,nrow=h,ncol=ns)

for(i in 1:ns) {
  fcast1[,i] <- forecast(auto.arima(retail[,i]), h=h)$mean
}

# SAPPLY VERSION
frc_test <- sapply(retail, fc_func, USE.NAMES = FALSE)
dimnames(frc_test) <- NULL

identical(frc_test, fcast1)
# [1] TRUE
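If apply() is still preferred, the time series attributes can be put back inside the function from the parent object instead of hard-coding the start and frequency. This is only a sketch on synthetic data; `restore_ts` is a hypothetical helper name, not something from the forecast package.

```r
# Synthetic monthly multivariate series standing in for `retail` (an assumption).
set.seed(1)
x <- ts(matrix(rnorm(48 * 3), ncol = 3), frequency = 12, start = c(1982, 4))

# Hypothetical helper: rebuild a ts from a bare vector using the parent's attributes.
restore_ts <- function(y, parent) {
  ts(y, start = start(parent), frequency = frequency(parent))
}

# Inside apply() each column is a plain numeric vector until it is restored.
classes_before <- apply(x, 2, function(y) class(y))
classes_after  <- apply(x, 2, function(y) class(restore_ts(y, x)))

# The restored column carries the same time index as direct column indexing.
same_index <- isTRUE(all.equal(tsp(restore_ts(as.numeric(x[, 1]), x)), tsp(x[, 1])))
```

With the real data, fc_func could then call forecast(auto.arima(restore_ts(y, retail)), h=12)$mean, which should behave like the edited fc_func in the question.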
