The following code runs a loops but the problem is the speed; it takes several hours to finish and I am looking for an alternative so that I don´t have to wait so long.
Basically what the code does the follolling calculations:
1.-It calculates the mean of the values of the 60 days.
2.-It gets the standard deviation of the values of the 60 days.
3.-It gets the Max of the values of the 60 days.
4.-It gets the Min of the values of the 60 days.
5.-Then with the previous calculations the code "smooths" the peaks up and down.
6.-Then the code simply get the means from 60, 30, 15 and 7 Days.
So the purpose of these code is to remove the peaks of the data using the method already mentioned.
Here is the code:
options(stringsAsFactors=F)
DAT <- data.frame(ITEM = "x", CLIENT = as.numeric(1:100000), matrix(sample(1:1000, 60, replace=T), ncol=60, nrow=100000, dimnames=list(NULL,paste0('DAY_',1:60))))
DATT <- DAT
nRow <- nrow(DAT)
TMP <- NULL
for(iROW in 1:nRow){#iROW <- 1
print(c(iROW,nRow))
Demand <- NULL
for(iCOL in 3:ncol(DAT)){#iCOL <- 1
Demand <- c(Demand,DAT[iROW,iCOL])
}
ww <- which(!is.na(Demand))
if(length(ww) > 0){
Average <- round(mean(Demand[ww]),digits=4)
DesvEst <- round(sd(Demand,na.rm=T),digits=4)
Max <- round(Average + (1 * DesvEst),digits=4)
Min <- round(max(Average - (1 * DesvEst), 0),digits=4)
Demand <- round(ifelse(is.na(Demand), Demand, ifelse(Demand > Max, Max, ifelse(Demand < Min, Min, Demand))))
Prom60 <- round(mean(Demand[ww]),digits=4)
Prom30 <- round(mean(Demand[intersect(ww,(length(Demand) - 29):length(Demand))]),digits=4)
Prom15 <- round(mean(Demand[intersect(ww,(length(Demand) - 14):length(Demand))]),digits=4)
Prom07 <- round(mean(Demand[intersect(ww,(length(Demand) - 6):length(Demand))]),digits=4)
}else{
Average <- DesvEst <- Max <- Min <- Prom60 <- Prom30 <- Prom15 <- Prom07 <- NA
}
DAT[iROW,3:ncol(DAT)] <- Demand
TMP <- rbind(TMP, cbind(DAT[iROW,], Average, DesvEst, Max, Min, Prom60, Prom30, Prom15, Prom07))
}
DAT <- TMP
If one runs your code (with smaller number of rows) through a profiler, one sees that the main issue is the rbind
in the end, followed by the c
mentioned by @Riverarodrigoa:
We can focus on these two by creating numeric matrices of suitable size and working with those. Only in the end the final data.frame
is created:
options(stringsAsFactors=F)
N <- 1000
set.seed(42)
DAT <- data.frame(ITEM = "x",
CLIENT = as.numeric(1:N),
matrix(sample(1:1000, 60, replace=T), ncol=60, nrow=N, dimnames=list(NULL,paste0('DAY_',1:60))))
nRow <- nrow(DAT)
TMP <- matrix(0, ncol = 8, nrow = N,
dimnames = list(NULL, c("Average", "DesvEst", "Max", "Min", "Prom60", "Prom30", "Prom15", "Prom07")))
DemandMat <- as.matrix(DAT[,3:ncol(DAT)])
for(iROW in 1:nRow){
Demand <- DemandMat[iROW, ]
ww <- which(!is.na(Demand))
if(length(ww) > 0){
Average <- round(mean(Demand[ww]),digits=4)
DesvEst <- round(sd(Demand,na.rm=T),digits=4)
Max <- round(Average + (1 * DesvEst),digits=4)
Min <- round(max(Average - (1 * DesvEst), 0),digits=4)
Demand <- round(ifelse(is.na(Demand), Demand, ifelse(Demand > Max, Max, ifelse(Demand < Min, Min, Demand))))
Prom60 <- round(mean(Demand[ww]),digits=4)
Prom30 <- round(mean(Demand[intersect(ww,(length(Demand) - 29):length(Demand))]),digits=4)
Prom15 <- round(mean(Demand[intersect(ww,(length(Demand) - 14):length(Demand))]),digits=4)
Prom07 <- round(mean(Demand[intersect(ww,(length(Demand) - 6):length(Demand))]),digits=4)
}else{
Average <- DesvEst <- Max <- Min <- Prom60 <- Prom30 <- Prom15 <- Prom07 <- NA
}
DemandMat[iROW, ] <- Demand
TMP[iROW, ] <- c(Average, DesvEst, Max, Min, Prom60, Prom30, Prom15, Prom07)
}
DAT <- cbind(DAT[,1:2], DemandMat, TMP)
For 1000 rows this takes about 0.2 s instead of over 4 s. For 10.000 rows I get 2 s instead of 120 s.
Obviously, this is not really pretty code. One could do this much nicer using tidyverse
or data.table
. I just find it worth noting that for
loops are not necessarily slow in R. But dynamically growing data structures is.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.