I'm trying to speed up code that takes time series data and limits it to a maximum value and then stretches it forward until sum of original data and the "stretched" data are the same.
I have a more complicated version of this that is taking 6 hours to run on 100k rows. I don't think this is vectorizable because it uses values calculated on prior rows - is that correct?
x <- c(0,2101,3389,3200,1640,0,0,0,0,0,0,0)
dat <- data.frame(x=x,y=rep(0,length(x)))
remainder <- 0
upperlimit <- 2000
for(i in 1:length(dat$x)){
if(dat$x[i] >= upperlimit){
dat$y[i] <- upperlimit
} else {
dat$y[i] <- min(remainder,upperlimit)
}
remainder <- remainder + dat$x[i] - dat$y[i]
}
dat
I understand you can use ifelse
but I don't think cumsum
can be used to carry forward the remainder - apply
doesn't help either as far as I know. Do I need to resort to Rcpp
? Thank you greatly.
I went ahead and implemented this in Rcpp
and made some adjustments to the R
function:
require(Rcpp);require(microbenchmark);require(ggplot2);
limitstretchR <- function(upperlimit,original) {
remainder <- 0
out <- vector(length=length(original))
for(i in 1:length(original)){
if(original[i] >= upperlimit){
out[i] <- upperlimit
} else {
out[i] <- min(remainder,upperlimit)
}
remainder <- remainder + original[i] - out[i]
}
out
}
The Rcpp
function:
cppFunction('
NumericVector limitstretchC(double upperlimit, NumericVector original) {
int n = original.size();
double remainder = 0.0;
NumericVector out(n);
for(int i = 0; i < n; ++i) {
if (original[i] >= upperlimit) {
out[i] = upperlimit;
} else {
out[i] = std::min<double>(remainder,upperlimit);
}
remainder = remainder + original[i] - out[i];
}
return out;
}
')
Testing them:
x <- c(0,2101,3389,3200,1640,0,0,0,0,0,0,0)
original <- rep(x,20000)
upperlimit <- 2000
system.time(limitstretchR(upperlimit,original))
system.time(limitstretchC(upperlimit,original))
That yielded 80.655 and 0.001 seconds respectively. Native R
is quite bad for this. However, I ran a microbenchmark
(using a smaller vector) and got some confusing results.
res <- microbenchmark(list=
list(limitstretchR=limitstretchR(upperlimit,rep(x,10000)),
limitstretchC=limitstretchC(upperlimit,rep(x,10000))),
times=110,
control=list(order="random",warmup=10))
print(qplot(y=time, data=res, colour=expr) + scale_y_log10())
boxplot(res)
print(res)
If you were to run that you would see nearly identical results for both functions. This is my first time using microbenchmark
, any tips?
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.