Efficiency and Speed of R code using Rcpp

Question

This post is about speeding up R code using Rcpp package to avoid recursive loops.

My input is define by the following example (length 7) which is part of the data.frame (length 51673) that I used :

S=c(906.65,906.65,906.65,906.65,906.65,906.65,906.65)
T=c(0.1371253,0.1457896,0.1248953,0.1261278,0.1156931,0.0985253,0.1332596)
r=c(0.013975,0.013975,0.013975,0.013975,0.013975,0.013975,0.013975)             
h=c(0.001332596,0.001248470,0.001251458,0.001242143,0.001257921,0.001235755,0.001238440)       
P=c(3,1,5,2,1,4,2)
A= data.frame(S=S,T=T,r=r,h=h,P=P)

         S         T        r           h   Per 
1   906.65 0.1971253 0.013975 0.001332596   3   
2   906.65 0.1971253 0.013975 0.001248470   1   
3   906.65 0.1971253 0.013975 0.001251458   5   
4   906.65 0.1971253 0.013975 0.001242143   2   
5   906.65 0.1971253 0.013975 0.001257921   1   
6   906.65 0.1971253 0.013975 0.001235755   4  
7   906.65 0.1971253 0.013975 0.001238440   2

The parameters are :

w=0.001; b=0.2; a=0.0154; c=0.0000052; neta=-0.70

I have the following code of the function that I want to use :

F<-function(x,w,b,a,c,neta,S,T,r,P){
    u=1i*x 
    nu=(1/(neta^2))*(((1-2*neta)^(1/2))-1)
    # Recursion back to time t
    # Terminal condition for the A and B 
    A_Q=0
    B_Q=0
    steps<-round(T*250,0)  

    for (j in 1:steps){
      A_Q= A_Q+ r*u + w*B_Q-(1/2)*log(1-2*a*(neta^4)*B_Q)
      B_Q= b*B_Q+u*nu+ (1/neta^2)*(1-sqrt((1-2*a*(neta^4)*B_Q)*( 1- 2*c*B_Q - 2*u*neta)))
    }
    F= exp(log(S)*u + A_Q + B_Q*h[P])
  return(F)
}

S = A$S ; r= A$r ; T= A$T ; P=A$P;  h= A$h

Then I want to apply the previous function using my Data.set a the vector of length N= 100000 :

  Z=length(S); N=100000  ; alpha=2 ; delta= 0.25     
  lambda=(2*pi)/(N*delta) 

res = matrix(nrow=N, ncol=Z)
  for (i in 1:N){
    for (j in 1:Z){
      res[i,j]= Re(F(((delta*(i-1))-(alpha+1)*1i),w,b,a,c,neta,S[j],T[j],r[j],P[j]))
    }
  }

But it is taking a lot of time: it takes 20 seconds to execute this line of code for N=100 but I want to execute it for N= 100000 times, the overall run time can take hours. How to fine tune the above code using Rcpp, to reduce the execution time and to obtain an Efficient program?

Is it possible to reduce the execution time and if so, please suggest me a solution even with out Rcpp.

Thanks.

Answer 1

Your function F can be converted to C++ pretty easily by taking advantage of the vec and cx_vec classes in the Armadillo library (accessed through the RcppArmadillo package) - which has great support for vectorized calculations.

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
arma::cx_vec Fcpp(const arma::cx_vec& x, double w, double b, double a, double c,
                  double neta, const arma::vec& S, const arma::vec& T, 
                  const arma::vec& r, Rcpp::IntegerVector P, Rcpp::NumericVector h) {

  arma::cx_vec u = x * arma::cx_double(0.0,1.0);

  double nu = (1.0/std::pow(neta,2.0)) * (std::sqrt(1.0-2.0*neta)-1.0);
  arma::cx_vec A_Q(r.size());
  arma::cx_vec B_Q(r.size()); 
  arma::vec steps = arma::round(T*250.0);

  for (size_t j = 0; j < steps.size(); j++) {
    for (size_t k = 0; k < steps[j]; k++) {
      A_Q = A_Q + r*u + w*B_Q - 
              0.5*arma::log(1.0 - 2.0*a*std::pow(neta,4.0)*B_Q);
      B_Q = b*B_Q + u*nu + (1.0/std::pow(neta,2.0)) * 
              (1.0 - arma::sqrt((1.0 - 2.0*a*std::pow(neta,4.0)*B_Q) *
                (1.0 - 2.0*c*B_Q - 2.0*u*neta)));
    }
  }

  arma::vec hP = Rcpp::as<arma::vec>(h[P-1]);
  arma::cx_vec F = arma::exp(arma::log(S)*u + A_Q + B_Q*hP);

  return F;
}

Just a couple of minor changes to note:

I'm using arma:: functions for vectorized calculations, such as arma::log , arma::exp , arma::round , arma::sqrt , and various overloaded operators ( * , + , - ); but using std::pow and std::sqrt for scalar calculations. In R, this is abstracted away from us, but here we have to distinguish between the two situations.
Your function F has one loop - for (i in 1:steps) - but the C++ version has two, just due to the differences in loop semantics between the two languages.
Most of the input vectors are arma:: classes (as opposed to using Rcpp::NumericVector and Rcpp::ComplexVector ), the exception being P and h , since Rcpp vectors offer R-like element access - eg h[P-1] . Also notice that P needs to be offset by 1 (0-based indexing in C++), and then converted to an Armadillo vector ( hP ) using Rcpp::as<arma::vec> , since your compiler will complain if you try to multiply a cx_vec with a NumericVector ( B_Q*hP ).
I added a function parameter h - it's not a good idea to rely on the existence of a global variable h , which you were doing in F . If you need to use it in the function body, you should pass it into the function.

I changed the name of your function to Fr , and to make benchmarking a little easier, I just wrapped your double loop that populates the matrix res into the functions Fr and Fcpp :

loop_Fr <- function(mat = res) {
  for (i in 1:N) {
    for (j in 1:Z) {
      mat[i,j]= Re(Fr(((delta*(i-1))-(alpha+1)*1i),w,b,a,c,neta,S[j],T[j],r[j],P[j],h))
    }
  }
  return(mat)
}
loop_Fcpp <- function(mat = res) {
  for (i in 1:N) {
    for (j in 1:Z) {
      mat[i,j]= Re(Fcpp(((delta*(i-1))-(alpha+1)*1i),w,b,a,c,neta,S[j],T[j],r[j],P[j],h))
    }
  }
  return(mat)
}
##
R> all.equal(loop_Fr(),loop_Fcpp())
[1] TRUE

I compared the two functions for N = 100 , N = 1000 , and N = 100000 (which took forever) - adjusting lambda and res accordingly, but keeping everything else the same. Generally speaking, Fcpp is about 10x faster than Fr on my computer:

N <- 100
lambda <- (2*pi)/(N*delta)
res <- matrix(nrow=N, ncol=Z)
##
R> microbenchmark::microbenchmark(loop_Fr(), loop_Fcpp(),times=50L)
Unit: milliseconds
        expr       min        lq    median        uq       max neval
   loop_Fr() 142.44694 146.62848 148.97571 151.86318 186.67296    50
 loop_Fcpp()  14.72357  15.26384  15.58604  15.85076  20.19576    50

N <- 1000
lambda <- (2*pi)/(N*delta) 
res <- matrix(nrow=N, ncol=Z)
##
R> microbenchmark::microbenchmark(loop_Fr(), loop_Fcpp(),times=50L)
Unit: milliseconds
        expr       min        lq    median        uq       max neval
   loop_Fr() 1440.8277 1472.4429 1491.5577 1512.5636 1565.6914    50
 loop_Fcpp()  150.6538  153.2687  155.4156  158.0857  181.8452    50

N <- 100000
lambda <- (2*pi)/(N*delta)
res <- matrix(nrow=N, ncol=Z)
##
R> microbenchmark::microbenchmark(loop_Fr(), loop_Fcpp(),times=2L)
Unit: seconds
        expr       min        lq    median        uq       max neval
   loop_Fr() 150.14978 150.14978 150.33752 150.52526 150.52526     2
 loop_Fcpp()  15.49946  15.49946  15.75321  16.00696  16.00696     2

Other variables, as presented in your question:

S <- c(906.65,906.65,906.65,906.65,906.65,906.65,906.65)
T <- c(0.1371253,0.1457896,0.1248953,0.1261278,0.1156931,0.0985253,0.1332596)
r <- c(0.013975,0.013975,0.013975,0.013975,0.013975,0.013975,0.013975)             
h <- c(0.001332596,0.001248470,0.001251458,0.001242143,0.001257921,0.001235755,0.001238440)       
P <- c(3,1,5,2,1,4,2)
w <- 0.001; b <- 0.2; a <- 0.0154; c <- 0.0000052; neta <- (-0.70)
Z <- length(S)
alpha <- 2; delta <- 0.25

Efficiency and Speed of R code using Rcpp

Question

1 answers

solution1
3 2015-04-18 21:32:03

Efficiency and Speed of R code using Rcpp

Question

1 answers

solution1 3 2015-04-18 21:32:03

solution1
3 2015-04-18 21:32:03