Efficiency of Rcpp sample v. C++ shuffle

Question

I'm trying to optimize an algorithm for R. Initially, I wrote the algorithm using Rcpp (and Rcpp vectors, etc.) but subsequently rewrote it using standard C++ vectors, translating to Rcpp types only at the final stage. However, the component of the C++ algorithm that involves std::shuffle seems quite slow. In fact, translating back and forth to an Rcpp vector so that I can use the Rcpp sample function is much faster. This surprises me.

Here's a minimal reproducible example:

#include <Rcpp.h>
#include <random>
#include <algorithm>

using namespace Rcpp;  // List, IntegerVector, wrap, as and sample are used unqualified below

// [[Rcpp::export]]
List test_cpp(int n, int x)  {

  List return_list(n);

  std::vector<int> v;
  v.reserve(x);

  for(int i = 0; i < x; ++i) {
    v.push_back(i);
  }

  std::random_device rd;
  std::mt19937 g(rd());

  for(int i = 0; i < n; ++i)  {
    std::shuffle(v.begin(), v.end(), g);
    return_list(i) = v;
  }

  return return_list;
}


// [[Rcpp::export]]
List test_r(int n, int x)  {

  List return_list(n);

  std::vector<int> v;
  v.reserve(x);

  for(int i = 0; i < x; ++i) {
    v.push_back(i);
  }

  IntegerVector vs = wrap(v);

  for(int i = 0; i < n; ++i)  {
    IntegerVector s_v = sample(vs, v.size());
    std::vector<int> s_v_c = as<std::vector<int>>(s_v);
    return_list(i) = s_v_c;
  }

  return return_list;
}

The first function, which uses C++ shuffle, is significantly slower than the version using Rcpp sample until you're shuffling a vector of ~50,000 elements. For an example closer to most of my use cases, the following produces median times of ~13 ms for Rcpp sample vs. ~20 ms for C++ shuffle.

n <- 1000
x <- 999

speed <- bench::mark(min_iterations = 100,
                     check = FALSE,
                     cpp = test_cpp(n, x),
                     rcpp = test_r(n, x))

ggplot2::autoplot(speed) +
  ggplot2::theme_minimal() +
  ggplot2::xlab(NULL) +
  ggplot2::ylab(NULL)

It's likely that I've mucked up the C++ code. If so, could someone show me my mistake? Or is shuffle just slow, so that I should use a different C++ algorithm? Or is there some penalty for calling an algorithm or random number generator outside of R/Rcpp that explains this difference in performance? Thankful for any suggestions.

Edit: To illustrate that the inefficiency of the C++ version doesn't come from having to convert standard vectors to IntegerVectors, I've modified the Rcpp version so that, after sampling, the IntegerVectors are superfluously converted to standard vectors (and then back to IntegerVectors).
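Another way to rule out conversion overhead would be to time the shuffle loop with no Rcpp types at all. The following is only a sketch under that assumption: shuffle_only and time_shuffle_only_us are hypothetical helpers, not part of the code above, and the fixed seed replaces std::random_device so that runs are repeatable.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: perform n shuffles of the vector 0..x-1 using only
// standard C++ containers, keeping a copy of each permutation (the role
// that the List plays in test_cpp).
std::vector<std::vector<int>> shuffle_only(int n, int x, std::uint32_t seed) {
    assert(n >= 0 && x >= 0);           // assumption: sizes are non-negative

    std::vector<int> v(x);
    std::iota(v.begin(), v.end(), 0);   // fill with 0, 1, ..., x-1

    std::mt19937 g(seed);               // seeded once, reused for every shuffle
    std::vector<std::vector<int>> out;
    out.reserve(n);
    for (int i = 0; i < n; ++i) {
        std::shuffle(v.begin(), v.end(), g);
        out.push_back(v);               // copy of the current permutation
    }
    return out;
}

// Hypothetical helper: wall-clock time of one shuffle_only call, in microseconds.
long long time_shuffle_only_us(int n, int x, std::uint32_t seed) {
    auto start = std::chrono::steady_clock::now();
    auto result = shuffle_only(n, x, seed);
    auto stop = std::chrono::steady_clock::now();

    // Read the result so the work cannot be discarded entirely.
    volatile std::size_t sink = result.size();
    (void)sink;

    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Calling time_shuffle_only_us(1000, 999, 42u) from a small driver would give a number to set against the bench::mark medians, with the caveat that a single wall-clock measurement is much noisier than a repeated benchmark.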

Update

I've experimented a bit with alternative pseudorandom number generators. This post suggests that the Mersenne Twister generator I use above is relatively slow compared to some alternatives. I tried the generators coded in this post, and they are indeed faster, but they don't massively improve performance. Here are my simplified test functions.

// pcg, splitmix, and xorshift below are the generator classes from the
// linked post; each is constructed from a std::random_device.

// [[Rcpp::export]]
void test_pcg(int x)  {
  std::vector<int> v;
  v.reserve(x);
  for(int i = 0; i < x; ++i) {
    v.push_back(i);
  }
  std::random_device rd;
  pcg g(rd);
  std::shuffle(v.begin(), v.end(), g);
}

// [[Rcpp::export]]
void test_mt(int x)  {
  std::vector<int> v;
  v.reserve(x);
  for(int i = 0; i < x; ++i) {
    v.push_back(i);
  }
  std::random_device rd;
  std::mt19937 g(rd());
  std::shuffle(v.begin(), v.end(), g);
}

// [[Rcpp::export]]
void test_splitmix(int x)  {
  std::vector<int> v;
  v.reserve(x);
  for(int i = 0; i < x; ++i) {
    v.push_back(i);
  }
  std::random_device rd;
  splitmix g(rd);
  std::shuffle(v.begin(), v.end(), g);
}

// [[Rcpp::export]]
void test_xorshift(int x)  {
  std::vector<int> v;
  v.reserve(x);
  for(int i = 0; i < x; ++i) {
    v.push_back(i);
  }
  std::random_device rd;
  xorshift g(rd);
  std::shuffle(v.begin(), v.end(), g);
}

// [[Rcpp::export]]
void test_rcpp(int x)  {
  // seq() is inclusive at both ends, so use x - 1 to get x elements
  // (0..x-1), matching the vectors in the functions above.
  IntegerVector v = seq(0, x - 1);
  IntegerVector s_v = sample(v, x);
}

For a vector of 1,000 elements, the Rcpp version is still considerably faster: ~13 ms compared to ~20 ms for the fastest RNGs with C++ shuffle.
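For readers who don't have the linked generators at hand, here is a minimal sketch of what plugging a custom generator into std::shuffle involves. It is not the code from the linked post: SplitMix64 and splitmix_shuffled_range are illustrative names, and the class takes a plain integer seed rather than a std::random_device. The only requirements std::shuffle places on the generator are a result_type, static min() and max(), and operator().

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <numeric>
#include <vector>

// Illustrative generator (an assumption, not the linked post's class):
// SplitMix64 wrapped so that it satisfies UniformRandomBitGenerator.
class SplitMix64 {
public:
    using result_type = std::uint64_t;

    explicit SplitMix64(std::uint64_t seed) : state_(seed) {}

    static constexpr result_type min() { return 0; }
    static constexpr result_type max() {
        return std::numeric_limits<result_type>::max();
    }

    // Advance the state by a fixed odd constant, then mix the bits.
    result_type operator()() {
        std::uint64_t z = (state_ += 0x9e3779b97f4a7c15ULL);
        z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
        z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
        return z ^ (z >> 31);
    }

private:
    std::uint64_t state_;
};

// Hypothetical helper: shuffle 0..x-1 with the generator above.
std::vector<int> splitmix_shuffled_range(int x, std::uint64_t seed) {
    assert(x >= 0);                     // assumption: size is non-negative
    std::vector<int> v(x);
    std::iota(v.begin(), v.end(), 0);   // fill with 0, 1, ..., x-1
    SplitMix64 g(seed);
    std::shuffle(v.begin(), v.end(), g);
    return v;
}
```

Swapping the generator changes only the cost of producing raw random bits. The work std::shuffle does to turn those bits into bounded indices stays the same, which may be part of why a faster generator alone does not change the picture much.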

From what I understand, C++ shuffle implements the Fisher-Yates (Knuth) shuffle. My conjecture now is that the Rcpp sample function doesn't implement the Fisher-Yates shuffle when all elements are sampled without replacement, but instead uses a sorting algorithm. Perhaps there's a similar algorithm in C++ that would be faster than shuffle for my application?
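For reference, the Fisher-Yates shuffle mentioned above can be written out by hand in a few lines. This is a sketch of the algorithm itself, not the standard library's or Rcpp's actual implementation; fisher_yates and fy_shuffled_range are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Fisher-Yates (Knuth) shuffle: walk from the last position down to the
// second, swapping each element with one chosen uniformly from the
// positions at or before it.
template <class URBG>
void fisher_yates(std::vector<int>& v, URBG& g) {
    if (v.size() < 2) return;           // nothing to shuffle
    for (std::size_t i = v.size() - 1; i > 0; --i) {
        std::uniform_int_distribution<std::size_t> pick(0, i);
        std::swap(v[i], v[pick(g)]);    // pick(g) is in [0, i]
    }
}

// Hypothetical helper: shuffle 0..x-1 with a Mersenne Twister seeded by `seed`.
std::vector<int> fy_shuffled_range(int x, unsigned seed) {
    assert(x >= 0);                     // assumption: size is non-negative
    std::vector<int> v(x);
    std::iota(v.begin(), v.end(), 0);   // fill with 0, 1, ..., x-1
    std::mt19937 g(seed);
    fisher_yates(v, g);
    return v;
}
```

Each pass needs one bounded random index, so the whole shuffle draws x - 1 indices and performs x - 1 swaps. Any speed difference between two implementations of this loop would therefore come mostly from how those bounded indices are generated.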

Answer 1

As alluded to in my comment, your functions may 'do too much'. Here is a simplified example (which is also nonsensical, as we likely alter the input vector each time), but it distills your question down to 'is sample faster than shuffle from the standard library?'. And it is not.

My modified code follows below.

Code

#include <Rcpp.h>
#include <random>
#include <algorithm>

// [[Rcpp::export]]
Rcpp::IntegerVector shuffle_cpp(Rcpp::IntegerVector x)  {
    std::random_device rd;
    std::mt19937 g(rd());
    std::shuffle(x.begin(), x.end(), g);
    return x;
}

// [[Rcpp::export]]
Rcpp::IntegerVector sample_rcpp(Rcpp::IntegerVector x)  {
    return sample(x, x.size());
}

/*** R
v <- seq(1, 1e6)
res <- bench::mark(min_iterations = 100, check = FALSE, shuffle_cpp(v), sample_rcpp(v))
res
ggplot2::autoplot(res) + ggplot2::theme_minimal() + ggplot2::ylab(NULL)
*/

solution1
4 ACCEPTED 2022-05-20 12:39:56
