
Rcpp - generate multiple random observations from custom distribution

This question is related to a previous one on calling functions within functions in Rcpp.

I need to generate a large number of random draws from a custom distribution, in a way similar to rnorm() or rbinom(), with the additional complication that each draw from my function is a vector.

As a solution, I thought about defining a function that generates one observation from the custom distribution, and then a main function that draws n times from the generating function via a for loop. Below is a much simplified working version of the code:

#include <Rcpp.h>
using namespace Rcpp;

// generating function
NumericVector gen(NumericVector A, NumericVector B){
  NumericVector out = no_init_vector(2); 
  out[0] = R::runif(A[0],A[1]) + R::runif(B[0],B[1]);
  out[1] = R::runif(A[0],A[1]) - R::runif(B[0],B[1]);
  return out;
}

// [[Rcpp::export]]
// draw n observations
NumericVector rdraw(int n, NumericVector A, NumericVector B){
  NumericMatrix out = no_init_matrix(n, 2);
  for (int i = 0; i < n; ++i) {
    out(i,_) = gen(A, B); 
  }
  return out;
}

I am looking for ways to speed up the draws. My questions are: is there a more efficient alternative to the for loop? Would parallelization help in this case?

Thank you for any help!

There are different ways to speed this up:

  1. Use inline on gen(), reducing the number of function calls.
  2. Use Rcpp::runif instead of a loop with R::runif to remove even more function calls.
  3. Use a faster RNG that allows for parallel execution.

Here are points 1. and 2.:

#include <Rcpp.h>
using namespace Rcpp;

// generating function
inline NumericVector gen(NumericVector A, NumericVector B){
  NumericVector out = no_init_vector(2); 
  out[0] = R::runif(A[0],A[1]) + R::runif(B[0],B[1]);
  out[1] = R::runif(A[0],A[1]) - R::runif(B[0],B[1]);
  return out;
}

// draw n observations (one row per observation)
// [[Rcpp::export]]
NumericMatrix rdraw(int n, NumericVector A, NumericVector B){
  NumericMatrix out = no_init_matrix(n, 2);
  for (int i = 0; i < n; ++i) {
    out(i,_) = gen(A, B); 
  }
  return out;
}

// draw n observations, one vectorized call per column
// [[Rcpp::export]]
NumericMatrix rdraw2(int n, NumericVector A, NumericVector B){
  NumericMatrix out = no_init_matrix(n, 2);
  out(_, 0) = Rcpp::runif(n, A[0],A[1]) + Rcpp::runif(n, B[0],B[1]);
  out(_, 1) = Rcpp::runif(n, A[0],A[1]) - Rcpp::runif(n, B[0],B[1]);
  return out;
}

/*** R
set.seed(42)
system.time(rdraw(1e7, c(0,2), c(1,3)))
system.time(rdraw2(1e7, c(0,2), c(1,3)))
*/

Result:

> set.seed(42)

> system.time(rdraw(1e7, c(0,2), c(1,3)))
   user  system elapsed 
  1.576   0.034   1.610 

> system.time(rdraw2(1e7, c(0,2), c(1,3)))
   user  system elapsed 
  0.458   0.139   0.598 

For comparison, your original code took about 1.8 s for 10^7 draws. For point 3. I am adapting code from the parallel vignette of my dqrng package:

#include <Rcpp.h>
#include <functional>  // std::bind, std::ref
// [[Rcpp::depends(dqrng)]]
#include <xoshiro.h>
#include <dqrng_distribution.h>
// [[Rcpp::plugins(openmp)]]
#include <omp.h>
// [[Rcpp::depends(RcppParallel)]]
#include <RcppParallel.h>
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
Rcpp::NumericMatrix rdraw3(int n, Rcpp::NumericVector A, Rcpp::NumericVector B, int seed, int ncores) {
  dqrng::uniform_distribution distA(A(0), A(1));
  dqrng::uniform_distribution distB(B(0), B(1));
  dqrng::xoshiro256plus rng(seed);
  Rcpp::NumericMatrix res = Rcpp::no_init_matrix(n, 2);
  RcppParallel::RMatrix<double> output(res);

  #pragma omp parallel num_threads(ncores)
  {
    dqrng::xoshiro256plus lrng(rng);      // make thread local copy of rng
    lrng.jump(omp_get_thread_num() + 1);  // advance rng by 1 ... ncores jumps
    auto genA = std::bind(distA, std::ref(lrng));
    auto genB = std::bind(distB, std::ref(lrng));

    #pragma omp for
    for (int i = 0; i < n; ++i) {
      output(i, 0) = genA() + genB();
      output(i, 1) = genA() - genB();
    }
  }
  return res;
}

/*** R
system.time(rdraw3(1e7, c(0,2), c(1,3), 42, 2))
*/

Result:

> system.time(rdraw3(1e7, c(0,2), c(1,3), 42, 2))
   user  system elapsed 
  0.276   0.025   0.151 
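
The key idea in rdraw3 is that every thread works on its own copy of the generator, moved to a part of the random stream that no other thread uses. The following is only a rough standalone sketch of that idea without R or dqrng. It assumes std::mt19937_64 with discard() as a stand-in for xoshiro256plus with jump(), and the names unif, fill_rows and draw_rows are made up for this illustration:

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Map one 64-bit engine output to a double in [lo, hi) (up to rounding),
// using the top 53 bits. Exactly one engine call per draw, so the position
// in the random stream is easy to track.
inline double unif(std::mt19937_64& rng, double lo, double hi) {
  constexpr double inv53 = 1.0 / 9007199254740992.0;  // 2^-53
  return lo + (hi - lo) * static_cast<double>(rng() >> 11) * inv53;
}

// Fill rows [begin, end) of a row-major n x 2 buffer; each row uses 4 draws.
// The engine is passed by value, so every call works on a private copy.
void fill_rows(std::vector<double>& out, std::size_t begin, std::size_t end,
               std::mt19937_64 rng,
               double a0, double a1, double b0, double b1) {
  rng.discard(4 * begin);  // skip the draws that belong to earlier rows
  for (std::size_t i = begin; i < end; ++i) {
    // Named temporaries fix the order in which the draws are consumed.
    const double x1 = unif(rng, a0, a1);
    const double y1 = unif(rng, b0, b1);
    const double x2 = unif(rng, a0, a1);
    const double y2 = unif(rng, b0, b1);
    out[2 * i]     = x1 + y1;
    out[2 * i + 1] = x2 - y2;
  }
}

// Split the n rows into contiguous chunks, one chunk per worker.
std::vector<double> draw_rows(std::size_t n, int nworkers, std::uint64_t seed,
                              double a0, double a1, double b0, double b1) {
  if (nworkers < 1) nworkers = 1;
  std::vector<double> out(2 * n);
  const std::mt19937_64 base(seed);  // shared, read-only starting state

  // If OpenMP is not enabled the pragma is ignored and the loop runs serially.
  #pragma omp parallel for num_threads(nworkers)
  for (int w = 0; w < nworkers; ++w) {
    const std::size_t begin = n * static_cast<std::size_t>(w) / nworkers;
    const std::size_t end = n * (static_cast<std::size_t>(w) + 1) / nworkers;
    fill_rows(out, begin, end, base, a0, a1, b0, b1);
  }
  return out;
}
```

Because each worker skips exactly the draws consumed by the rows before its chunk, draw_rows(n, 1, seed, ...) and draw_rows(n, 4, seed, ...) return the same values in this sketch. That is a difference to rdraw3, where the output depends on ncores. The price is that discard() on the standard engine typically takes time proportional to the number of skipped draws, whereas a jump is a fixed-cost operation. With GCC or Clang, compile with -fopenmp to actually run the chunks in parallel.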

So with a faster RNG and moderate parallelism, we can gain an order of magnitude in execution time. The individual draws will differ, of course, but summary statistics should be the same.
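
One way to check the summary statistics is to compare sample moments with the theoretical ones. For A = c(0,2) and B = c(1,3) the first column has mean 1 + 2 = 3, the second has mean 1 - 2 = -1, and both have variance 4/12 + 4/12 = 2/3. The snippet below is a minimal standalone sketch of such a check, assuming std::mt19937_64 instead of R's RNG or dqrng. The names Moments and column_moments are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <random>

// Sample mean and (population) variance of both output columns.
struct Moments {
  double mean0, var0, mean1, var1;
};

// Draw n observations (U_A + U_B, U_A - U_B) and accumulate their moments
// without storing the draws.
Moments column_moments(std::size_t n, std::uint64_t seed,
                       double a0, double a1, double b0, double b1) {
  std::mt19937_64 rng(seed);
  std::uniform_real_distribution<double> distA(a0, a1);
  std::uniform_real_distribution<double> distB(b0, b1);

  double s0 = 0.0, q0 = 0.0, s1 = 0.0, q1 = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    const double c0 = distA(rng) + distB(rng);  // first column
    const double c1 = distA(rng) - distB(rng);  // second column
    s0 += c0;  q0 += c0 * c0;
    s1 += c1;  q1 += c1 * c1;
  }

  const double m0 = s0 / n;
  const double m1 = s1 / n;
  return Moments{m0, q0 / n - m0 * m0, m1, q1 / n - m1 * m1};
}
```

With n around 10^6 the standard error of each mean is roughly 0.001, so results from different seeds (or different generators) should agree with 3, -1 and 2/3 to about two decimal places even though the individual draws do not match.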


 