Parallelizing a non-trivial Gibbs Sampler in R: RcppThread vs. RcppParallel

Overview: I'm interested in parallelizing (across chains) a Gibbs sampler for a non-trivial regression problem that I've already implemented serially via Rcpp/RcppEigen. I've read the documentation for RcppParallel and RcppThread, and I want to know whether my understanding of the challenges involved in parallelizing this code is accurate and whether my proposed pseudocode using RcppThread is viable.

Programming challenge: This regression problem requires inverting an updated design matrix at every iteration of the Gibbs sampler. Consequently, any new matrix (one per chain) needs to be "thread safe", meaning there is no danger of one thread writing to memory that another thread might also try to access. If that holds, I can then draw and store the regression coefficient samples (beta) by giving RcppThread::parallelFor a unique index with which to assign the samples. I'm wondering where/how it would be best to initialize these thread-specific matrices. See below for my overall conceptual understanding and a first guess at how I could use the same principle of assigning samples in parallel to assign the X's in parallel. Note: this assumes that Eigen objects are fine with concurrent index access, in the same way I've seen std::vector<>'s memory accessed in the RcppThread documentation.

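#include <RcppEigen.h>
#include <RcppThread.h>
#include <random>
#include <vector>
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::depends(RcppThread)]]
// [[Rcpp::depends(RcppEigen)]]

// Sampler class definition
#include "Sampler.h"

// [[Rcpp::export]]
Eigen::ArrayXXd fancyregression(const Eigen::VectorXd &y, // outcome vector
                                const Eigen::MatrixXd &Z, // static sub-matrix of X
                                const int &num_iterations,
                                const int &num_chains_less_one,
                                const int &seed,
                                ...)
{
   const int dim_X = get_dim_X(Z,...);
   const int n = y.rows();
   const int num_chains = num_chains_less_one + 1;

   // One RNG per chain: a single std::mt19937 shared by all threads is a data race.
   std::vector<std::mt19937> rngs;
   for (int chain = 0; chain < num_chains; ++chain)
      rngs.emplace_back(seed + chain);

   // Row = iteration; columns [dim_X*chain, dim_X*(chain+1)) hold that chain's beta draw.
   Eigen::ArrayXXd beta_samples = Eigen::ArrayXXd::Zero(num_iterations, num_chains * dim_X);

   // Allocated once, before the threads start; chain c only ever writes its own n x dim_X block.
   Eigen::MatrixXd shared_X(n, dim_X * num_chains);

   // sampler object only has read access to its arguments; the member functions
   // called below must not modify any state shared between threads
   SamplerClass sampler(y,Z,...);

   // chain loop: RcppThread::parallelFor iterates over [begin, end), so pass num_chains
   RcppThread::parallelFor(0, num_chains, [&](unsigned int chain) {
       std::mt19937 &rng = rngs[chain];
       auto X = shared_X.block(0, dim_X * chain, n, dim_X); // this chain's view into shared_X
       // chain-specific iteration loop
       for (int iter_ix = 0; iter_ix < num_iterations; ++iter_ix) {
           X = sampler.create_X(rng);
           // assumes get_beta_sample returns an Eigen::VectorXd of length dim_X
           beta_samples.block(iter_ix, dim_X * chain, 1, dim_X) =
               sampler.get_beta_sample(X, rng).transpose().array();
       }
   });

   return beta_samples;
}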
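The two assumptions the question relies on (writes to disjoint indices of a container that was sized before the threads start are safe, and each chain needs its own RNG) can be checked in isolation. The sketch below is standard C++ only: std::thread stands in for RcppThread::parallelFor, a std::vector<double> stands in for the Eigen array, and run_chain and run_all are hypothetical names. Because every chain seeds its own std::mt19937 and writes only to its own columns, the parallel run should reproduce the serial run exactly.

```cpp
#include <cstddef>
#include <functional>
#include <random>
#include <thread>
#include <vector>

// Hypothetical stand-in for one Gibbs chain. It uses only chain-private state
// (its own RNG and its own scratch buffer) and writes only to its own slice of `samples`.
void run_chain(int chain, unsigned seed, int num_iterations, int dim,
               int num_chains, std::vector<double>& samples) {
  std::mt19937 rng(seed + static_cast<unsigned>(chain));  // chain-private RNG stream
  std::normal_distribution<double> norm(0.0, 1.0);
  std::vector<double> X(dim);                             // chain-private scratch, one per chain
  for (int it = 0; it < num_iterations; ++it) {
    for (double& x : X) x = norm(rng);                    // placeholder for create_X / the beta draw
    // row-major layout: row `it`, columns [dim*chain, dim*(chain+1))
    std::size_t row_start = static_cast<std::size_t>(it) * num_chains * dim;
    for (int j = 0; j < dim; ++j)
      samples[row_start + chain * dim + j] = X[j];
  }
}

// Runs all chains either one thread per chain or serially, into a buffer sized up front.
std::vector<double> run_all(unsigned seed, int num_iterations, int dim,
                            int num_chains, bool parallel) {
  // Allocated before any thread starts and never resized while they run.
  std::vector<double> samples(static_cast<std::size_t>(num_iterations) * num_chains * dim, 0.0);
  if (parallel) {
    std::vector<std::thread> threads;
    for (int c = 0; c < num_chains; ++c)
      threads.emplace_back(run_chain, c, seed, num_iterations, dim, num_chains,
                           std::ref(samples));
    for (std::thread& t : threads) t.join();
  } else {
    for (int c = 0; c < num_chains; ++c)
      run_chain(c, seed, num_iterations, dim, num_chains, samples);
  }
  return samples;
}
```

The same reasoning should carry over to a pre-sized Eigen matrix or array, as long as nothing resizes it while the threads are running; the per-chain seeding also makes the results independent of how the threads happen to be scheduled.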

"Where/how would be best to initialize these thread-specific matrices?"

You're looking for thread-specific resources. Here's a barebones example:

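#include <Rcpp.h>
#include <RcppParallel.h>
#include <tbb/enumerable_thread_specific.h>
#include <iostream>
using namespace Rcpp;
using namespace RcppParallel;

// [[Rcpp::depends(RcppParallel)]]
// [[Rcpp::plugins(cpp11)]]

struct Test : public Worker {
  // one bool per worker thread, each created lazily from the exemplar value false
  tbb::enumerable_thread_specific<bool> printonce;
  Test() : printonce(false) {}
  
  void operator()(std::size_t begin, std::size_t end) {
    // local() returns a reference to the calling thread's own copy
    tbb::enumerable_thread_specific<bool>::reference p = printonce.local();
    if(!p) { // print once per thread
      std::cout << 1; // not Rcpp::Rcout: the R API must not be called from worker threads
      p = true;
    }
  }
};

// [[Rcpp::export(rng = false)]]
void test() {
  Test x{};
  parallelFor(0, 10000, x);
}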

RcppParallel uses TBB under the hood (on most operating systems), so you can use anything TBB provides and look it up in the TBB documentation.

Note that because the thread-local resource has to be stored somewhere, you'll want to use a class/functor rather than a lambda.
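The functor pattern can also be illustrated without TBB. In the minimal sketch below (standard C++ only; ChainWorker and run are hypothetical names, and std::thread with an explicit worker index stands in for TBB's scheduler), the per-thread scratch buffers are members of the functor: each thread lazily initializes its own slot the first time it runs and then reuses it for every task it processes. Unlike tbb::enumerable_thread_specific, which keys copies by the actual calling thread, this version relies on the caller passing the worker index.

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Hypothetical functor: the per-thread resources live in its members, playing the
// role that tbb::enumerable_thread_specific plays in the TBB example above.
struct ChainWorker {
  int dim;
  std::vector<std::vector<double>> scratch;  // one scratch buffer per worker thread
  std::vector<int> allocations;              // how many times each thread allocated
  std::vector<double> results;               // one result per task

  ChainWorker(int num_threads, int dim, int num_tasks)
      : dim(dim), scratch(num_threads), allocations(num_threads, 0),
        results(num_tasks, 0.0) {}

  // Process tasks [begin, end) on worker thread `tid`.
  void operator()(int tid, int begin, int end) {
    std::vector<double>& buf = scratch[tid];  // only thread `tid` touches this slot
    if (buf.empty()) {                        // lazy, once-per-thread initialization
      buf.assign(dim, 0.0);
      ++allocations[tid];
    }
    for (int task = begin; task < end; ++task) {
      for (int j = 0; j < dim; ++j) buf[j] = task + j;  // reuse the same buffer
      double sum = 0.0;
      for (double v : buf) sum += v;
      results[task] = sum;                    // each task writes a distinct element
    }
  }
};

// Split [0, num_tasks) into contiguous chunks, one chunk per thread.
void run(ChainWorker& w, int num_threads, int num_tasks) {
  std::vector<std::thread> threads;
  int chunk = (num_tasks + num_threads - 1) / num_threads;
  for (int t = 0; t < num_threads; ++t) {
    int begin = t * chunk;
    int end = std::min(num_tasks, begin + chunk);
    if (begin >= end) break;
    threads.emplace_back([&w, t, begin, end] { w(t, begin, end); });
  }
  for (std::thread& th : threads) th.join();
}
```

With 4 threads and 100 tasks, each thread allocates its buffer exactly once, so the allocation cost scales with the number of threads rather than with the number of tasks (or Gibbs iterations).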
