Using sample() from within Rcpp

I have a matrix of probabilities in which each of the four columns corresponds to a score (an integer from 0 to 3). I want to sample a single score for each row, using the probabilities in that row as sampling weights. In rows where some columns contain NA instead of a probability, the sampling frame is limited to the columns (and their corresponding scores) that do contain one (e.g. for a row with 0.45, 0.55, NA, NA, either 0 or 1 would be sampled). However, I get this error (followed by several others). How can I make it work?

error: no matching function for call to 'as<Rcpp::IntegerVector>(Rcpp::Matrix<14>::Sub&)'
     score[i] = sample(scrs,1,true,as<IntegerVector>(probs));

Existing answers suggest RcppArmadillo is the solution, but I can't get that to work either. If I add require(RcppArmadillo) before the cppFunction call and use score[i] = Rcpp::RcppArmadillo::sample(scrs,1,true,probs); in place of the existing sample() statement, I get:

error: 'Rcpp::RcppArmadillo' has not been declared
     score[i] = Rcpp::RcppArmadillo::sample(scrs,1,true,probs);

Or, if I also add #include <RcppArmadilloExtensions/sample.h> at the top, I get:

fatal error: RcppArmadilloExtensions/sample.h: No such file or directory
   #include <RcppArmadilloExtensions/sample.h>

Reproducible code:

p.vals <- matrix(c(0.44892077,0.55107923,NA,NA,
                 0.37111195,0.62888805,NA,NA,
                 0.04461714,0.47764478,0.303590351,1.741477e-01,
                 0.91741642,0.07968127,0.002826406,7.589714e-05,
                 0.69330800,0.24355559,0.058340934,4.795468e-03,
                 0.43516823,0.43483784,0.120895859,9.098067e-03,
                 0.73680809,0.22595438,0.037237525,NA,
                 0.89569365,0.10142719,0.002879163,NA),nrow=8,ncol=4,byrow=TRUE)

step.vals <- c(1,1,3,3,3,3,2,2)

require(Rcpp)
cppFunction('IntegerVector scores_cpp(NumericMatrix p, IntegerVector steps){

  int prows = p.nrow();

  IntegerVector score(prows);
  
  for(int i=0;i<prows;i++){
    int step = steps[i];
    
    IntegerVector scrs = seq(0,step);
    
    int start = 0;
    int end = step;
    
    NumericMatrix::Sub probs = p(Range(i,i),Range(start,end));

    score[i] = sample(scrs,1,true,probs);
  }
  
  return score;
  
}')

test <- scores_cpp(p.vals,step.vals)
test

Note: the value of step.vals for each row is always equal to the number of columns containing probabilities in that row, minus 1. So passing step.vals to the function may be redundant.

You may be having a 'can't see the forest for the trees' moment here. The RcppArmadillo unit tests actually provide a working example. If you look at the source file inst/tinytest/test_sample.R, it has a simple

Rcpp::sourceCpp("cpp/sample.cpp")

and in that file, inst/tinytest/cpp/sample.cpp, we have the standard

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

#include <RcppArmadilloExtensions/sample.h>

to a) tell R to look at the RcppArmadillo header directories and b) include the sampler extensions. This is how it works, and it has been documented to work for probably close to a decade.

As an example, I can just do (in my $HOME directory, which contains git/rcpparmadillo)

> Rcpp::sourceCpp("git/rcpparmadillo/inst/tinytest/cpp/sample.cpp")
> set.seed(123)
> csample_integer(1:5, 10, TRUE, c(0.4, 0.3, 0.2, 0.05, 0.05))
 [1] 1 3 2 3 4 1 2 3 2 2
> 

The later Rcpp addition works the same way, but I find working with parts of matrices to be more expressive and convenient with RcppArmadillo.

Edit: Even simpler for anybody with the RcppArmadillo package installed:

> library(Rcpp)
> sourceCpp(system.file("tinytest","cpp","sample.cpp", package="RcppArmadillo"))
> set.seed(123)
> csample_integer(1:5, 10, TRUE, c(0.4, 0.3, 0.2, 0.05, 0.05))
 [1] 1 3 2 3 4 1 2 3 2 2
> 

Many thanks for the pointers. I also had some problems with indexing the matrix, so that part has changed too. The following code works as intended (using sourceCpp()):

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

#include <RcppArmadilloExtensions/sample.h>

using namespace Rcpp;

// [[Rcpp::export]]

IntegerVector scores_cpp(NumericMatrix p, IntegerVector steps){
  
  int prows = p.nrow();
  
  IntegerVector score(prows);
  
  for(int i=0;i<prows;i++){
    int step = steps[i];
    
    IntegerVector scrs = seq(0,step);
    
    NumericMatrix probs = p(Range(i,i),Range(0,step));

    IntegerVector sc = RcppArmadillo::sample(scrs,1,true,probs);
    score[i] = sc[0];
  }
  
  return score;
  
}
