Using sample() in Rcpp

Question

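I have a matrix of probabilities in which each of the four columns corresponds to a score (an integer from 0 to 3). For each row, I want to sample one score, using the probabilities in that row as sampling weights. In rows where some columns hold NA instead of a probability, the sampling frame is restricted to the columns that do hold probabilities (and their corresponding scores); for example, for a row containing 0.45, 0.55, NA, NA, either 0 or 1 would be sampled. However, I get the following error (followed by several others), so how can I get this to work?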

error: no matching function for call to 'as<Rcpp::IntegerVector>(Rcpp::Matrix<14>::Sub&)'
     score[i] = sample(scrs,1,true,as<IntegerVector>(probs));

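Existing answers suggest that RcppArmadillo is the solution, but I couldn't get that to work either. If I add require(RcppArmadillo) before the cppFunction call and use score[i] = Rcpp::RcppArmadillo::sample(scrs,1,true,probs); in place of the existing sample() statement, I get: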

error: 'Rcpp::RcppArmadillo' has not been declared
     score[i] = Rcpp::RcppArmadillo::sample(scrs,1,true,probs);

Or, if I also put #include <RcppArmadilloExtensions/sample.h> at the top, I get:

fatal error: RcppArmadilloExtensions/sample.h: No such file or directory
   #include <RcppArmadilloExtensions/sample.h>

Reproducible code:

p.vals <- matrix(c(0.44892077,0.55107923,NA,NA,
                 0.37111195,0.62888805,NA,NA,
                 0.04461714,0.47764478,0.303590351,1.741477e-01,
                 0.91741642,0.07968127,0.002826406,7.589714e-05,
                 0.69330800,0.24355559,0.058340934,4.795468e-03,
                 0.43516823,0.43483784,0.120895859,9.098067e-03,
                 0.73680809,0.22595438,0.037237525,NA,
                 0.89569365,0.10142719,0.002879163,NA),nrow=8,ncol=4,byrow=TRUE)

step.vals <- c(1,1,3,3,3,3,2,2)

require(Rcpp)
cppFunction('IntegerVector scores_cpp(NumericMatrix p, IntegerVector steps){

  int prows = p.nrow();

  IntegerVector score(prows);
  
  for(int i=0;i<prows;i++){
    int step = steps[i];
    
    IntegerVector scrs = seq(0,step);
    
    int start = 0;
    int end = step;
    
    NumericMatrix::Sub probs = p(Range(i,i),Range(start,end));

    score[i] = sample(scrs,1,true,probs);
  }
  
  return score;
  
}')

test <- scores_cpp(p.vals,step.vals)
test

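Note: for each row, the value of step.vals always equals the number of columns that hold a probability in that row, minus 1. So passing step.vals to the function may be redundant.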
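As a rough illustration of that note, the step value could be derived from the row itself rather than passed in. The sketch below is plain standard C++, not Rcpp: `max_score_for_row` is a hypothetical helper name, a `std::vector<double>` stands in for one matrix row, and NaN stands in for R's NA. It also assumes the NA entries always trail the probabilities, as they do in the data above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Rcpp): returns the highest valid score for
// one row, i.e. the number of leading non-NaN probabilities minus 1.
// Assumption: NaN plays the role of R's NA, and NAs only appear at the end.
int max_score_for_row(const std::vector<double>& row) {
    std::size_t n = 0;
    // Count probabilities until the first missing value (or the end of the row).
    while (n < row.size() && !std::isnan(row[n])) {
        ++n;
    }
    assert(n > 0 && "row must contain at least one probability");
    return static_cast<int>(n) - 1;
}
```

Inside an Rcpp function the same idea would presumably use `NumericVector::is_na()` on each element of the row instead of `std::isnan`.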

Answer 1

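You may be having a "can't see the forest for the trees" moment here. The RcppArmadillo unit tests actually provide a working example. If you look at the source file inst/tinytest/test_sample.R, it has a simple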

Rcpp::sourceCpp("cpp/sample.cpp")

and in that file, inst/tinytest/cpp/sample.cpp, we have the standard

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

#include <RcppArmadilloExtensions/sample.h>

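to a) tell R to look in the RcppArmadillo header directory and b) include the sampler extension. That is how it works, and it has been known to work for close to ten years.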

As an example, I can do (in my $HOME directory, which contains git/rcpparmadillo)

> Rcpp::sourceCpp("git/rcpparmadillo/inst/tinytest/cpp/sample.cpp")
> set.seed(123)
> csample_integer(1:5, 10, TRUE, c(0.4, 0.3, 0.2, 0.05, 0.05))
 [1] 1 3 2 3 4 1 2 3 2 2
>

The later Rcpp addition works the same way, but I find RcppArmadillo more expressive and convenient for working with parts of matrices.

Edit: Even simpler for anybody with the RcppArmadillo package installed:

> library(Rcpp)
> sourceCpp(system.file("tinytest","cpp","sample.cpp", package="RcppArmadillo"))
> set.seed(123)
> csample_integer(1:5, 10, TRUE, c(0.4, 0.3, 0.2, 0.05, 0.05))
 [1] 1 3 2 3 4 1 2 3 2 2
>

Answer 2

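Many thanks for the pointers. I also had some problems with indexing the matrix, so that part has changed too. The following code works as intended (using sourceCpp()):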

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

#include <RcppArmadilloExtensions/sample.h>

using namespace Rcpp;

// [[Rcpp::export]]

IntegerVector scores_cpp(NumericMatrix p, IntegerVector steps){
  
  int prows = p.nrow();
  
  IntegerVector score(prows);
  
  for(int i=0;i<prows;i++){
    int step = steps[i];
    
    IntegerVector scrs = seq(0,step);
    
    NumericMatrix probs = p(Range(i,i),Range(0,step));

    IntegerVector sc = RcppArmadillo::sample(scrs,1,true,probs);
    score[i] = sc[0];
  }
  
  return score;
  
}
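
For readers who want to see the per-row logic without the R toolchain, here is a minimal standalone sketch of the same idea in plain standard C++. It is not the RcppArmadillo sampler: `sample_scores` is a hypothetical name, NaN stands in for NA, and `std::discrete_distribution` with `std::mt19937` replaces R's random number generator, so the draws will not match what `set.seed()` gives in R.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper for illustration only. Draws one score per row, where the
// candidate scores are the column indices 0, 1, 2, ... and the weights are the
// leading non-NaN entries of that row (NaN plays the role of R's NA).
// Uses the C++ standard RNG, not R's, so results differ from R's sample().
std::vector<int> sample_scores(const std::vector<std::vector<double>>& p,
                               std::mt19937& rng) {
    std::vector<int> scores;
    scores.reserve(p.size());
    for (const auto& row : p) {
        // Restrict the sampling frame to the columns that hold a probability.
        std::size_t n = 0;
        while (n < row.size() && !std::isnan(row[n])) {
            ++n;
        }
        assert(n > 0 && "each row needs at least one probability");
        // discrete_distribution returns index k with probability w[k] / sum(w).
        std::discrete_distribution<int> dist(
            row.begin(), row.begin() + static_cast<std::ptrdiff_t>(n));
        scores.push_back(dist(rng));
    }
    return scores;
}
```

Because the weights are normalised internally, the rows do not need to sum exactly to 1. Called with a seeded `std::mt19937` and the rows of `p.vals`, it returns one integer score per row, each no larger than the corresponding `step.vals` entry.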

Answer 1: score 2, posted 2022-06-29 10:39:55
Answer 2: score 0, accepted, posted 2022-07-01 14:28:18
