Get the same sample of integers from Rcpp as base R

Is it possible to get the same sample of integers from Rcpp as from base R's sample?

I have tried using Rcpp::sample and Rcpp::RcppArmadillo::sample, but they do not return the same values as base R (example code below). Additionally, the Quick Example section of the post https://gallery.rcpp.org/articles/using-the-Rcpp-based-sample-implementation/ returns the same sample from Rcpp and base R; however, I cannot reproduce these results (I attach this code at the end).

Can this be done, and if so, what am I doing wrong?

My attempts:

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>

// [[Rcpp::export]]
Rcpp::IntegerVector mysamp1( int n) {
  Rcpp::IntegerVector v = Rcpp::sample(n, n);
  return v;
}

// [[Rcpp::export]]
Rcpp::IntegerVector mysamp2(int n) {  
  Rcpp::IntegerVector i = Rcpp::seq(1,n);
  Rcpp::IntegerVector v = wrap(Rcpp::RcppArmadillo::sample(i,n,false));
  return v;
}

// set seed https://stackoverflow.com/questions/43221681/changing-rs-seed-from-rcpp-to-guarantee-reproducibility
// [[Rcpp::export]]
void set_seed(double seed) {
  Rcpp::Environment base_env("package:base");
  Rcpp::Function set_seed_r = base_env["set.seed"];
  set_seed_r(std::floor(std::fabs(seed)));
}

// [[Rcpp::export]]
Rcpp::IntegerVector mysamp3( int n, int seed) {
  set_seed(seed); 
  Rcpp::IntegerVector v = Rcpp::sample(n, n);
  return v;
}


/***R
set.seed(1)
sample(10)
#  [1]  9  4  7  1  2  5  3 10  6  8
set.seed(1)
mysamp1(10)
#  [1]  3  4  5  7  2  8  9  6 10  1
set.seed(1)
mysamp2(10)
#  [1]  3  4  5  7  2  8  9  6 10  1
mysamp3(10, 1)
#  [1]  3  4  5  7  2  8  9  6 10  1

*/

Code from the gallery post "Using the RcppArmadillo-based Implementation of R's sample()", which returns FALSE on my system:

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadilloExtensions/sample.h>
using namespace Rcpp ;

// [[Rcpp::export]]
CharacterVector csample_char( CharacterVector x, 
                              int size,
                              bool replace, 
                              NumericVector prob = NumericVector::create()) {
  CharacterVector ret = RcppArmadillo::sample(x, size, replace, prob) ;
  return ret ;
}

/*** R
N <- 10
set.seed(7)
sample.r <- sample(letters, N, replace=T)

set.seed(7)
sample.c <- csample_char(letters, N, replace=T)

print(identical(sample.r, sample.c))
# [1] FALSE
*/

Compiling the comments into an answer. Akrun noted that by setting RNGkind or RNGversion we can replicate the results. From DirkEddelbuettel: there was a "change in R's RNG that came about because someone noticed a bias in, IIRC, use of sampling (at very large N). So that's why you [have] to turn an option on in R to get the old (matching) behaviour." And RalfStubner indicates that this is a known issue: https://github.com/RcppCore/RcppArmadillo/issues/250 and https://github.com/RcppCore/Rcpp/issues/945

Since version 3.6.0, R's default sampler is "Rejection", whereas the Rcpp implementations still follow the older "Rounding" sampler. With the default, the results differ:

RNGkind(sample.kind = "Rejection")
set.seed(1)
sample(10)
# [1]  9  4  7  1  2  5  3 10  6  8
set.seed(1)
mysamp1(10)
# [1]  3  4  5  7  2  8  9  6 10  1

However, the earlier sampler can be restored (with a warning) using:

RNGkind(sample.kind = "Rounding")
#Warning message:
#  In RNGkind("Mersenne-Twister", "Inversion", "Rounding") : non-uniform 'Rounding' sampler used

set.seed(1)
sample(10)
# [1]  3  4  5  7  2  8  9  6 10  1
set.seed(1)
mysamp1(10)
# [1]  3  4  5  7  2  8  9  6 10  1
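
To illustrate why the two sampler kinds can disagree under the same seed, here is a rough standalone C++ sketch. It is not R's or Rcpp's actual code: the names unif, index_rounding, index_rejection and sample_perm are hypothetical, std::mt19937 merely stands in for R's generator (so the numbers will not match R's output), and the rejection variant is simplified to ranges of at most 16 bits. The point is only that a "Rounding"-style index scales one uniform draw, while a "Rejection"-style index masks random bits and redraws on overshoot, so the same stream of draws is mapped to different indices.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Stand-in for R's unif_rand(): one uniform draw on [0, 1).
// Assumption: std::mt19937 is used for illustration only; it is not seeded
// the way R seeds its Mersenne Twister, so outputs will not match R's.
double unif(std::mt19937& rng) {
    return rng() / 4294967296.0;  // divide by 2^32
}

// "Rounding"-style index in [0, n): scale one uniform draw and truncate.
int index_rounding(int n, std::mt19937& rng) {
    return static_cast<int>(std::floor(n * unif(rng)));
}

// "Rejection"-style index in [0, n): keep just enough random bits to cover
// n values and redraw whenever the candidate falls outside the range.
int index_rejection(int n, std::mt19937& rng) {
    assert(n >= 1 && n <= 65536);    // this sketch handles at most 16 bits
    int bits = 0;
    while ((1 << bits) < n) ++bits;  // smallest bits with 2^bits >= n
    const int mask = (1 << bits) - 1;
    int candidate;
    do {
        candidate = static_cast<int>(std::floor(unif(rng) * 65536)) & mask;
    } while (candidate >= n);
    return candidate;
}

// Permutation of 1..n, drawing one index per step from the shrinking pool.
template <typename IndexFn>
std::vector<int> sample_perm(int n, unsigned seed, IndexFn index) {
    std::mt19937 rng(seed);
    std::vector<int> pool(n), out(n);
    for (int i = 0; i < n; ++i) pool[i] = i + 1;
    int remaining = n;
    for (int i = 0; i < n; ++i) {
        const int j = index(remaining, rng);
        out[i] = pool[j];
        pool[j] = pool[--remaining];  // move the last unused value into the gap
    }
    return out;
}
```

Calling sample_perm(10, 1u, index_rounding) and sample_perm(10, 1u, index_rejection) yields two valid permutations of 1..10 that will generally differ even though the seed is identical. That is the same situation as base R's default sample() versus Rcpp::sample above.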
