Getting "segfault from C stack overflow" when using Rcpp to sample from std::uniform_int_distribution

I wrote this little C++ function that works like R's sample.int(..., replace = FALSE). Essentially, it draws uniformly distributed integers and inserts them into a set until the set contains size elements. Maybe I'm missing something here, but I find the following behaviour quite strange. Here's a reprex:

// reprex.cpp
#include <Rcpp.h>
#include <random>
#include <set>

// [[Rcpp::export]]
std::set<unsigned long long int> sample_int(
    unsigned long long int N,
    unsigned long long int size)
{
    std::mt19937 rng(std::random_device{}());

    // Create an empty set of integers.
    std::set<unsigned long long int> set;

    while (set.size() < size)
    {
        unsigned long long int value = std::uniform_int_distribution<int>(1, N)(rng);
        set.insert(value);
    }

    return set;
}

/*** R
very_big_n <- 15^16
less_big_n <- 16^15

less_big_n < very_big_n

sample_int(very_big_n, 10)

sample_int(less_big_n, 10)
*/

Executing this using Rcpp yields:

[R] Rcpp::sourceCpp("reprex.cpp")

[R] very_big_n <- 15^16

[R] less_big_n <- 16^15

[R] less_big_n < very_big_n
[1] TRUE

[R] sample_int(very_big_n, 10)
 [1] 114533684 182757292 493592758 712746739 751345901 804523992 867187282
 [8] 905509919 929228169 929784901

[R] sample_int(less_big_n, 10)
Error: segfault from C stack overflow

Am I missing something here? Why do I get that segfault when calling sample_int with the smaller input but not with the very large one?

I'm not going to judge whether your code is effective, optimized, or generally safe.

I will, however, answer your question. The answer lies in this line of code (the error is enclosed in double asterisks):

unsigned long long int value = std::uniform_int_distribution**<int>**(1, N)(rng);

By changing the template type to unsigned long long, i.e.:

unsigned long long int value = std::uniform_int_distribution<unsigned long long>(1, N)(rng);

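you fix your stack overflow, and your function should now work with "very large" numbers. The fact that it didn't crash with very_big_n is just a coincidence: with a 32-bit int, 15^16 happens to narrow to a positive value of roughly 10^9, so the bounds were still valid. The range was silently cut down, though, which is why every value in your first output fits in 32 bits.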

The failure originates inside std::uniform_int_distribution: the check that the lower bound does not exceed the upper bound fails. The upper limit is the one that overflows. N is narrowed to int when it is passed to the constructor, and 16^15 = 2^60 has all-zero low 32 bits, so with a 32-bit int it typically becomes 0 and the distribution is constructed over (1, 0). If you reproduce the error and go through the stack trace, you get a more meaningful error message, something like this:

/usr/include/c++/12.2.0/bits/uniform_int_dist.h:97:
 std::uniform_int_distribution<_IntType>::param_type::param_type(_IntType, _IntType) 
[with _IntType = int]: Assertion '_M_a <= _M_b' failed.
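
To see why the upper limit is the culprit, here is a small sketch in plain C++ (no Rcpp). The helper names narrow_to_int and fits_in_int are made up for illustration, and the sketch assumes a 32-bit int with wrap-around narrowing, which is what mainstream compilers do (and what C++20 guarantees):

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: mimics what happens when a 64-bit bound is passed
// to a constructor that takes int parameters. Assuming a 32-bit int,
// only the low 32 bits of n survive the conversion.
inline int narrow_to_int(std::uint64_t n)
{
    return static_cast<int>(n);
}

// Hypothetical helper: true if n can be used as an int upper bound
// without losing information.
inline bool fits_in_int(std::uint64_t n)
{
    return n <= static_cast<std::uint64_t>(std::numeric_limits<int>::max());
}
```

Under those assumptions, narrow_to_int(1ULL << 60) is 0, so uniform_int_distribution<int>(1, N) ends up with a lower bound above its upper bound, which is exactly the failed assertion above. A check like fits_in_int(N) before constructing the distribution would have flagged both of your inputs.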

Hope it helps!

EDIT: As Dirk Eddelbuettel mentioned in the comments, spelling the type as unsigned long long is archaic. Even though the standard library documentation states that std::uniform_int_distribution might have undefined behavior when instantiated with a type other than the listed integer types, uint64_t should still work fine (I've skimmed through the implementation). The added benefit is that uint64_t has the same width across different architectures. To use the uint64_t integer type, you just need to include this header:

#include <cstdint>
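
Putting it together, a standalone version of the sampler might look like the sketch below. It is plain C++ rather than a drop-in Rcpp export. The name sample_int_u64 and the argument check are my additions, not part of the original code:

```cpp
#include <cstdint>
#include <random>
#include <set>
#include <stdexcept>

// Hypothetical standalone variant of sample_int: draws `size` distinct
// integers uniformly from [1, N] using a 64-bit distribution.
std::set<std::uint64_t> sample_int_u64(std::uint64_t N, std::uint64_t size)
{
    // Added guard (not in the original): without it, size > N would
    // loop forever and N == 0 would violate the distribution's a <= b rule.
    if (N < 1 || size > N)
        throw std::invalid_argument("need N >= 1 and size <= N");

    std::mt19937 rng(std::random_device{}());

    // The template type now matches the type of the bounds, so N is not narrowed.
    std::uniform_int_distribution<std::uint64_t> dist(1, N);

    std::set<std::uint64_t> out;
    while (out.size() < size)
        out.insert(dist(rng));  // duplicates are ignored by the set

    return out;
}
```

The distribution is constructed once outside the loop instead of on every draw. That does not change the result, it just avoids rebuilding the same object repeatedly.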
