"segfault from C stack overflow" when sampling from std::uniform_int_distribution using Rcpp

Question

I wrote this little C++ function that works like R's sample.int(..., replace = FALSE) function. Essentially, it draws uniformly distributed integers and writes the results into a set until the set is of size size. Maybe I'm missing something here, but I find the following behaviour quite strange. Here's a reprex:

#reprex.cpp
#include <Rcpp.h>
#include <random>
#include <set>

// [[Rcpp::export]]
std::set<unsigned long long int> sample_int(
    unsigned long long int N,
    unsigned long long int size)
{
    std::mt19937 rng(std::random_device{}());

    // Create an empty set of integers.
    std::set<unsigned long long int> set;

    while (set.size() < size)
    {
        unsigned long long int value = std::uniform_int_distribution<int>(1, N)(rng);
        set.insert(value);
    }

    return set;
}

/*** R
very_big_n <- 15^16
less_big_n <- 16^15

less_big_n < very_big_n

sample_int(15^16, 10)

sample_int(16^15, 10)
*/

Executing this using Rcpp yields:

[R] Rcpp::sourceCpp("reprex.cpp")

[R] very_big_n <- 15^16

[R] less_big_n <- 16^15

[R] less_big_n < very_big_n
[1] TRUE

[R] sample_int(very_big_n, 10)
 [1] 114533684 182757292 493592758 712746739 751345901 804523992 867187282
 [8] 905509919 929228169 929784901

[R] sample_int(less_big_n, 10)
Error: segfault from C stack overflow

Am I missing something here? Why do I get that segfault when calling sample_int with the smaller input but not with the very large one?

Answer 1

I'm not going to judge whether or not your code is effective, optimized or in general safe.

I will, however, answer your question. The answer lies in this line of code (the error is enclosed in double asterisks):

unsigned long long int value = std::uniform_int_distribution**<int>**(1, N)(rng);

By changing the template type to unsigned long long, i.e.:

unsigned long long int value = std::uniform_int_distribution<unsigned long long>(1, N)(rng);

You fix your stack overflow, and your function should now work with "very large" numbers. The fact that it didn't happen with "very big n" is just a coincidence.

The stack overflow happens within this function: one of the interval checks for the formula that generates the random number fails. That is because the upper limit is the value that overflows when it is converted to int. If you replicate the error you experienced and go through the stack trace, you get a more meaningful error message, something like this:

/usr/include/c++/12.2.0/bits/uniform_int_dist.h:97:
 std::uniform_int_distribution<_IntType>::param_type::param_type(_IntType, _IntType) 
[with _IntType = int]: Assertion '_M_a <= _M_b' failed.
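To see why the assertion fires for 16^15 but not for 15^16, it helps to look at what survives when N is squeezed into an int. The sketch below assumes a typical platform where int is 32 bits wide and the conversion keeps only the low 32 bits; low_32_bits and sixteen_pow_15 are hypothetical names used for illustration, not part of the original code. Since 16^15 = 2^60, its low 32 bits are all zero, so the distribution is effectively constructed as (1, 0), which violates a <= b. 15^16 apparently leaves a positive remainder, so the call runs, but it samples from a much smaller range than intended, which would match the output above never exceeding roughly 10^9.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the original code): returns the low
// 32 bits of n, i.e. what is left of N after narrowing to a 32-bit type.
// Narrowing to an unsigned type is well defined: the result is n mod 2^32.
std::uint32_t low_32_bits(std::uint64_t n)
{
    return static_cast<std::uint32_t>(n);
}

// 16^15 written as an exact integer: 16^15 = (2^4)^15 = 2^60.
constexpr std::uint64_t sixteen_pow_15 = std::uint64_t{1} << 60;
```

Calling low_32_bits(sixteen_pow_15) gives 0, which is the upper bound the int-typed distribution ends up with in the failing call.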

Hope it helps!

EDIT: As Dirk Eddelbuettel mentioned in the comments, using unsigned long long is an archaic habit from old times. Even though the STL documentation states that std::uniform_int_distribution might have undefined behavior when it is not used with one of the suggested template types, uint64_t should still work fine (I've skimmed through the implementation). The added benefit is that uint64_t is consistent across different architectures. To use the uint64_t integer type you just need to include this header:

#include <cstdint>
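Putting the pieces together, a minimal sketch of the sampling loop with std::uint64_t could look like the following. It is plain C++ without the Rcpp export, so how the result gets converted back to R is not addressed here. The name sample_distinct is hypothetical, and constructing the distribution once outside the loop is my own small change rather than something the original code does.

```cpp
#include <cstdint>
#include <random>
#include <set>

// Sketch only: draws `size` distinct integers from [1, N].
// Assumes 1 <= N and size <= N; with size > N the loop would never finish.
std::set<std::uint64_t> sample_distinct(std::uint64_t N, std::uint64_t size)
{
    std::mt19937 rng(std::random_device{}());

    // The template type now matches the type of N, so the upper bound
    // is no longer narrowed to int.
    std::uniform_int_distribution<std::uint64_t> dist(1, N);

    std::set<std::uint64_t> out;
    while (out.size() < size)
    {
        out.insert(dist(rng)); // duplicates are ignored by the set
    }
    return out;
}
```

With this version, a call such as sample_distinct(std::uint64_t{1} << 60, 10) should return ten distinct values instead of tripping the interval check.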

Solution 1
2 Accepted 2022-12-15 19:03:22
