Getting "segfault from C stack overflow" when using Rcpp to sample from std::uniform_int_distribution
I wrote this little C++ function that works like R's sample.int(..., replace = FALSE) function. Essentially, it draws from uniformly distributed integers and writes the results into a set until the set has size elements. Maybe I'm missing something here, but I find the following behaviour quite strange. Here's a reprex:
// reprex.cpp
#include <Rcpp.h>
#include <random>
#include <set>
// [[Rcpp::export]]
std::set<unsigned long long int> sample_int(
unsigned long long int N,
unsigned long long int size)
{
std::mt19937 rng(std::random_device{}());
// Create an empty set of integers.
std::set<unsigned long long int> set;
while (set.size() < size)
{
unsigned long long int value = std::uniform_int_distribution<int>(1, N)(rng);
set.insert(value);
}
return set;
}
/*** R
very_big_n <- 15^16
less_big_n <- 16^15
less_big_n < very_big_n
sample_int(15^16, 10)
sample_int(16^15, 10)
*/
Executing this using Rcpp yields:
[R] Rcpp::sourceCpp("reprex.cpp")
[R] very_big_n <- 15^16
[R] less_big_n <- 16^15
[R] less_big_n < very_big_n
[1] TRUE
[R] sample_int(very_big_n, 10)
[1] 114533684 182757292 493592758 712746739 751345901 804523992 867187282
[8] 905509919 929228169 929784901
[R] sample_int(less_big_n, 10)
Error: segfault from C stack overflow
Am I missing something here? Why do I get that segfault when calling sample_int with a smaller input but not with that very large one?
I'm not going to judge whether or not your code is effective, optimized or in general safe.
I will, however, answer your question. The answer lies within this line of code (the error is enclosed in double asterisks):
unsigned long long int value = std::uniform_int_distribution**<int>**(1, N)(rng);
By changing the template type to unsigned long long, i.e.:
unsigned long long int value = std::uniform_int_distribution<unsigned long long>(1, N)(rng);
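You fix your stack overflow, and your function should now work with "very large" numbers.
The fact that it didn't happen with "very big n" is just a coincidence. N is implicitly converted to int when it is passed to the distribution's constructor, and on typical platforms that conversion keeps only the low 32 bits. 16^15 = 2^60 has all of its low 32 bits equal to zero, so the distribution receives the invalid range [1, 0]. 15^16 happens to truncate to a positive value of roughly 1.04 billion, so that call silently samples from [1, ~1.04e9] instead of [1, 15^16], which is why every value printed above is below 10^9.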
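To see why one input crashes and the other does not, the sketch below computes both values exactly and mimics what an int parameter receives for each. It assumes a GCC or Clang build, where an out-of-range conversion to int keeps the low 32 bits (guaranteed since C++20, implementation-defined before). pow_ull and narrowed are hypothetical helpers, not part of the original code. Note that R actually passes 15^16 as a double, which cannot represent it exactly, so the low bits Rcpp sees differ slightly from the exact integer used here; the truncated value is still positive either way.

```cpp
// Hypothetical helper: exact integer power, no floating point involved.
constexpr unsigned long long pow_ull(unsigned long long base, unsigned exp)
{
    unsigned long long result = 1;
    for (unsigned i = 0; i < exp; ++i)
        result *= base;
    return result;
}

// Hypothetical helper: mimics passing an unsigned long long to an int
// parameter, as in std::uniform_int_distribution<int>(1, N).
// On GCC/Clang only the low 32 bits survive the conversion.
inline int narrowed(unsigned long long n)
{
    return static_cast<int>(n);
}

// 16^15 is exactly 2^60, so its low 32 bits are all zero:
// narrowed(16^15) == 0 and the distribution gets the range [1, 0].
static_assert(pow_ull(16, 15) == (1ULL << 60), "16^15 == 2^60");

// 15^16 still fits in 64 bits (about 6.57e18), but its low 32 bits
// happen to form a positive int, so the distribution gets a valid
// (though far too small) range.
static_assert(pow_ull(15, 16) == 6568408355712890625ULL, "15^16");
```

With these helpers, narrowed(pow_ull(16, 15)) is 0, whereas narrowed(pow_ull(15, 16)) is 1039759105, a perfectly valid upper bound.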
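The stack overflow happens within this function: one of the interval checks in the formula that generates the random number fails, because the upper limit is the value that overflows. Constructing the distribution with a > b is undefined behaviour, and in libstdc++ it appears to play out like this: when the requested range is wider than the generator's output, operator() calls itself recursively on a smaller sub-range, but with the inverted range [1, 0] the range computation wraps around to a huge unsigned value, so each recursive call again receives an inverted range and the recursion never terminates until the C stack is exhausted. After replicating the same error you experienced and going through the stack trace (with libstdc++'s debug assertions enabled, e.g. via -D_GLIBCXX_ASSERTIONS), you get a more meaningful error message, something like this: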
/usr/include/c++/12.2.0/bits/uniform_int_dist.h:97:
std::uniform_int_distribution<_IntType>::param_type::param_type(_IntType, _IntType)
[with _IntType = int]: Assertion '_M_a <= _M_b' failed.
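Because the standard library simply assumes a <= b (libstdc++ only checks it when its debug assertions are enabled), one way to turn this kind of bug into a readable error is to check the range yourself before constructing the distribution. The sketch below uses a hypothetical helper, checked_uniform; it is not part of the standard library or of Rcpp.

```cpp
#include <random>
#include <stdexcept>
#include <string>

// Hypothetical helper: builds a uniform_int_distribution only after
// verifying the precondition a <= b that the standard library assumes.
template <typename IntType>
std::uniform_int_distribution<IntType> checked_uniform(IntType a, IntType b)
{
    if (a > b)
        throw std::invalid_argument(
            "uniform_int_distribution requires a <= b, got a = " +
            std::to_string(a) + ", b = " + std::to_string(b));
    return std::uniform_int_distribution<IntType>(a, b);
}
```

Since IntType is deduced from both arguments, checked_uniform(1, N) with an unsigned long long N does not compile at all, which surfaces the type mismatch. Forcing checked_uniform<int>(1, N) reproduces the narrowing, but for 16^15 it now throws std::invalid_argument, which the wrapper generated by Rcpp reports as an ordinary R error instead of crashing the session.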
Hope it helps!
EDIT: As Dirk Eddelbuettel mentioned in the comments, using unsigned long long is an archaism from older times; the fixed-width uint64_t is the more modern choice. The standard says std::uniform_int_distribution has undefined behaviour when its template argument is not one of the integer types it lists (short, int, long, long long and their unsigned counterparts), but uint64_t should still work fine: in practice it is a typedef for one of those types (unsigned long or unsigned long long), and I've skimmed through the implementation. The added benefit is that uint64_t has the same width on every architecture that provides it. To use the uint64_t integer type, you just need to include this header:
#include <cstdint>
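Putting the fix and the edit together, a minimal sketch of the corrected function might look like the following. It is plain C++ rather than Rcpp so it can be compiled on its own. It swaps in std::mt19937_64 (a 64-bit engine, so the distribution should not need to combine several 32-bit draws for large ranges), creates the distribution once outside the loop, and adds two checks the original lacked: N must be at least 1, and size must not exceed N, otherwise the loop could never finish. To call it from R you would add back #include <Rcpp.h> and the // [[Rcpp::export]] attribute; how Rcpp converts std::uint64_t values is an assumption I have not checked on every platform.

```cpp
#include <cstdint>
#include <random>
#include <set>
#include <stdexcept>

// Draw `size` distinct integers uniformly from [1, N], like
// R's sample.int(N, size, replace = FALSE).
std::set<std::uint64_t> sample_int(std::uint64_t N, std::uint64_t size)
{
    if (N == 0)
        throw std::invalid_argument("N must be at least 1");
    if (size > N)
        throw std::invalid_argument("size must not exceed N");

    std::mt19937_64 rng(std::random_device{}());
    // The distribution's type matches N, so no narrowing conversion happens.
    std::uniform_int_distribution<std::uint64_t> dist(1, N);

    std::set<std::uint64_t> out;
    while (out.size() < size)
        out.insert(dist(rng));  // duplicates are ignored by std::set
    return out;
}
```

For example, sample_int(1ULL << 60, 10) returns ten distinct values in [1, 2^60], the case that crashed in the reprex.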