NVIDIA nvcc compilation flag for constexpr depth and IEEE 754 exponent computation
Consider the following code, which computes the exponent of a double-precision floating-point number as a constant expression (in the format specified by the IEEE 754 standard).
template <typename T> constexpr T abs_CE(const T x){return x>=0?x:-x;}
constexpr unsigned long long int __double_exponent_CE_(const double x){return x==0?0:(x>=2.?(__double_exponent_CE_(x/2.)+1):(x<1?__double_exponent_CE_(x*2.)-1:0));}
constexpr unsigned long long int __double_exponent_CE(const double x){return (x==0)?0:(__double_exponent_CE_(abs_CE(x))+1023);}
Under default compilation flags, that code fails to compile as a constant expression in gcc for certain inputs, such as std::numeric_limits<double>::max(). The reason is that it exceeds the maximum recursion depth for a constant expression (512 by default). For example, std::numeric_limits<double>::max() requires 1024 calls, exceeding the limit.
If the flag -fconstexpr-depth=2048 is added, the code compiles and evaluates to a constant expression that can be passed as a template parameter.
That code fails to compile under nvcc with the flag -Xcompiler -fconstexpr-depth=2048 (specifically, it crashes when nvcc issues the cicc command), so is there any way to change the depth limit in nvcc? I have not found any flag to change it in the NVCC options.
In case there is no equivalent flag in nvcc, does anybody know another way to compute the exponent of a double at compile time with fewer than 512 recursive calls?
You should consider using the --expt-relaxed-constexpr flag:
Experimental flag: Allow host code to invoke __device__ constexpr functions, and device code to invoke __host__ constexpr functions. Note that the behavior of this flag may change in future compiler releases.
Also, from the CUDA C Programming Guide:
By default, a constexpr function cannot be called from a function with incompatible execution space. The experimental nvcc flag --expt-relaxed-constexpr removes this restriction. When this flag is specified, host code can invoke a __device__ constexpr function and device code can invoke a __host__ constexpr function.
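On the second part of the question (staying under 512 recursive calls), one possible approach is to remove 32 binary orders of magnitude per call before falling back to single steps. The following is only a sketch in plain C++11, checked as host code and not verified under nvcc; the names kTwo32, abs_ce, unbiased_exponent_ce and double_exponent_ce are hypothetical and do not come from the original post.

```cpp
#include <limits>

// 2^32 is exactly representable as a double, so scaling by it is exact
// (assumption: IEEE 754 binary64 doubles, as in the question).
constexpr double kTwo32 = 4294967296.0;

constexpr double abs_ce(double x) { return x >= 0 ? x : -x; }

// Unbiased exponent of a positive, finite x.
// Each call strips 32 powers of two while that is possible, then one power
// of two per call, so the recursion depth stays well under 100.
// Like the code in the question, this does not terminate for infinity.
constexpr int unbiased_exponent_ce(double x) {
    return x >= kTwo32      ? unbiased_exponent_ce(x / kTwo32) + 32
         : x >= 2.0         ? unbiased_exponent_ce(x / 2.0) + 1
         : x < 1.0 / kTwo32 ? unbiased_exponent_ce(x * kTwo32) - 32
         : x < 1.0          ? unbiased_exponent_ce(x * 2.0) - 1
         : 0;
}

// Biased exponent (bias 1023). As in the question's code, subnormal inputs
// are not mapped to the IEEE 754 field value 0.
constexpr unsigned long long double_exponent_ce(double x) {
    return x == 0 ? 0ULL
                  : static_cast<unsigned long long>(
                        unbiased_exponent_ce(abs_ce(x)) + 1023);
}

// Evaluated at compile time with the default constexpr depth limit.
static_assert(double_exponent_ce(1.0) == 1023, "exponent of 1.0");
static_assert(double_exponent_ce(std::numeric_limits<double>::max()) == 2046,
              "exponent of the largest finite double");
```

Because the result is a constant expression, it can be used as a template argument in the same way as the original function, for example std::integral_constant<unsigned long long, double_exponent_ce(8.0)>.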