
NVIDIA nvcc compilation flag for constexpr depth and IEEE 754 exponent computation

Consider the following code, which computes the exponent of a double as a constant expression, in the biased format specified by the IEEE 754 standard (the binary64 exponent field, bias 1023).

    template <typename T> constexpr T abs_CE(const T x) { return x >= 0 ? x : -x; }
    // Unbiased exponent of a positive finite x: one recursive call per halving (x >= 2) or doubling (x < 1).
    constexpr int __double_exponent_CE_(const double x) { return x == 0 ? 0 : (x >= 2. ? __double_exponent_CE_(x / 2.) + 1 : (x < 1 ? __double_exponent_CE_(x * 2.) - 1 : 0)); }
    // Biased exponent field: unbiased exponent + 1023; 0 for x == 0.
    constexpr int __double_exponent_CE(const double x) { return x == 0 ? 0 : __double_exponent_CE_(abs_CE(x)) + 1023; }

With gcc's default flags, that code fails to compile as a constant expression for certain inputs, such as std::numeric_limits<double>::max(). It fails because the evaluation exceeds the maximum nesting depth for constant expressions (512 by default). For example, std::numeric_limits<double>::max() is (2 - 2^-52) * 2^1023, so it must be halved 1023 times before it lands in [1, 2). That takes 1024 nested calls, well over the limit.

If the flag -fconstexpr-depth=2048 is added, the code compiles and evaluates to a constant expression that can be passed as a template parameter.

The same code fails to compile under nvcc with -Xcompiler -fconstexpr-depth=2048 (specifically, it crashes when nvcc invokes cicc). Is there any way to change the depth limit in nvcc? I have not found a flag for it in the NVCC options.

In case nvcc has no equivalent flag, does anybody know another way to compute the exponent of a double at compile time with fewer than 512 nested recursive calls?

You should consider using the --expt-relaxed-constexpr flag:

Experimental flag:

Allow host code to invoke __device__ constexpr functions, and device code to invoke __host__ constexpr functions.

Note that the behavior of this flag may change in future compiler releases.

Also, from the CUDA C Programming Guide:

By default, a constexpr function cannot be called from a function with an incompatible execution space. The experimental nvcc flag --expt-relaxed-constexpr removes this restriction. When this flag is specified, host code can invoke a __device__ constexpr function and device code can invoke a __host__ constexpr function.
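For the second part of the question (computing the exponent with fewer than 512 nested calls), here is a minimal C++11 sketch that keeps the nesting depth logarithmic instead of linear. Rather than halving one step at a time, it tries to divide by 2^512, 2^256, ..., 2^1 in turn, and each successful division contributes one bit of the exponent. All helper names (abs_ce, square, pow2_pow2, exp_down, exp_up, biased_exponent) are hypothetical. It assumes finite input: zero and subnormals map to 0 (their IEEE 754 exponent field), while infinity and NaN are not handled. I have only reasoned about it for standard C++; it has not been checked against nvcc/cicc.

```cpp
#include <limits>
#include <type_traits>

// Hypothetical helpers; C++11 constexpr, so each body is a single return.
constexpr double abs_ce(double x) { return x >= 0 ? x : -x; }
constexpr double square(double v) { return v * v; }

// 2^(2^i) for 0 <= i <= 9 by repeated squaring; nesting depth is at most 10.
constexpr double pow2_pow2(int i) { return i == 0 ? 2.0 : square(pow2_pow2(i - 1)); }

// For x >= 1: floor(log2(x)). Divides by 2^512, 2^256, ..., 2^1 whenever x is at
// least that large; each successful division sets bit i of the result.
// Dividing by a power of two is exact here because the quotient stays >= 1.
constexpr int exp_down(double x, int i) {
    return i < 0 ? 0
         : x >= pow2_pow2(i) ? (1 << i) + exp_down(x / pow2_pow2(i), i - 1)
                             : exp_down(x, i - 1);
}

// For normal x in (0, 1): floor(log2(x)), a negative value. Multiplies by 2^(2^i)
// whenever the product stays below 2, so x ends up in [1, 2).
constexpr int exp_up(double x, int i) {
    return i < 0 ? 0
         : x * pow2_pow2(i) < 2.0 ? -(1 << i) + exp_up(x * pow2_pow2(i), i - 1)
                                  : exp_up(x, i - 1);
}

// Biased binary64 exponent field: 0 for zero and subnormals, otherwise
// floor(log2|x|) + 1023. Finite inputs only (inf and NaN are not handled).
constexpr int biased_exponent(double x) {
    return abs_ce(x) < std::numeric_limits<double>::min() ? 0
         : (abs_ce(x) >= 1.0 ? exp_down(abs_ce(x), 9) : exp_up(abs_ce(x), 9)) + 1023;
}

// Usable as a template argument; the nesting depth stays under 20 calls.
using max_exp = std::integral_constant<int, biased_exponent(std::numeric_limits<double>::max())>;
static_assert(max_exp::value == 2046, "DBL_MAX has biased exponent 0x7FE");
static_assert(biased_exponent(1.0) == 1023, "1.0 = 1.0 * 2^0");
static_assert(biased_exponent(-0.75) == 1022, "0.75 = 1.5 * 2^-1");
static_assert(biased_exponent(std::numeric_limits<double>::min()) == 1, "smallest normal");
```

Because the exponent of a normal double lies in [-1022, 1023], ten greedy steps (bits 9 down to 0) are always enough. In C++14 and later the same steps could instead be written as an ordinary loop inside a single constexpr function, which avoids recursion entirely (for nvcc that would need a C++14 mode such as -std=c++14).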
