NVIDIA nvcc compilation flag for constexpr depth and IEEE 754 exponent computation

Question

Consider the following code, which computes the exponent of a double-precision floating-point number (in the biased format specified by the IEEE 754 standard) as a constant expression.

    template <typename T>  constexpr T abs_CE(const T x){return x>=0?x:-x;}
    constexpr unsigned long long int __double_exponent_CE_(const double x){return x==0?0:(x>=2.?(__double_exponent_CE_(x/2.)+1):(x<1?__double_exponent_CE_(x*2.)-1:0));}
    constexpr unsigned long long int __double_exponent_CE(const double x){return (x==0)?0:(__double_exponent_CE_(abs_CE(x))+1023);}

Under gcc's default compilation flags, this code fails to compile as a constant expression for certain inputs, such as std::numeric_limits<double>::max(). It fails because the evaluation exceeds the maximum recursion depth for a constant expression (512 by default). For example, std::numeric_limits<double>::max() requires 1024 calls, which exceeds the limit.
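To make the call count concrete, here is a small runtime sketch that mirrors the recursion of __double_exponent_CE_ and counts how many calls it makes. The names exponent_counting_calls and calls_needed are hypothetical helpers introduced only for this illustration; they are not part of the original code.

```cpp
#include <limits>

// Runtime mirror of the question's __double_exponent_CE_ (same branches,
// same order). `calls` is incremented once per invocation.
// Hypothetical helper, for illustration only.
inline long long exponent_counting_calls(double x, int& calls) {
    ++calls;
    if (x == 0) return 0;
    if (x >= 2.0) return exponent_counting_calls(x / 2.0, calls) + 1;
    if (x < 1.0) return exponent_counting_calls(x * 2.0, calls) - 1;
    return 0;  // x is already in [1, 2)
}

// Number of calls the recursion makes for |x| (hypothetical wrapper).
inline int calls_needed(double x) {
    int calls = 0;
    exponent_counting_calls(x < 0 ? -x : x, calls);
    return calls;
}
```

For std::numeric_limits<double>::max() the value has to be halved 1023 times before it lands in [1, 2), so the function is entered 1024 times, which is the figure quoted above.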

If the flag -fconstexpr-depth=2048 is added, the code compiles and evaluates to a constant expression that can be passed as a template argument.

Under nvcc, the code still fails to compile with the flag -Xcompiler -fconstexpr-depth=2048 (specifically, it crashes when nvcc issues the cicc command). Is there any way to change the depth limit in nvcc? I have not found any flag for it in the NVCC options.

In case nvcc has no equivalent flag, does anybody know another way to compute the exponent of a double at compile time with fewer than 512 recursive calls?

Answer 1

You should consider using the "--expt-relaxed-constexpr" flag:

Experimental flag:

Allow host code to invoke __device__ constexpr functions, and device code to invoke __host__ constexpr functions.

Note that the behavior of this flag may change in future compiler releases.

Also, from the CUDA C Programming Guide:

By default, a constexpr function cannot be called from a function with incompatible execution space. The experimental nvcc flag --expt-relaxed-constexpr removes this restriction. When this flag is specified, host code can invoke a __device__ constexpr function and device code can invoke a __host__ constexpr function.
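On the second part of the question (staying under 512 recursive calls), one possible approach is to scale by large powers of two (2^512, 2^256, ..., 2^1) instead of by 2 each time, which keeps the recursion depth at a few dozen levels. The following is a minimal C++11 sketch under stated assumptions, not the original poster's method: all names (sq, pow2, floor_log2_ge1, doublings_below_one, biased_exponent, ExponentTag) are hypothetical; zero and subnormal inputs are mapped to an exponent field of 0, which differs from the original code; infinities and NaN are not handled. It is written as plain host code and has not been verified under nvcc; calling it from device code would presumably need __host__ __device__ annotations or the --expt-relaxed-constexpr flag quoted above.

```cpp
#include <limits>

// All names below are hypothetical and introduced for illustration only.

constexpr double abs_ce(double x) { return x >= 0 ? x : -x; }
constexpr double sq(double v) { return v * v; }

// 2^k for 0 <= k <= 512 by repeated squaring (logarithmic recursion depth).
constexpr double pow2(int k) {
    return k == 0 ? 1.0 : (k % 2 ? 2.0 * pow2(k - 1) : sq(pow2(k / 2)));
}

// For finite x >= 1: floor(log2(x)). `step` runs over 512, 256, ..., 1.
constexpr int floor_log2_ge1(double x, int step) {
    return step == 0 ? 0
         : (x >= pow2(step)
                ? step + floor_log2_ge1(x / pow2(step), step / 2)
                : floor_log2_ge1(x, step / 2));
}

// For normal x < 1: the largest m such that x * 2^m is still below 1,
// so that floor(log2(x)) == -(m + 1).
constexpr int doublings_below_one(double x, int step) {
    return step == 0 ? 0
         : (x * pow2(step) < 1.0
                ? step + doublings_below_one(x * pow2(step), step / 2)
                : doublings_below_one(x, step / 2));
}

// Biased IEEE 754 exponent field of a finite double.
// Assumption: zero and subnormals yield 0; infinity and NaN are not handled.
constexpr int biased_exponent(double x) {
    return abs_ce(x) < std::numeric_limits<double>::min() ? 0
         : abs_ce(x) >= 1.0 ? 1023 + floor_log2_ge1(abs_ce(x), 512)
         : 1022 - doublings_below_one(abs_ce(x), 512);
}

// The result is usable as a template argument.
template <int E> struct ExponentTag { static constexpr int value = E; };

static_assert(biased_exponent(1.0) == 1023, "1.0 has unbiased exponent 0");
static_assert(biased_exponent(-8.0) == 1026, "8 = 2^3");
static_assert(biased_exponent(0.5) == 1022, "0.5 = 2^-1");
static_assert(ExponentTag<biased_exponent(
                  std::numeric_limits<double>::max())>::value == 2046,
              "largest finite double");
```

Multiplying or dividing a normal double by a power of two is exact as long as the result stays normal, so the scaling steps do not disturb the significand. Each helper recurses about ten levels, plus a similar depth inside pow2, which stays well below the default limit of 512.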

Answer 1
1 2019-02-20 05:11:13
