
Special CUDA Double Precision trig functions for SFU

I was wondering how to use __cos(x) (and, correspondingly, __sin(x)) in CUDA kernel code. I read in the CUDA manual that such a device function exists, but when I use it the compiler reports that I cannot call a host function from the device.

However, I found that there are two sister functions, cosf(x) and __cosf(x), the latter of which runs on the SFU and is much faster overall than the original cosf(x). The compiler does not complain about __cosf(x), of course.

Is there a library I'm missing? Am I mistaken about this trig function?

As the SFU only supports certain single-precision operations, there are no double-precision __cos() and __sin() device functions. There are single-precision __cosf() and __sinf() device functions, as well as other functions detailed in table C-4 of the CUDA 4.2 Programming Manual.
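To make the distinction concrete, here is a minimal sketch that evaluates the cosine three ways in one kernel. The names cosine_variants and cosine_variants_on_device are hypothetical, chosen only for this illustration, and allocation error checks are left out for brevity.

```cuda
#include <cmath>
#include <cuda_runtime.h>

// Hypothetical kernel: computes the cosine of each input three ways.
__global__ void cosine_variants(const float* x, float* accurate,
                                float* fast, double* dbl, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        accurate[i] = cosf(x[i]);         // single-precision library function
        fast[i]     = __cosf(x[i]);       // single-precision intrinsic, reduced accuracy
        dbl[i]      = cos((double)x[i]);  // double precision; there is no __cos()
    }
}

// Hypothetical host driver: copies inputs over, runs the kernel, copies results back.
// Allocation error checks are omitted to keep the sketch short.
cudaError_t cosine_variants_on_device(const float* h_x, float* h_accurate,
                                      float* h_fast, double* h_dbl, int n)
{
    float *d_x, *d_accurate, *d_fast;
    double *d_dbl;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_accurate, n * sizeof(float));
    cudaMalloc(&d_fast, n * sizeof(float));
    cudaMalloc(&d_dbl, n * sizeof(double));
    cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);

    cosine_variants<<<(n + 255) / 256, 256>>>(d_x, d_accurate, d_fast, d_dbl, n);
    cudaError_t err = cudaGetLastError();  // reports launch failures

    // cudaMemcpy waits for the kernel to finish before copying.
    if (err == cudaSuccess)
        err = cudaMemcpy(h_accurate, d_accurate, n * sizeof(float), cudaMemcpyDeviceToHost);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_fast, d_fast, n * sizeof(float), cudaMemcpyDeviceToHost);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_dbl, d_dbl, n * sizeof(double), cudaMemcpyDeviceToHost);

    cudaFree(d_x); cudaFree(d_accurate); cudaFree(d_fast); cudaFree(d_dbl);
    return err;
}
```

Replacing cos() in the kernel with __cos() should reproduce the compile error from the question, since no device function of that name exists.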

I assume you are looking for faster alternatives to the double-precision versions of the standard math functions sin() and cos()? If sine and cosine of the same argument are needed, sincos() should be used for a significant performance boost. If the argument of sine or cosine is multiplied by π, you would want to use sinpi(), cospi(), or sincospi() instead, for even more performance. For example, sincospi() is very useful when implementing the Box-Muller algorithm for generating normally distributed random numbers. Also, check out the CUDA 5.0 preview for best possible performance (note that the preview provides alpha-release quality).
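As a rough illustration of the Box-Muller case, the sketch below uses the double-precision sincospi() to get both sin(2πu2) and cos(2πu2) in one call. The names box_muller_pair, box_muller_kernel, and box_muller_on_device are hypothetical, and the uniform inputs are assumed to be supplied by the caller rather than generated on the device.

```cuda
#include <cmath>
#include <cuda_runtime.h>

// Hypothetical helper: one Box-Muller step in double precision.
// Assumes u1 lies in (0, 1] and u2 in [0, 1).
__device__ void box_muller_pair(double u1, double u2, double* z0, double* z1)
{
    double r = sqrt(-2.0 * log(u1));
    double s, c;
    sincospi(2.0 * u2, &s, &c);  // s = sin(2*pi*u2), c = cos(2*pi*u2)
    *z0 = r * c;
    *z1 = r * s;
}

__global__ void box_muller_kernel(const double* u1, const double* u2,
                                  double* z0, double* z1, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        box_muller_pair(u1[i], u2[i], &z0[i], &z1[i]);
}

// Hypothetical host driver; allocation error checks omitted for brevity.
cudaError_t box_muller_on_device(const double* h_u1, const double* h_u2,
                                 double* h_z0, double* h_z1, int n)
{
    size_t bytes = n * sizeof(double);
    double *d_u1, *d_u2, *d_z0, *d_z1;
    cudaMalloc(&d_u1, bytes);
    cudaMalloc(&d_u2, bytes);
    cudaMalloc(&d_z0, bytes);
    cudaMalloc(&d_z1, bytes);
    cudaMemcpy(d_u1, h_u1, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_u2, h_u2, bytes, cudaMemcpyHostToDevice);

    box_muller_kernel<<<(n + 255) / 256, 256>>>(d_u1, d_u2, d_z0, d_z1, n);
    cudaError_t err = cudaGetLastError();  // reports launch failures

    // cudaMemcpy waits for the kernel to finish before copying.
    if (err == cudaSuccess)
        err = cudaMemcpy(h_z0, d_z0, bytes, cudaMemcpyDeviceToHost);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_z1, d_z1, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_u1); cudaFree(d_u2); cudaFree(d_z0); cudaFree(d_z1);
    return err;
}
```

Passing 2.0 * u2 to sincospi() means the multiplication by π happens inside the function, so the code never forms 2πu2 explicitly.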
