Special CUDA Double Precision trig functions for SFU
I was wondering how I would go about using __cos(x) (and, respectively, __sin(x)) in kernel code with CUDA. I read in the CUDA manual that such a device function exists; however, when I use it, the compiler just says that I cannot call a host function from the device.
However, I found that there are two sister functions, cosf(x) and __cosf(x), the latter of which runs on the SFU and is overall much faster than the original cosf(x) function. The compiler does not complain about the __cosf(x) function, of course.
Is there a library I'm missing? Am I mistaken about this trig function?
As the SFU only supports certain single-precision operations, there are no double-precision __cos() and __sin() device functions. There are single-precision __cosf() and __sinf() device functions, as well as other functions detailed in table C-4 of the CUDA 4.2 Programming Manual.
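To make the distinction concrete, here is a minimal sketch that calls the three variants side by side: the single-precision intrinsic __cosf(), the single-precision library function cosf(), and the double-precision cos(). The kernel name cos_variants and the helper max_fast_cos_error are hypothetical names chosen for illustration, and the use of cudaMallocManaged assumes a toolkit and GPU newer than the CUDA 4.2 release discussed here.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Hypothetical kernel: evaluates cosine three ways for each input.
__global__ void cos_variants(const float *x, float *fast, float *accurate,
                             double *ref, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        fast[i]     = __cosf(x[i]);        // single-precision intrinsic: fast, reduced accuracy
        accurate[i] = cosf(x[i]);          // single-precision library function
        ref[i]      = cos((double)x[i]);   // double precision: only cos() exists, there is no __cos()
    }
}

// Hypothetical host helper: returns the largest |__cosf(x) - cos(x)| over
// n samples in [0, pi), or -1 if a CUDA call fails.
float max_fast_cos_error(int n)
{
    float *x, *fast, *accurate;
    double *ref;
    // Assumption: managed memory is available (not the case in CUDA 4.2).
    if (cudaMallocManaged(&x, n * sizeof(float)) != cudaSuccess ||
        cudaMallocManaged(&fast, n * sizeof(float)) != cudaSuccess ||
        cudaMallocManaged(&accurate, n * sizeof(float)) != cudaSuccess ||
        cudaMallocManaged(&ref, n * sizeof(double)) != cudaSuccess)
        return -1.0f;

    for (int i = 0; i < n; ++i)
        x[i] = 3.14159265f * i / n;

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    cos_variants<<<blocks, threads>>>(x, fast, accurate, ref, n);
    if (cudaDeviceSynchronize() != cudaSuccess)
        return -1.0f;

    float max_err = 0.0f;
    for (int i = 0; i < n; ++i)
        max_err = fmaxf(max_err, (float)fabs((double)fast[i] - ref[i]));

    cudaFree(x); cudaFree(fast); cudaFree(accurate); cudaFree(ref);
    return max_err;
}

int main()
{
    printf("max |__cosf(x) - cos(x)| on [0, pi): %g\n",
           max_fast_cos_error(1 << 20));
    return 0;
}
```

Note that compiling with nvcc's -use_fast_math flag maps cosf() to __cosf() automatically, but it has no effect on the double-precision cos().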
I assume you are looking for faster alternatives to the double-precision versions of the standard math functions sin() and cos()? If sine and cosine of the same argument are needed, sincos() should be used for a significant performance boost. If the argument of sine or cosine is multiplied by π, you would want to use sinpi(), cospi(), or sincospi() instead, for even more performance. For example, sincospi() is very useful when implementing the Box-Muller algorithm for generating normally distributed random numbers.

Also, check out the CUDA 5.0 preview for the best possible performance (note that the preview is of alpha-release quality).
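As an illustration of the Box-Muller case mentioned above, the following sketch converts pairs of uniform samples into pairs of normally distributed samples using the double-precision sincospi(). The kernel name box_muller, the helper box_muller_max_error, and the deterministic test inputs are hypothetical; a real generator would draw u1 and u2 from a random number source. Managed memory is again an assumption that requires a toolkit newer than those discussed in this thread.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Hypothetical kernel: Box-Muller transform in double precision.
// u1 must lie in (0, 1]; u2 in [0, 1).
__global__ void box_muller(const double *u1, const double *u2,
                           double *z0, double *z1, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        double r = sqrt(-2.0 * log(u1[i]));
        double s, c;
        // sincospi(x, &s, &c) yields sin(pi * x) and cos(pi * x) in one call,
        // so the factor of pi is never multiplied in explicitly.
        // The sincos() equivalent would be: sincos(2.0 * M_PI * u2[i], &s, &c);
        sincospi(2.0 * u2[i], &s, &c);
        z0[i] = r * c;
        z1[i] = r * s;
    }
}

// Hypothetical host helper: runs the kernel on deterministic inputs and
// returns the largest deviation from a host reference, or -1 on CUDA failure.
double box_muller_max_error(int n)
{
    double *u1, *u2, *z0, *z1;
    if (cudaMallocManaged(&u1, n * sizeof(double)) != cudaSuccess ||
        cudaMallocManaged(&u2, n * sizeof(double)) != cudaSuccess ||
        cudaMallocManaged(&z0, n * sizeof(double)) != cudaSuccess ||
        cudaMallocManaged(&z1, n * sizeof(double)) != cudaSuccess)
        return -1.0;

    for (int i = 0; i < n; ++i) {
        u1[i] = (i + 1.0) / (n + 1.0);            // strictly inside (0, 1)
        u2[i] = (double)((i * 37) % n) / n;       // inside [0, 1)
    }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    box_muller<<<blocks, threads>>>(u1, u2, z0, z1, n);
    if (cudaDeviceSynchronize() != cudaSuccess)
        return -1.0;

    const double two_pi = 6.283185307179586;
    double max_err = 0.0;
    for (int i = 0; i < n; ++i) {
        double r = sqrt(-2.0 * log(u1[i]));
        max_err = fmax(max_err, fabs(z0[i] - r * cos(two_pi * u2[i])));
        max_err = fmax(max_err, fabs(z1[i] - r * sin(two_pi * u2[i])));
    }

    cudaFree(u1); cudaFree(u2); cudaFree(z0); cudaFree(z1);
    return max_err;
}

int main()
{
    printf("max deviation from host reference: %g\n",
           box_muller_max_error(1 << 16));
    return 0;
}
```

The same calling pattern applies to sincos() when the argument is already in radians; sincospi() is the better fit here only because the Box-Muller angle is naturally a multiple of π.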