
About floating-point operations

Original post: 2012-08-01 19:31:09 | Tags: cuda, gpgpu

Recently I have been writing a program (an FDTD computation) in the CUDA development environment. The OS is Windows Server 2008, the graphics card is a Tesla C2070, and the compiler is VS2010. The program computes in both single- and double-precision floating point.

I have been reading the CUDA Programming Guide, versions 3.2 and 4.0. In the appendix, the guide says that sin() and cos() have a maximum error of 2 ULP. My original CPU program produces results that differ from those of the CUDA version.

I want the two sets of results to be exactly the same. Is that possible?

1 answer

To quote Goldberg (a paper that every computer scientist, computational scientist, and possibly even every scientist who programs, should read):

Due to roundoff errors, the associative laws of algebra do not necessarily hold for floating-point numbers.

This means that when you change the order of operations, even for arithmetic that is associative on paper, you are likely to get slightly different answers.
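A minimal sketch of that effect, written in Python because its `float` is an IEEE 754 double, the same format as a CUDA `double`. The function names and the values are illustrative assumptions, not taken from the original program.

```python
def left_grouped(a, b, c):
    """Evaluate (a + b) + c."""
    return (a + b) + c


def right_grouped(a, b, c):
    """Evaluate a + (b + c)."""
    return a + (b + c)


# Illustrative values chosen so that 1.0 is far below the rounding
# granularity of 1e20 in double precision.
a, b, c = 1e20, -1e20, 1.0

print(left_grouped(a, b, c))   # prints 1.0: the large terms cancel first
print(right_grouped(a, b, c))  # prints 0.0: the 1.0 is rounded away first
```

The two expressions are identical in real arithmetic. They differ here only because each intermediate result is rounded to the nearest representable double.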

Parallelism, by definition, results in a different ordering of operations relative to serial arithmetic. "Embarrassingly parallel" computations, that is, computations where each output element is computed independently of all the others, sometimes do not have to worry about this. But collective operations, such as reductions or scans, and spatial neighborhood computations, such as stencils (as in FDTD), do experience this effect.
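As a rough illustration of the reduction case, the sketch below sums the same values in two orders: left to right, as a single CPU loop would, and pairwise level by level, which is one common shape for a parallel reduction. `serial_sum`, `tree_sum`, and the data are hypothetical and only model the ordering; they are not how any particular CUDA library performs its reduction.

```python
def serial_sum(values):
    """Add left to right, as a single sequential loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def tree_sum(values):
    """Add adjacent pairs level by level, like a pairwise parallel reduction."""
    level = list(values)
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # odd element count: carry the last element up
            nxt.append(level[-1])
        level = nxt
    return level[0] if level else 0.0


# Illustrative data: near 1e16 adjacent doubles are 2 apart, so adding a
# single 1.0 to 1e16 rounds back to 1e16.
data = [1e16, 1.0, 1.0, 1.0]

print(serial_sum(data) == 1e16)        # True: each 1.0 is rounded away in turn
print(tree_sum(data) == 1e16 + 2.0)    # True: 1.0 + 1.0 survives as 2.0
print(serial_sum(data) == tree_sum(data))  # False
```

Neither order is wrong. Both are valid floating-point evaluations of the same sum, which is why a serial CPU result and a parallel GPU result can legitimately disagree in the last bits.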

In practice, even using a different compiler (or just different compiler options) can change the result of a floating-point computation when compiling the same code, with or without parallelism.
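One mechanism by which a compiler can change results is contracting a multiply followed by an add into a single fused multiply-add (FMA), which rounds once instead of twice. Whether this is what affects the program in the question is an assumption; the answer does not say. The sketch below emulates an FMA with exact rational arithmetic from Python's standard `fractions` module. `mul_then_add`, `fused_mul_add`, and the values are illustrative names and data, not part of any CUDA API.

```python
from fractions import Fraction


def mul_then_add(a, b, c):
    """Round after the multiply and again after the add (two roundings)."""
    return a * b + c


def fused_mul_add(a, b, c):
    """Emulate an FMA: compute a*b + c exactly, then round once to a double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Illustrative values: the exact product is 1 - 2**-60, which rounds to 1.0
# when stored as a double on its own.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print(mul_then_add(a, b, c))                   # prints 0.0
print(fused_mul_add(a, b, c) == -(2.0**-60))   # prints True
```

The same source expression `a * b + c` therefore has two defensible answers, depending only on whether the compiler emits one instruction or two. Current versions of nvcc expose this choice through the `--fmad` option.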