
How to write portable floating point arithmetic in C++?

Say you're writing a C++ application doing lots of floating point arithmetic. Say this application needs to be portable across a reasonable range of hardware and OS platforms (say 32- and 64-bit hardware, Windows and Linux both in 32- and 64-bit flavors...).

How would you make sure that your floating point arithmetic is the same on all platforms? For instance, how can you be sure that a 32-bit floating point value will really be 32 bits on all platforms?

For integers we have stdint.h, but there doesn't seem to be a floating point equivalent.


[EDIT]

I got very interesting answers, but I'd like to make the question more precise.

For integers, I can write:

#include <stdint.h>
[...]
int32_t myInt;

and be sure that, whatever (C99-compatible) platform I'm on, myInt is a 32-bit integer.

If I write:

double myDouble;
float myFloat;

am I certain that these will compile to 64-bit and 32-bit floating point numbers, respectively, on all platforms?

Non-IEEE 754

Generally, you cannot. There's always a trade-off between consistency and performance, and C++ leaves that trade-off to you.

For platforms that don't have floating point hardware (such as embedded and signal processing processors), you cannot use C++'s "native" floating point operations, at least not portably. While a software emulation layer would be possible, it is usually not feasible for this class of devices.

For these, you could use 16-bit or 32-bit fixed-point arithmetic (but you might even discover that long is supported only rudimentarily - and frequently, div is very expensive). However, this will be much slower than built-in fixed point, and it becomes painful beyond the four basic operations.

I haven't come across devices that support floating point in a format other than IEEE 754. From my experience, your best bet is to hope for the standard, because otherwise you usually end up building algorithms and code around the capabilities of the device. When sin(x) suddenly costs 1000 times as much, you'd better pick an algorithm that doesn't need it.

IEEE 754 - Consistency

The only non-portability I found here is when you expect bit-identical results across platforms. The biggest influence is the optimizer. Again, you can trade accuracy and speed for consistency. Most compilers have an option for that - e.g. "floating point consistency" in Visual C++. But note that this accuracy is always beyond the guarantees of the standard.

Why do results become inconsistent? First, FPU registers often have higher precision than double (e.g. 80 bits), so as long as the code generator doesn't store the value back to memory, intermediate values are held with higher accuracy.

Second, equivalences like a*(b+c) = a*b + a*c do not hold exactly due to the limited precision. Nonetheless the optimizer, if allowed, may make use of them.

Also - something I learned the hard way - printing and parsing functions are not necessarily consistent across platforms, probably due to numeric inaccuracies, too.

float

It is a common misconception that float operations are intrinsically faster than double. Working on large float arrays is usually faster mostly through fewer cache misses alone.

Be careful with float accuracy. It can be "good enough" for a long time, but I've often seen it fail faster than expected. Float-based FFTs can be much faster due to SIMD support, but they generate noticeable artifacts quite early for audio processing.

Use fixed point.

However, if you want to approach the realm of possibly making portable floating point operations, you at least need to use controlfp to ensure consistent FPU behavior, as well as ensuring that the compiler enforces ANSI conformance with respect to floating point operations. Why ANSI? Because it's a standard.

And even then you aren't guaranteed to get identical floating point behavior; that also depends on the CPU/FPU you are running on.

It shouldn't be an issue; IEEE 754 already defines all details of the layout of floats.

The maximum and minimum storable values are defined in float.h (FLT_MAX, DBL_MAX, and friends; limits.h covers the integer types).

Portability is one thing; generating consistent results on different platforms is another. Depending on what you are trying to do, writing portable code shouldn't be too difficult, but getting consistent results on ANY platform is practically impossible.

I believe "limits.h" will include the C library constants INT_MAX and its brethren. However, it is preferable to use "limits" and the classes it defines:

std::numeric_limits<float>, std::numeric_limits<double>, std::numeric_limits<int>, etc...

If you're assuming that you will get the same results on another system, read What could cause a deterministic process to generate floating point errors first. You might be surprised to learn that your floating point arithmetic isn't even the same across different runs on the very same machine!
