
Is a hardcoded float exact if it can be represented in IEEE 754 binary format?

For example, 0, 0.5, 0.15625, 1, 2, 3... are values that can be represented exactly in IEEE 754. Are their hardcoded versions exact?

For example, does

float a=0;
if(a==0){
    return true;
}

always return true? Another example:

float a=0.5;
float b=0.25;
float c=0.125;

Is a * b always equal to 0.125, and is a * b == c always true? One more example:

int a=123;
float b=0.5;

Is a * b always 61.5? Or in general, is multiplying an integer by an IEEE 754 binary float exact?

Or a more general question: if the values are hardcoded, and both the values and the result can be represented in IEEE 754 binary format (e.g. 0.5 - 0.125), is the result exact?

There is no inherent fuzziness in floating-point numbers. It's just that some, but not all, real numbers can't be exactly represented.

Compare with a fixed-width decimal representation, let's say with three digits. The integer 1 can be represented, using 1.00, and 1/10 can be represented, using 0.10, but 1/3 can only be approximated, using 0.33.

If we instead use binary digits, the integer 1 would be represented as 1.00 (binary digits), 1/2 as 0.10, 1/4 as 0.01, but 1/3 can (again) only be approximated.

There are some things to remember, though:

  • It's not the same numbers as with decimal digits. 1/10 can be written exactly as 0.1 using decimal digits, but not using binary digits, no matter how many you use (short of infinitely many).
  • In practice, it is difficult to keep track of which numbers can be represented and which can't. 0.5 can, but 0.4 can't. So when you need exact numbers, such as (often) when working with money, you shouldn't use floating-point numbers.
  • According to some sources, some processors do strange things internally when performing floating-point calculations on numbers that can't be exactly represented, causing results to vary in a way that is, in practice, unpredictable.
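The second point can be seen with a running total. The following is only a minimal sketch: `repeated_sum` and `check_repeated_sum` are names made up for illustration, and it assumes that `double` is IEEE 754 binary64 and that arithmetic is evaluated in plain double precision (as is typical on x86-64).

```cpp
#include <cassert>
#include <limits>

// Assumption: double is IEEE 754 binary64 on this platform.
static_assert(std::numeric_limits<double>::is_iec559,
              "this sketch assumes IEEE 754 doubles");

// Adds `step` to a running total `count` times, the way a naive
// accumulation (for example of money amounts) would.
double repeated_sum(double step, int count) {
    double total = 0.0;
    for (int i = 0; i < count; ++i) {
        total += step;
    }
    return total;
}

// Usage sketch: 0.5 is a power of 2, so every partial sum is exact.
// 0.1 is only approximated in binary, and the rounding errors show up.
inline void check_repeated_sum() {
    assert(repeated_sum(0.5, 4) == 2.0);
    assert(repeated_sum(0.1, 10) != 1.0);
}
```

For money, a common alternative is to count in the smallest unit (such as cents) using an integer type, so that every amount is exactly representable.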

(My opinion is that it's actually a reasonable first approximation to say that yes, floating-point numbers are inherently fuzzy, so unless you are sure your particular application can handle that, stay away from them.)

For more details than you probably need or want, read the famous What Every Computer Scientist Should Know About Floating-Point Arithmetic. Also, there is this somewhat more accessible website: The Floating-Point Guide.

No, but as Thomas Padron-McCarthy says, some numbers can be exactly represented using binary but not all of them can.

This is the way I explain it to non-developers who I work with (like Mahmut Ali, I also work on a very old financial package): Imagine having a very large cake that is cut into 256 slices. Now you can give 1 person the whole cake, or 2 people half of the slices each, but as soon as you decide to split it between 3 you can't - it's either 85 or 86 - you can't split the cake any further. The same is true of floating point. You can only get exact numbers for some values - other numbers can only be closely approximated.

C++ does not require binary floating point representation. Built-in integers are required to have a binary representation, commonly two's complement, but ones' complement and sign-and-magnitude are also supported. But floating point can be e.g. decimal.

This leaves open the question of whether C++ floating point can have a radix that, unlike 2 and 10, does not have 2 as a prime factor. Are other radixes permitted? I don't know, and last time I tried to check that, I failed.

However, assuming that the radix must be 2 or 10, then all your examples involve values that are integers or integers divided by a power of 2, and therefore can be exactly represented.

This means that the single answer to most of your questions is “yes”. The exception is the question “is integer multiply by IEEE 754 binary float [exact]”. If the result exceeds the precision available, then it can't be exact, but otherwise it is.
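As a rough illustration of both the “yes” cases and the exception, here is a minimal sketch. The helper names `scale` and `check_question_examples` are hypothetical, and the code assumes `float` is IEEE 754 binary32 with a 24-bit significand and that `int` has at least 32 bits.

```cpp
#include <cassert>
#include <limits>

// Assumption: float is IEEE 754 binary32 (24 significant bits).
static_assert(std::numeric_limits<float>::is_iec559 &&
                  std::numeric_limits<float>::digits == 24,
              "this sketch assumes IEEE 754 single precision");

// Multiplies an int by a float. The int is converted to floating point
// first, and the result is rounded to float on return.
float scale(int n, float factor) {
    return n * factor;
}

// Usage sketch with the values from the question.
inline void check_question_examples() {
    assert(0.5f * 0.25f == 0.125f);    // operands and product all exact
    assert(0.5f - 0.125f == 0.375f);   // 0.375 = 3/8 is exact as well
    assert(scale(123, 0.5f) == 61.5f); // 61.5 = 123/2 fits easily

    // 16777217 = 2^24 + 1 needs 25 significant bits, and the true product
    // 8388608.5 does not fit in a float either, so the result is rounded.
    assert(scale(16777217, 0.5f) == 8388608.0f);
}
```

The last assertion is the “exceeds the precision available” case: nothing is fuzzy about it, the exact answer simply has no float representation.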

See the classic “What Every Computer Scientist Should Know About Floating-Point Arithmetic” for background info about floating point representation & properties in general.


If a value can be exactly represented in 32-bit or 64-bit IEEE 754, then that doesn't mean that it can be exactly represented with some other floating point representation. That's because different 32-bit representations and different 64-bit representations use different numbers of bits to hold the mantissa and have different exponent ranges. So a number that can be exactly represented in one way can be beyond the precision or range of some other representation.
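The same effect can be demonstrated with two formats that are readily available. The sketch below uses `double` and `float` merely as stand-ins for "two representations with different mantissa widths and exponent ranges"; `fits_in_float` and `check_fits_in_float` are hypothetical helpers, and IEEE 754 binary32/binary64 is assumed.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Returns true if the double value `x` survives a round trip through float,
// i.e. if it is exactly representable in the narrower format.
bool fits_in_float(double x) {
    // Out of float's range: reject before converting, since converting an
    // out-of-range value is not well defined by the C++ standard.
    if (std::fabs(x) > std::numeric_limits<float>::max()) {
        return false;
    }
    return static_cast<double>(static_cast<float>(x)) == x;
}

// Usage sketch, assuming IEEE 754 float and double.
inline void check_fits_in_float() {
    assert(fits_in_float(0.15625));      // 5/32, few significant bits
    assert(!fits_in_float(16777217.0));  // 2^24 + 1: beyond float precision
    assert(!fits_in_float(1e300));       // beyond float range
}
```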


You can use std::numeric_limits<T>::is_iec559 (where e.g. T is double) to check whether your implementation claims to be IEEE 754 compatible. However, when floating point optimizations are turned on, at least the g++ compiler (1) erroneously claims to be IEEE 754, while not treating e.g. NaN values correctly according to that standard. In effect, is_iec559 only tells you whether the number representation is IEEE 754, and not whether the semantics conform.
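A minimal sketch of such a check follows. The wrapper names `claims_iec559` and `radix_of` are made up for illustration; `std::numeric_limits<T>::is_iec559` and `std::numeric_limits<T>::radix` are the standard members. As noted above, a true result only reflects what the implementation claims.

```cpp
#include <cassert>
#include <limits>

// True if the implementation claims that T follows IEC 559 / IEEE 754.
template <typename T>
constexpr bool claims_iec559() {
    return std::numeric_limits<T>::is_iec559;
}

// The radix (base) of T's representation, e.g. 2 for binary floating point.
template <typename T>
constexpr int radix_of() {
    return std::numeric_limits<T>::radix;
}

// Usage sketch: on a typical desktop platform both assertions hold, but
// neither is guaranteed by the C++ standard.
inline void check_platform_claims() {
    assert(claims_iec559<double>());
    assert(radix_of<float>() == 2);
}
```

Because both helpers are constexpr, they could also be used in a static_assert to refuse compilation on platforms that do not make the claim.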


(1) Essentially, instead of providing different types for different semantics, gcc and g++ try to accommodate different semantics via compiler options. And with separate compilation of parts of a program, that can't conform to the C++ standard.

In principle, this should be possible, as long as you restrict yourself to exactly this class of numbers with a finite representation in powers of 2.

But it is dangerous: what if someone takes your code and changes your 0.5 to 0.4 or your .0625 to .065 for whatever reason? Then your code is broken. And no, even excessive comments won't help with that - someone will always ignore them.
