Numpy's float32 and float comparisons

Continuing from "Difference between Python float and numpy float32":

import numpy as np

a = 58682.7578125
print(type(a), a)
float_32 = np.float32(a)
print(type(float_32), float_32)
print(float_32 == a)

Prints:

<class 'float'> 58682.7578125
<class 'numpy.float32'> 58682.8
True

I fully understand that comparing floats for equality is not a good idea, but shouldn't this still be False (we're talking about a difference in the first decimal digit, not in 0.000000001)? Is it system dependent? Is this behavior documented somewhere?

EDIT: Well, it's the third decimal:

print(repr(float_32), repr(a))
# 58682.758 58682.7578125

but can I trust repr? How are those values actually stored internally?

EDIT2: people insist that printing float_32 with more precision will give me its representation. However, as I already commented, according to numpy's docs:

the % formatting operator requires its arguments to be converted to standard python types

and:

print(repr(float(float_32)))

prints

58682.7578125

An interesting insight is given by @MarkDickinson here: apparently repr should be faithful (then he says it's not faithful for np.float32).

So let me reiterate my question as follows:

  • How can I get at the exact internal representation of float_32 and a in the example? If these are the same, then the problem is solved; if not,
  • What are the exact rules for up/downcasting in a comparison between python's float and np.float32? I'd guess that it upcasts float_32 to float, although @WillemVanOnsem suggests in the comments that it's the other way round.

My python version:

Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 25 2016, 22:18:55) [MSC v.1900 64 bit (AMD64)] on win32

The numbers compare equal because 58682.7578125 can be exactly represented in both 32-bit and 64-bit floating point. Let's take a close look at the binary representation:

32 bit:  01000111011001010011101011000010
sign    :  0
exponent:  10001110
fraction:  11001010011101011000010

64 bit:  0100000011101100101001110101100001000000000000000000000000000000
sign    :  0
exponent:  10000001110
fraction:  1100101001110101100001000000000000000000000000000000

They have the same sign, the same exponent, and the same fraction: the extra bits in the 64-bit representation are filled with zeros.

No matter which way they are cast, they will compare equal. If you try a different number such as 58682.7578124, you will see that the representations differ at the binary level; the 32-bit value loses more precision and they won't compare equal.

(It's also easy to see in the binary representation that a float32 can be upcast to a float64 without any loss of information. That is what numpy is supposed to do before comparing the two.)

import numpy as np

a = 58682.7578125
f32 = np.float32(a)
f64 = np.float64(a)

u32 = np.array(a, dtype=np.float32).view(dtype=np.uint32)
u64 = np.array(a, dtype=np.float64).view(dtype=np.uint64)

b32 = bin(u32)[2:]
b32 = '0' * (32-len(b32)) + b32  # add leading 0s
print('32 bit: ', b32)
print('sign    : ', b32[0])
print('exponent: ', b32[1:9])
print('fraction: ', b32[9:])
print()

b64 = bin(u64)[2:]
b64 = '0' * (64-len(b64)) + b64  # add leading 0s
print('64 bit: ', b64)
print('sign    : ', b64[0])
print('exponent: ', b64[1:12])
print('fraction: ', b64[12:])
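The claim about 58682.7578124 can be checked with a short sketch. It uses only the standard library's struct module rather than NumPy, and the helper names to_float32 and float32_bits are made up for this illustration. Each value is explicitly rounded to float32 and then compared as a double, so the result does not depend on how a particular NumPy version promotes mixed comparisons.

```python
import struct

def to_float32(x):
    """Round a Python float to the nearest float32, returned as a Python float."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

def float32_bits(x):
    """Return the 32-bit IEEE 754 pattern of x (rounded to float32) as a bit string."""
    return format(struct.unpack('<I', struct.pack('<f', x))[0], '032b')

exact = 58682.7578125    # exactly representable in float32
inexact = 58682.7578124  # not exactly representable in float32

print(to_float32(exact) == exact)      # True: nothing was lost
print(to_float32(inexact) == inexact)  # False: float32 rounded the value
print(float32_bits(exact))             # 01000111011001010011101011000010
```

The bit string printed for the exact value matches the 32-bit pattern shown at the top of this answer.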

The same value is stored internally; print just doesn't show all of its digits.

Try:

print("%0.8f" % float_32)

See the related question "Printing numpy.float64 with full precision".

The decimal 58682.7578125 is exactly the fraction 7511393/128.

The denominator is a power of 2 (2**7), and the numerator spans 23 bits. So this decimal value can be represented exactly both in float32 (which has a 24-bit significand) and in float64.

Thus the answer of Victor T is correct: in the internal representation, it's the same value.
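The arithmetic above can be reproduced with the standard library's fractions module, which converts a float to its exact rational value. This is only a quick check of the numbers quoted here, not part of the original answer.

```python
from fractions import Fraction

a = 58682.7578125
frac = Fraction(a)  # exact rational value of the stored double

print(frac)                         # 7511393/128
print(frac.denominator == 2 ** 7)   # True: the denominator is a power of 2
print(frac.numerator.bit_length())  # 23: fits in float32's 24-bit significand
```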

The fact that an equality test returns True for the same value, even across different types, is a good thing IMO. What would you expect of (2 == 2.0)?

They're equal. They just don't print the same because they use different printing logic.

How can I get at the exact internal representation of float_32 and a in the example ?

Well, that depends on what you mean by "exact internal representation". You can get an array of bit values, if you really want one:

>>> b = numpy.float32(a)
>>> numpy.unpackbits(numpy.array([b]).view(numpy.uint8))
array([1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0,
       1, 0, 1, 0, 0, 0, 1, 1, 1], dtype=uint8)

which is as close as you'll get to the "exact internal representation", but it's not exactly the most useful thing to work with. (Also, the results will be endianness-dependent, because it really is based on the raw internal representation.)
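To make the endianness remark concrete, here is a small sketch that uses struct from the standard library (not the unpackbits approach above) to request the same float32 in both byte orders explicitly. The variable names are chosen just for this example.

```python
import struct

a = 58682.7578125

little = struct.pack('<f', a)  # least significant byte first
big = struct.pack('>f', a)     # most significant byte first

print(little)                # b'\xc2:eG'
print(big)                   # b'Ge:\xc2'
print(little == big[::-1])   # True: same four bytes in reverse order
```

The bit array shown above starts with 1, 1, 0, 0, 0, 0, 1, 0, which is the byte 0xc2, so it was produced on a little-endian machine.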

If you want a C-level float, which is how NumPy represents float32 values at C level... well, that's C. Unless you want to write your own C extension module, you can't work with C-level values directly. The closest you can get is some sort of wrapper around a C float, and hey! You already have one! You don't seem happy with it, though, so this isn't really what you want.

If you want the exact value represented in human-readable decimal, printing it with extra precision using str.format, or converting it to a regular float and then to a decimal.Decimal, would do that.

>>> b
58682.758
>>> decimal.Decimal(float(b))
Decimal('58682.7578125')

The 58682.7578125 value you picked happens to be exactly representable as a float, so the decimal representation coming out happens to be exactly the one you put in, but that won't usually be the case. The exact decimal representation you typed in is discarded and unrecoverable.
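A short sketch of the "won't usually be the case" point, using 0.1 as a value that is not exactly representable. It relies only on the standard library; to_float32 is a hypothetical helper that rounds through struct instead of numpy.float32.

```python
import struct
from decimal import Decimal

def to_float32(x):
    """Round a Python float to the nearest float32, returned as a Python float."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

# Exactly representable: the decimal that comes out is the one typed in.
print(Decimal(58682.7578125))    # 58682.7578125

# Not exactly representable: the stored value differs from the literal.
print(Decimal(0.1))              # 0.1000000000000000055511151231257827021181583404541015625
print(Decimal(to_float32(0.1)))  # 0.100000001490116119384765625
```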

What are the exact rules for up/downcasting in a comparison between python's float and np.float32 ?

The float32 gets converted to a float64, losslessly.
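That widening a float32 to a float64 loses nothing can be checked on a few bit patterns. The sketch below uses struct from the standard library and makes no claim about what NumPy does internally; the sample patterns and the helper name widen_and_narrow are chosen only for illustration.

```python
import struct

def widen_and_narrow(bits):
    """Interpret bits as a float32, widen to a Python float (a double), narrow back."""
    as_double = struct.unpack('<f', struct.pack('<I', bits))[0]
    return struct.unpack('<I', struct.pack('<f', as_double))[0]

samples = [
    0x47653AC2,  # 58682.7578125
    0x3DCCCCCD,  # float32 nearest to 0.1
    0x00000001,  # smallest positive subnormal
    0x7F7FFFFF,  # largest finite float32
]

for bits in samples:
    # Every finite float32 survives the round trip through a double unchanged.
    print(hex(bits), widen_and_narrow(bits) == bits)
```

NaN patterns are left out on purpose, since their payload bits are not guaranteed to survive such conversions.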

58682.8

My machine shows 58682.758 for this line.

I fully understand that comparing floats for equality is not a good idea

It is "not a good idea" if they were calculated independently. On the other hand, it is a good idea if you take the same number and check its conversion.

Is it system dependent ? Is this behavior somewhere documented ?

It depends entirely on the conversion to text. According to the comments, float32 is essential here. If so, float32 carries only about 7 significant decimal digits, unlike Python's built-in float, which is a float64 (at least on x86). That is why the value is shortened when printed. The recommended way to print a float value in decimal is to stop at the shortest output that converts back to the same internal value. So 58682.7578125 is shortened to 58682.758: the difference is less than one ULP.

The same value stored as a built-in float or a numpy float64 is printed with more significant digits, because omitting them would produce a different internal value:

>>> 58682.758 == 58682.7578125
False
>>> numpy.float32(58682.758) == numpy.float32(58682.7578125)
True
>>> print(repr(numpy.float32(58682.758).tobytes()))
b'\xc2:eG'
>>> print(repr(numpy.float32(58682.7578125).tobytes()))
b'\xc2:eG'
>>> numpy.float64(58682.758) == numpy.float64(58682.7578125)
False
>>> print(numpy.float64(58682.758).hex(), numpy.float64(58682.7578125).hex())
0x1.ca7584189374cp+15 0x1.ca75840000000p+15

You are lucky that these two values are equal in float32 for this particular value (was this intentional?), but it might be different with another one.
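The "less than one ULP" remark can be worked out numerically. The sketch below finds the next float32 above 58682.7578125 by incrementing its bit pattern, again with struct rather than NumPy; the helper names are hypothetical.

```python
import struct

def bits_from_f32(x):
    """Bit pattern of x after rounding to float32."""
    return struct.unpack('<I', struct.pack('<f', x))[0]

def f32_from_bits(bits):
    """float32 with the given bit pattern, returned as a Python float."""
    return struct.unpack('<f', struct.pack('<I', bits))[0]

value = 58682.7578125
bits = bits_from_f32(value)

# For a positive finite float32, the next pattern is the next larger value.
ulp = f32_from_bits(bits + 1) - value
print(ulp)  # 0.00390625, i.e. 2**-8

# 58682.758 lies within half an ULP of the stored value...
print(abs(58682.758 - value) < ulp / 2)  # True
# ...so it rounds to the very same float32.
print(bits_from_f32(58682.758) == bits)  # True
```

Three decimals are therefore enough to identify this float32 uniquely, which is why the shorter form 58682.758 is printed.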
