Python/numpy floating-point text precision

Let's say I have some 32-bit and 64-bit floating point values:

>>> import numpy as np
>>> v32 = np.array([5, 0.1, 2.4, 4.555555555555555, 12345678.92345678635], 
                   dtype=np.float32)
>>> v64 = np.array([5, 0.1, 2.4, 4.555555555555555, 12345678.92345678635], 
                   dtype=np.float64)

I want to serialize these values to text without losing precision (or at least coming really close to not losing precision). I think the canonical way of doing this is with repr:

>>> map(repr, v32)
['5.0', '0.1', '2.4000001', '4.5555553', '12345679.0']
>>> map(repr, v64)
['5.0', '0.10000000000000001', '2.3999999999999999', '4.5555555555555554', 
 '12345678.923456786']

But I want to make the representation as compact as possible to minimize file size, so it would be nice if values like 2.4 got serialized without the extra decimals. Yes, I know that's their actual floating point representation, but %g seems to be able to take care of this:

>>> ('%.7g ' * len(v32)) % tuple(v32)
'5 0.1 2.4 4.555555 1.234568e+07 '
>>> ('%.16g ' * len(v64)) % tuple(v64)
'5 0.1 2.4 4.555555555555555 12345678.92345679 '

My question is: is it safe to use %g in this way? Are .7 and .16 the correct values so that precision won't be lost?

Python 2.7 and later already have a smart repr implementation for floats that prints 0.1 as 0.1. The brief output is chosen in preference to other candidates such as 0.10000000000000001 because it is the shortest representation of that particular number that round-trips to the exact same floating-point value when read back into Python. To use this algorithm, convert your 64-bit floats to actual Python floats before handing them off to repr:

>>> map(repr, map(float, v64))
['5.0', '0.1', '2.4', '4.555555555555555', '12345678.923456786']

Surprisingly, the result is natural-looking and numerically correct. More info on the 2.7/3.1 repr can be found in What's New and a fascinating lecture by Mark Dickinson.
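On Python 3, map returns an iterator, so the session above would need a list comprehension to display the same result. The following is a minimal sketch that uses plain Python floats in place of the NumPy array (an assumption made only to keep the example dependency-free):

```python
# The question's values as plain Python floats (IEEE-754 doubles).
values = [5.0, 0.1, 2.4, 4.555555555555555, 12345678.92345678635]

# repr() yields the shortest string that reads back as the same double.
texts = [repr(float(v)) for v in values]
print(texts)
# ['5.0', '0.1', '2.4', '4.555555555555555', '12345678.923456786']

# Parsing the text recovers every value exactly.
restored = [float(t) for t in texts]
print(restored == values)  # True
```

With a NumPy array, the same idea should apply by passing each element through float() first, as in the session above.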

Unfortunately, this trick won't work for 32-bit floats, at least not without reimplementing the algorithm used by Python 2.7's repr.
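Short of reimplementing that algorithm, one possible workaround is a brute-force search: try increasing %g precisions and keep the first string that reads back as the same 32-bit value. The sketch below is not the algorithm repr uses. The helper names to_f32 and short_f32_text are hypothetical, float32 rounding is emulated with the standard struct module rather than NumPy, and the result is not guaranteed to be the absolute shortest string in every edge case:

```python
import struct

def to_f32(x):
    """Round a Python float to the nearest IEEE-754 single precision value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

def short_f32_text(x):
    """Return a short decimal string that reads back as the same float32.

    Hypothetical helper: tries 1 to 9 significant digits and keeps the first
    string that round-trips. Assumes x is finite and within float32 range.
    """
    v = to_f32(x)
    for digits in range(1, 10):
        text = '%.*g' % (digits, v)
        try:
            if to_f32(float(text)) == v:
                return text
        except OverflowError:
            # Rounding to few digits pushed the text past the float32 maximum.
            continue
    return '%.9g' % v  # fallback; 9 digits always suffice for finite values

values = [5, 0.1, 2.4, 4.555555555555555, 12345678.92345678635]
print([short_f32_text(v) for v in values])
# ['5', '0.1', '2.4', '4.5555553', '12345679']
```

This costs up to nine format/parse cycles per value, so it trades speed for compactness.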

To uniquely determine a single-precision (32-bit) floating point number in IEEE-754 format, up to 9 significant decimal digits (that is, not counting leading zeros) can be necessary, and 9 digits are always sufficient.

For double-precision (64-bit) floating point numbers, 17 (significant) decimal digits may be necessary and are always sufficient.

I'm not quite sure how the %g format is specified, but by the looks of it the representation can begin with a 0 (as in 0.1), so the safe values for the precision would be .9 and .17.
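For what it's worth, the precision in %g counts significant digits, so leading zeros are not counted against it. The difference between the precisions can be checked directly with a round-trip test. The sketch below uses two of the question's values. The helpers to_f32, roundtrips_f32 and roundtrips_f64 are hypothetical names, and float32 is emulated with the standard struct module (assuming finite inputs within float32 range):

```python
import struct

def to_f32(x):
    """Round a Python float to the nearest IEEE-754 single precision value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

def roundtrips_f64(x, digits):
    """True if formatting the double x with %.<digits>g reads back exactly."""
    return float('%.*g' % (digits, x)) == x

def roundtrips_f32(x, digits):
    """Same check for the float32 value nearest to x."""
    v = to_f32(x)
    return to_f32(float('%.*g' % (digits, v))) == v

x32 = 4.555555555555555
print(roundtrips_f32(x32, 7))   # False: '4.555555' is a different float32
print(roundtrips_f32(x32, 9))   # True

x64 = 12345678.92345678635
print('%.16g' % x64)            # 12345678.92345679
print(roundtrips_f64(x64, 16))  # False: the last digit was rounded away
print(roundtrips_f64(x64, 17))  # True
```

So .7 and .16 already lose information for values in the question itself, while .9 and .17 do not.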

If you want to minimise the file size, writing the byte representations would produce a much smaller file, so if you can do that, that's the way to go.
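As a rough illustration of the size difference, the sketch below packs the question's values with the standard array module and compares the byte count with a %.17g text encoding. The use of array instead of NumPy is an assumption made to keep the example self-contained, and the bytes are written in the machine's native byte order:

```python
from array import array

values = [5.0, 0.1, 2.4, 4.555555555555555, 12345678.92345678635]

# Raw binary: one fixed-size item per value, no separators.
raw32 = array('f', values).tobytes()   # single precision (lossy for these inputs)
raw64 = array('d', values).tobytes()   # double precision (exact)

# Text that is guaranteed to round-trip for doubles.
text64 = ' '.join('%.17g' % v for v in values)

print(len(raw32), len(raw64), len(text64))

# Reading the bytes back recovers the doubles exactly.
restored = array('d')
restored.frombytes(raw64)
print(list(restored) == values)  # True
```

With NumPy arrays, the ndarray.tofile method or np.save serve the same purpose.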

The C code that implements the fancy repr in 2.7 is mostly in Python/dtoa.c (with wrappers in Python/pystrtod.c and Objects/floatobject.c). In particular, look at _Py_dg_dtoa. It should be possible to borrow this code and modify it to work with float instead of double. Then you could wrap this up in an extension module, or just build it as a shared library (.so) and call it through ctypes.

Also, note that the source says the implementation is "Inspired by 'How to Print Floating-Point Numbers Accurately' by Guy L. Steele, Jr. and Jon L. White [Proc. ACM SIGPLAN '90, pp. 112-126]." So, you might be able to implement something less flexible and simpler yourself by reading that paper (and whichever of the modifications documented in the dtoa.c comments seem appropriate).

Finally, the code is a minor change to code posted by David Gay at AT&T, and used in a number of other libraries (NSPR, etc.), one of which might have a more accessible version.

But before doing any of that, make sure there really is a performance issue by trying a Python function and measuring whether it's too slow.

And if this really is a performance-critical area, you probably don't want to loop over the list and call repr (or your own fancy C function) on each element in the first place; you probably want a function that converts a numpy array of floats or doubles to a string all at once. (Ideally you'd want to build that into numpy, of course.)
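Measuring the plain-Python approach first could look like the sketch below. The function name serialize, the sample size, and the repeat count are all arbitrary choices for illustration, and random doubles stand in for real data:

```python
import random
import timeit

# Stand-in data: 10,000 random doubles.
values = [random.random() for _ in range(10000)]

def serialize(values):
    """Format every value with 17 significant digits, space separated."""
    return ' '.join('%.17g' % v for v in values)

# Time 20 full serializations and report the average per call.
elapsed = timeit.timeit(lambda: serialize(values), number=20)
print('%.3f ms per call' % (elapsed / 20 * 1000))
```

If the number printed is acceptable for your file sizes, none of the C-level work is needed.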

One last thought: you're looking for "at least really close to not losing precision". It's conceivable that just converting to double and using the repr is close enough for your purposes, and it's obviously much easier than anything else, so you should at least test it before ruling it out.
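One thing worth checking in such a test is the length of the output. Converting a 32-bit value to double and taking its repr is exact, but the string describes the widened double, so it tends to be longer than a %.9g rendering. A small sketch, with float32 emulated through the standard struct module and to_f32 as a hypothetical helper name:

```python
import struct

def to_f32(x):
    """Round a Python float to the nearest IEEE-754 single precision value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

v = to_f32(2.4)             # the float32 nearest to 2.4, held as a double

via_double = repr(v)        # shortest text for the widened double
nine_digits = '%.9g' % v    # 9 significant digits

print(via_double, nine_digits)

# Both strings read back as the same float32.
print(to_f32(float(via_double)) == v, to_f32(float(nine_digits)) == v)  # True True
```

Both are lossless here; the difference is only in how many characters end up in the file.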

Needless to say, you should also test whether %.9g and %.17g are close enough for your purposes, since that's the next easiest thing that could possibly work.
