How to find an original text representation for lower precision float values in Python?

I've run into an issue displaying float values in Python that were loaded from an external data source (they're 32-bit floats, but this would apply to lower-precision floats too).

(In case it's important: these values were typed in by humans in C/C++, so unlike arbitrary calculated values, deviations from round numbers are likely not intended, though they can't be ignored since the values may be constants such as M_PI or multiplied by constants.)

Since CPython uses higher precision (typically 64-bit), a value entered as a lower-precision float may show the precision loss of the 32-bit float in its repr(), where the same value entered as a 64-bit float would show a round value.

For example:

# Examples of 32-bit floats displayed as 64-bit floats in CPython.
0.0005 -> 0.0005000000237487257
0.025  -> 0.02500000037252903
0.04   -> 0.03999999910593033
0.05   -> 0.05000000074505806
0.3    -> 0.30000001192092896
0.98   -> 0.9800000190734863
1.2    -> 1.2000000476837158
4096.3 -> 4096.2998046875

Simply rounding the values to some arbitrary precision works in most cases, but may be incorrect since it could lose significant values, e.g. 0.00000001.

An example of this can be shown by printing a float converted to a 32-bit float.

def as_float_32(f):
    from struct import pack, unpack
    return unpack("f", pack("f", f))[0]

print(0.025)               #  --> 0.025
print(as_float_32(0.025))  #  --> 0.02500000037252903
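To make the rounding pitfall mentioned above concrete, here is a minimal sketch. It reuses the struct-based conversion from the snippet above; the choice of 6 decimal places is only an illustrative assumption, not a recommended setting.

```python
import struct


def as_float_32(f):
    # Round a Python float (64-bit) to the nearest 32-bit float and
    # return it widened back to a Python float.
    return struct.unpack("f", struct.pack("f", f))[0]


round_value = as_float_32(0.025)
tiny_value = as_float_32(0.00000001)

# Rounding to a fixed number of decimal places hides the float32 noise...
print(round(round_value, 6))
# ...but it also wipes out a value that was deliberately tiny.
print(round(tiny_value, 6))
```

The first call gives back 0.025, while the second collapses to 0.0, which is exactly the kind of loss the question wants to avoid.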

So my question is:

What's the most efficient and straightforward way to get the original representation for a 32-bit float, without making assumptions or losing precision?

Put differently: I have a data source containing 32-bit floats. These were originally entered by a human as round values (examples above), but representing them as higher-precision values exposes that the value as a 32-bit float is an approximation of the original value.

I would like to reverse this process and get the round number back from the 32-bit float data, but without losing the precision which a 32-bit float gives us (which is why simply rounding isn't a good option).


Examples of why you might want to do this:

  • Generating API documentation where Python extracts values from a C-API that uses single-precision floats internally.
  • When people need to read or review generated data which happens to be provided as single-precision floats.

In both cases it's important not to lose significant precision, or show values which can't be easily read by humans at a glance.


  • Update: I've made a solution which I'll include as an answer (for reference and to show it's possible), but I highly doubt it's an efficient or elegant solution.

  • Of course you can't know which notation was used when the value was entered (0.1f, 0.1F or 1e-1f); that's not the purpose of this question.

You're looking to solve essentially the same problem that Python's repr solves, namely, finding the shortest decimal string that rounds to a given float. Except that in your case, the float isn't an IEEE 754 binary64 ("double precision") float, but an IEEE 754 binary32 ("single precision") float.

Just for the record, I should of course point out that retrieving the original string representation is impossible, since for example the strings '0.10', '0.1', '1e-1' and '10e-2' all get converted to the same float (or in this case float32). But under suitable conditions we can still hope to produce a string that has the same decimal value as the original string, and that's what I'll do below.

The approach you outline in your answer more-or-less works, but it can be streamlined a bit.

First, some bounds: when it comes to decimal representations of single-precision floats, there are two magic numbers: 6 and 9. The significance of 6 is that any (not-too-large, not-too-small) decimal numeric string with 6 or fewer significant decimal digits will round-trip correctly through a single-precision IEEE 754 float: that is, converting that string to the nearest float32, and then converting that value back to the nearest 6-digit decimal string, will produce a string with the same value as the original. For example:

>>> import numpy as np
>>> x = "634278e13"
>>> y = float(np.float32(x))
>>> y
6.342780214942106e+18
>>> "{:.6g}".format(y)
'6.34278e+18'

(Here, by "not-too-large, not-too-small" I just mean that the underflow and overflow ranges of float32 should be avoided. The property above applies for all normal values.)

This means that for your problem, if the original string had 6 or fewer significant digits, we can recover it by simply formatting the value to 6 significant digits. So if you only care about recovering strings that had 6 or fewer significant decimal digits in the first place, you can stop reading here: a simple '{:.6g}'.format(x) is enough. If you want to solve the problem more generally, read on.
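As a rough illustration of the 6-digit property, here is a small sketch that pushes a few short decimal strings through a float32 and formats them back with '{:.6g}'. It swaps np.float32 for the standard-library struct module (an assumption made only to keep the sketch dependency-free), and the helper names to_float32 and recover_six_digits are made up for this example. Note that it parses the string as a 64-bit float first and then narrows it, which is a second rounding step that could in principle differ from a direct string-to-float32 conversion in rare cases.

```python
import struct


def to_float32(value):
    # Narrow a Python float to float32 and widen it back again.
    return struct.unpack("f", struct.pack("f", value))[0]


def recover_six_digits(text):
    # Store the decimal string as a float32, then format to 6 significant digits.
    stored = to_float32(float(text))
    return "{:.6g}".format(stored)


for text in ("0.025", "0.98", "4096.3", "634278e13"):
    recovered = recover_six_digits(text)
    # The recovered string may use a different notation, but the value matches.
    print(text, "->", recovered, float(recovered) == float(text))
```

The last input comes back as '6.34278e+18' rather than '634278e13', which is the point made earlier: the value is recoverable, the original notation is not.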

For roundtripping in the other direction, we have the opposite property: given any single-precision float x, converting that float to a 9-digit decimal string (rounding to nearest, as always), and then converting that string back to a single-precision float, will always exactly recover the value of that float.

>>> x = np.float32(3.14159265358979)
>>> x
3.1415927
>>> np.float32('{:.9g}'.format(x)) == x
True

The relevance to your problem is that there's always at least one 9-digit string that rounds to x, so we never have to look beyond 9 digits.
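The 9-digit property can be spot-checked with a sketch like the one below. It is a random sample, not a proof, and it again uses struct in place of np.float32 as an assumption for portability; the helper names and the sample size are arbitrary.

```python
import random
import struct


def to_float32(value):
    # Narrow a Python float to float32 and widen it back again.
    return struct.unpack("f", struct.pack("f", value))[0]


def float32_from_bits(bits):
    # Reinterpret a 32-bit integer pattern as a float32 value.
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def nine_digits_round_trips(x):
    # Format to 9 significant digits, parse, narrow, and compare with x.
    return to_float32(float("{:.9g}".format(x))) == x


rng = random.Random(0)
samples = []
while len(samples) < 10000:
    bits = rng.getrandbits(31)           # sign bit left clear: positive values
    exponent = (bits >> 23) & 0xFF
    if 0 < exponent < 255:               # keep normal values only
        samples.append(float32_from_bits(bits))

print(all(nine_digits_round_trips(x) for x in samples))
```

Dropping the format to '{:.8g}' in the same harness should turn up failures fairly quickly, which is why the search below has to go all the way to 9.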

Now we can follow the same approach that you used in your answer: first try for a 6-digit string, then a 7-digit, then an 8-digit. If none of those work, the 9-digit string surely will, by the above. Here's some code.

import numpy as np


def original_string(x):
    for places in range(6, 10):  # try 6, 7, 8, 9
        s = '{:.{}g}'.format(x, places)
        y = np.float32(s)
        if x == y:
            return s
    # If x was genuinely a float32, we should never get here.
    raise RuntimeError("We should never get here")

Example outputs:

>>> original_string(0.02500000037252903)
'0.025'
>>> original_string(0.03999999910593033)
'0.04'
>>> original_string(0.05000000074505806)
'0.05'
>>> original_string(0.30000001192092896)
'0.3'
>>> original_string(0.9800000190734863)
'0.98'
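If NumPy isn't available, the same search can be sketched with the struct-based conversion from the question. This is a variant under assumptions, not the code above: to_float32 and shortest_float32_string are names invented here, the string is parsed as a 64-bit float before narrowing (two rounding steps instead of one), and the three power-of-2 cases discussed further down are not handled.

```python
import struct


def to_float32(value):
    # Narrow a Python float to float32 and widen it back again.
    return struct.unpack("f", struct.pack("f", value))[0]


def shortest_float32_string(x):
    # Try 6, 7, 8 and 9 significant digits; return the first string
    # that maps back to exactly the same float32 value.
    for places in range(6, 10):
        s = "{:.{}g}".format(x, places)
        if to_float32(float(s)) == x:
            return s
    raise ValueError("x is not exactly representable as a float32")


# The remaining values from the question's table.
for text in ("0.0005", "1.2", "4096.3"):
    stored = to_float32(float(text))
    print(repr(stored), "->", shortest_float32_string(stored))
```

Passing a value that was never a float32, such as the plain Python float 0.1, raises ValueError instead of returning a misleading string.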

However, the above comes with several caveats.

  • First, for the key properties we're using to be true, we have to assume that np.float32 always does correct rounding. That may or may not be the case, depending on the operating system. (Even in cases where the relevant operating system calls claim to be correctly rounded, there may still be corner cases where that claim fails to be true.) In practice, it's likely that np.float32 is close enough to correctly rounded not to cause issues, but for complete confidence you'd want to know that it was correctly rounded.

  • Second, the above won't work for values in the subnormal range (so for float32, anything smaller than 2**-126). In the subnormal range, it's no longer true that a 6-digit decimal numeric string will roundtrip correctly through a single-precision float. If you care about subnormals, you'd need to do something more sophisticated there.

  • Third, there's a really subtle (and interesting!) error in the above that almost doesn't matter at all. The string formatting we're using always rounds x to the places-digit decimal string nearest to the true value of x. However, what we want to know is simply whether there's any places-digit decimal string that rounds back to x. We're implicitly assuming the (seemingly obvious) fact that if there's any places-digit decimal string that rounds to x, then the closest places-digit decimal string rounds to x. And that's almost true: it follows from the property that the interval of all real numbers that round to x is symmetric around x. But that symmetry property fails in one particular case, namely when x is a power of 2.

So when x is an exact power of 2, it's possible (but fairly unlikely) that (for example) the closest 8-digit decimal string to x doesn't round to x, but nevertheless there is an 8-digit decimal string that does round to x. You can do an exhaustive search for cases where this happens within the range of a float32, and it turns out that there are exactly three values of x for which this occurs, namely x = 2**-96, x = 2**87 and x = 2**90. For 7 digits, there are no such values. (And for 6 and 9 digits, this can never happen.) Let's take a closer look at the case x = 2**87:

>>> x = 2.0**87
>>> x
1.5474250491067253e+26

Let's take the closest 8-digit decimal value to x:

>>> s = '{:.8g}'.format(x)
>>> s
'1.547425e+26'

It turns out that this value doesn't round back to x:

>>> np.float32(s) == x
False

But the next 8-digit decimal string up from it does:

>>> np.float32('1.5474251e+26') == x
True

Similarly, here's the case x = 2**-96:

>>> x = 2**-96.
>>> x
1.262177448353619e-29
>>> s = '{:.8g}'.format(x)
>>> s
'1.2621774e-29'
>>> np.float32(s) == x
False
>>> np.float32('1.2621775e-29') == x
True
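The asymmetry behind these cases can be seen directly by looking at the float32 neighbours of a power of 2. The sketch below does that for the 2**87 case above; it uses struct to step through adjacent bit patterns, and the helper names are made up for the illustration.

```python
import struct


def float32_bits(x):
    # Bit pattern of x when stored as a float32.
    return struct.unpack("<I", struct.pack("<f", x))[0]


def float32_from_bits(bits):
    # Reinterpret a 32-bit integer pattern as a float32 value.
    return struct.unpack("<f", struct.pack("<I", bits))[0]


x = 2.0 ** 87
bits = float32_bits(x)
below = float32_from_bits(bits - 1)   # largest float32 smaller than x
above = float32_from_bits(bits + 1)   # smallest float32 larger than x

gap_below = x - below
gap_above = above - x
# At a power of 2 the spacing doubles, so the gap above is twice the gap below.
print(gap_above / gap_below)

# Everything between these two midpoints rounds to x.
lower_midpoint = x - gap_below / 2
upper_midpoint = x + gap_above / 2
print(float("1.547425e+26") < lower_midpoint)            # nearest 8-digit string falls outside
print(x < float("1.5474251e+26") < upper_midpoint)       # next one up falls inside
```

So the nearest 8-digit decimal sits just below the narrow lower half of the interval, while its neighbour lands comfortably inside the wider upper half.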

So ignoring subnormals and overflows, out of all 2 billion or so positive normal single-precision values, there are precisely three values x for which the above code doesn't work. (Note: I originally thought there was just one; thanks to @RickRegan for pointing out the error in comments.) So here's our (slightly tongue-in-cheek) fixed code:

def original_string(x):
    """
    Given a single-precision positive normal value x,
    return the shortest decimal numeric string which produces x.
    """
    # Deal with the three awkward cases.
    if x == 2**-96.:
        return '1.2621775e-29'
    elif x == 2**87:
        return '1.5474251e+26'
    elif x == 2**90:
        return '1.2379401e+27'

    for places in range(6, 10):  # try 6, 7, 8, 9
        s = '{:.{}g}'.format(x, places)
        y = np.float32(s)
        if x == y:
            return s
    # If x was genuinely a float32, we should never get here.
    raise RuntimeError("We should never get here")
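The docstring above restricts x to positive normal single-precision values. A caller who wants to check that precondition before calling the function could use something like the sketch below. The helper name and the constants are my own; negative values would need the sign handled separately, and subnormals are simply rejected rather than handled.

```python
import math
import struct

FLOAT32_MIN_NORMAL = 2.0 ** -126                    # smallest positive normal float32
FLOAT32_MAX = (2.0 - 2.0 ** -23) * 2.0 ** 127       # largest finite float32


def is_positive_normal_float32(x):
    # Reject NaN, infinities, zero, negatives, subnormals and out-of-range values.
    if not math.isfinite(x) or x < FLOAT32_MIN_NORMAL or x > FLOAT32_MAX:
        return False
    # Accept only values that survive a trip through float32 unchanged.
    return struct.unpack("f", struct.pack("f", x))[0] == x


print(is_positive_normal_float32(0.02500000037252903))  # a float32 value
print(is_positive_normal_float32(0.025))                # a 64-bit value, not a float32
print(is_positive_normal_float32(2.0 ** -130))          # subnormal as a float32
```

Only the first of the three calls returns True.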

I think Decimal.quantize() (to round to a given number of decimal digits) and .normalize() (to strip trailing 0's) is what you need.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from decimal import Decimal

data = (
    0.02500000037252903,
    0.03999999910593033,
    0.05000000074505806,
    0.30000001192092896,
    0.9800000190734863,
    )

for f in data:
    dec = Decimal(f).quantize(Decimal('1.0000000')).normalize()
    print("Original %s -> %s" % (f, dec))

Result:

Original 0.0250000003725 -> 0.025
Original 0.0399999991059 -> 0.04
Original 0.0500000007451 -> 0.05
Original 0.300000011921 -> 0.3
Original 0.980000019073 -> 0.98
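One thing to keep in mind is that quantize() fixes the number of decimal places, so large or tiny values behave differently from the samples above. As a possible variation (my assumption, not part of the answer above), the decimal module can also round to a number of significant digits through a Context; the helper name and the choice of 6 digits are illustrative only.

```python
from decimal import Context, Decimal

SIX_DIGITS = Context(prec=6)   # 6 significant digits, per the float32 round-trip bound


def round_significant(f):
    # Convert the float exactly, round to 6 significant digits,
    # then strip trailing zeros.
    return SIX_DIGITS.create_decimal_from_float(f).normalize()


for f in (0.02500000037252903, 4096.2998046875, 9.99999993922529e-09):
    fixed_places = Decimal(f).quantize(Decimal("1.0000000")).normalize()
    print("%r: fixed places -> %s, significant digits -> %s"
          % (f, fixed_places, round_significant(f)))
```

With fixed decimal places, 4096.2998046875 keeps its noise digits and the tiny value collapses to 0, whereas rounding to significant digits gives 4096.3 and 1E-8. Values that genuinely need 7 to 9 digits would still be truncated by a fixed prec of 6.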

Here's a solution I've come up with which works (perfectly, as far as I can tell) but isn't efficient.

It works by rounding at increasing decimal places, and returning the string when the rounded and non-rounded inputs match (when compared as values converted to lower precision).

Code:

def round_float_32(f):
    from struct import pack, unpack
    return unpack("f", pack("f", f))[0]


def as_float_low_precision_repr(f, round_fn):
    f_round = round_fn(f)
    f_str = repr(f)
    f_str_frac = f_str.partition(".")[2]
    if not f_str_frac:
        return f_str
    for i in range(1, len(f_str_frac)):
        f_test = round(f, i)
        f_test_round = round_fn(f_test)
        if f_test_round == f_round:
            return "%.*f" % (i, f_test)
    return f_str

# ----

data = (
    0.02500000037252903,
    0.03999999910593033,
    0.05000000074505806,
    0.30000001192092896,
    0.9800000190734863,
    1.2000000476837158,
    4096.2998046875,
    )

for f in data:
    f_as_float_32 = as_float_low_precision_repr(f, round_float_32)
    print("%s -> %s" % (f, f_as_float_32))

Outputs:

0.02500000037252903 -> 0.025
0.03999999910593033 -> 0.04
0.05000000074505806 -> 0.05
0.30000001192092896 -> 0.3
0.9800000190734863 -> 0.98
1.2000000476837158 -> 1.2
4096.2998046875 -> 4096.3

If you have at least NumPy 1.14.0, you can just use repr(numpy.float32(your_value)). Quoting the release notes:

Float printing now uses “dragon4” algorithm for shortest decimal representation

The str and repr of floating-point values (16, 32, 64 and 128 bit) are now printed to give the shortest decimal representation which uniquely identifies the value from others of the same type. Previously this was only true for float64 values. The remaining float types will now often be shorter than in numpy 1.13.

Here's a demo running against a few of your example values:

>>> repr(numpy.float32(0.0005000000237487257))
'0.0005'
>>> repr(numpy.float32(0.02500000037252903))
'0.025'
>>> repr(numpy.float32(0.03999999910593033))
'0.04'

At least in Python 3 you can use .as_integer_ratio. That's not exactly a string, but the floating-point definition as such is not really well suited for giving an exact representation in "finite" strings.

a = 0.1
a.as_integer_ratio()
(3602879701896397, 36028797018963968)

So by saving these two numbers you'll never lose precision, because together they exactly represent the saved floating-point number. (Just divide the first by the second to get the value.)
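A small sketch of that claim, using the standard-library fractions module to hold the two integers (the variable names are just for illustration):

```python
from fractions import Fraction

x = 0.02500000037252903   # a float32 value widened to a Python float
numerator, denominator = x.as_integer_ratio()
exact = Fraction(numerator, denominator)

# The pair of integers reproduces the stored float exactly.
print(exact == Fraction(x))
print(numerator / denominator == x)
# The denominator is always a power of 2, since floats are binary fractions.
print(denominator & (denominator - 1) == 0)
# The ratio is exact for the stored value, not for the decimal that was typed in.
print(exact == Fraction("0.025"))
```

The last comparison is False: the ratio preserves the float faithfully, but it says nothing about which round decimal the value was meant to be.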


As an example using numpy dtypes (very similar to C dtypes):

# A value in python floating point precision
a = 0.1
# The value as ratio of integers
b = a.as_integer_ratio()

import numpy as np
# Force the result to have some precision:
res = np.array([0], dtype=np.float16)
np.true_divide(b[0], b[1], res)
print(res)
# Compare that to the wanted result when inputting 0.1
np.true_divide(1, 10, res)
print(res)

# Other precisions:
res = np.array([0], dtype=np.float32)
np.true_divide(b[0], b[1], res)
print(res)
res = np.array([0], dtype=np.float64)
np.true_divide(b[0], b[1], res)
print(res)

The result of all these calculations is:

[ 0.09997559] # Float16 with integer-ratio
[ 0.09997559] # Float16 reference
[ 0.1] # Float32
[ 0.1] # Float64

Probably what you are looking for is decimal:

Decimal “is based on a floating-point model which was designed with people in mind, and necessarily has a paramount guiding principle – computers must provide an arithmetic that works in the same way as the arithmetic that people learn at school.”
