
Is it possible to force exponent or significand of a float to match another float (Python)?

This is an interesting question that I was trying to work through the other day. Is it possible to force the significand or exponent of one float to be the same as another float in Python?

The question arises because I was trying to rescale some data so that its min and max match another data set. However, my rescaled data was slightly off (after about 6 decimal places), and that was enough to cause problems down the line.

To give an idea, I have f1 and f2 (type(f1) == type(f2) == numpy.ndarray). I want np.max(f1) == np.max(f2) and np.min(f1) == np.min(f2). To achieve this, I do:

import numpy as np

f2 = (f2-np.min(f2))/(np.max(f2)-np.min(f2)) # f2 is now between 0.0 and 1.0
f2 = f2*(np.max(f1)-np.min(f1)) + np.min(f1)  # f2 is now between min(f1) and max(f1)

The result (just as an example) would be:

np.max(f1) # 5.0230593
np.max(f2) # 5.0230602 but I need 5.0230593 

My initial thought was that forcing the exponent of the float would be the correct solution. I couldn't find much on it, so I made a workaround for my needs:

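import numpy as np

exp = 0
mm = np.max(f1)

# find the first power of 10 that brings |mm| up to at least 1
# (note: loops forever if mm == 0, and leaves exp at 0 when |mm| >= 1)
while int(10**exp*mm) == 0:
    exp += 1

# add 4 digits of precision
exp += 4

scale = 10**exp

f2 = np.round(f2*scale)/scale
f1 = np.round(f1*scale)/scale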

Now np.max(f2) == np.max(f1).

However, is there a better way? Did I do something wrong? Is it possible to reshape a float to be similar to another float (via the exponent or other means)?

EDIT: As was suggested, I am now using:

scale = 10**(-np.floor(np.log10(np.max(f1))) + 4)

While my solution above works (for my application), I'm interested to know whether there is a way to force a float to have the same exponent and/or significand as another, so that the numbers become identical.

It depends what you mean by "mantissa."

Internally, floats are stored using scientific notation in base 2. So if you mean the base-2 mantissa, it is actually very easy: just multiply or divide by powers of two (not powers of 10), and the mantissa will stay the same (provided the exponent doesn't go out of range; if it does, you'll get clamped to infinity or zero, or possibly go into denormal numbers, depending on architectural details). It's important to understand that the decimal expansions will not match up when you rescale by powers of two. It's the binary expansion that's preserved with this method.
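Here is a minimal sketch of the power-of-two case. It uses the standard-library math.frexp, which splits a float x into m and e with x == m * 2**e and 0.5 <= |m| < 1. The helper name mantissa_preserved is hypothetical, and the sketch assumes the scaled value neither overflows nor underflows.

```python
import math

def mantissa_preserved(x: float, k: int) -> bool:
    """Check whether scaling x by 2**k leaves its base-2 mantissa unchanged.

    Assumes x * 2**k stays within the normal float range (no overflow,
    no underflow into denormals).
    """
    m, e = math.frexp(x)                               # x == m * 2**e
    m_scaled, e_scaled = math.frexp(math.ldexp(x, k))  # ldexp(x, k) == x * 2**k
    return m_scaled == m and e_scaled == e + k

x = 5.0230593
print(mantissa_preserved(x, 5))    # True: only the exponent moves
print(mantissa_preserved(x, -3))   # True
# Scaling by 0.1 (not a power of two) changes the mantissa:
print(math.frexp(x * 0.1)[0] == math.frexp(x)[0])   # False
```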

But if you mean the base-10 mantissa, then no, that's not possible with floats, because the rescaled value may not be exactly representable. For example, 1.1 cannot be represented exactly in base 2 (with a finite number of digits), in much the same way that 1/3 cannot be represented in base 10 (with a finite number of digits). So scaling 11 by 1/10 cannot be done perfectly accurately:

>>> print("%1.29f" % (11 * 0.1))
1.10000000000000008881784197001
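To see the exact value that 11 * 0.1 actually stores, you can use the standard-library fractions and decimal modules. Both convert a float without any further rounding. The following is a short check, offered only as an illustration:

```python
from decimal import Decimal
from fractions import Fraction

x = 11 * 0.1
# Decimal(float) and Fraction(float) both capture the float's exact binary value.
print(Decimal(x))
print(Fraction(x) == Fraction(11, 10))   # False: the stored value is not 11/10
# The denominator is a power of two, as it must be for any binary float.
den = Fraction(x).denominator
print(den & (den - 1) == 0)              # True
```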

You can, however, do the latter with decimals (Python's decimal module). Decimals work in base 10 and behave as expected under base-10 rescaling. They also provide a fair amount of specialized functionality for detecting and handling various kinds of precision loss. But decimals don't benefit from NumPy speedups, so if you have a very large volume of data to work with, they may not be efficient enough for your use case. Since NumPy depends on hardware support for floating point, and most (all?) modern architectures provide no hardware support for base 10, this is not easily remedied.
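Below is a minimal sketch of base-10 rescaling with the standard-library decimal module. It assumes the inputs are given as decimal strings and fit comfortably within the default 28-digit context precision. rescale_decimal is a hypothetical helper, not a library function:

```python
from decimal import Decimal

def rescale_decimal(values, new_min, new_max):
    """Min-max rescale a list of Decimals onto [new_min, new_max].

    With short decimal inputs, the endpoint computations are exact within
    the default 28-digit context, so the endpoints land exactly.
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]

data = [Decimal("2.5"), Decimal("3.1"), Decimal("7.75")]
out = rescale_decimal(data, Decimal("-1.2"), Decimal("5.0230593"))
print(out[0] == Decimal("-1.2"), out[-1] == Decimal("5.0230593"))   # True True
```

Values in the interior may still be rounded to 28 digits. Only the endpoints are guaranteed to come out exact here.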

Try replacing the second line with

f2 = f2*np.max(f1) + (1.0-f2)*np.min(f1)

Explanation: there are two places where the difference could creep in:

Step 1) f2 = (f2-np.min(f2))/(np.max(f2)-np.min(f2))

When you inspect np.min(f2) and np.max(f2), do you get exactly 0 and 1, or something like 1.0000003?

Step 2) f2 = f2*(np.max(f1)-np.min(f1)) + np.min(f1)

An expression like (a-b)+b doesn't always produce exactly a, due to rounding error. The suggested expression is slightly more stable.
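Here is a small plain-Python illustration of both points. It uses exaggerated magnitudes so the effect shows up in double precision. shift_form mirrors the original Step 2 and weighted_form mirrors the suggested replacement; both names are made up for this sketch:

```python
def shift_form(t, lo, hi):
    """Original Step 2 shape: t*(hi - lo) + lo."""
    return t * (hi - lo) + lo

def weighted_form(t, lo, hi):
    """Suggested shape: t*hi + (1 - t)*lo."""
    return t * hi + (1.0 - t) * lo

lo, hi = -1e16, 1.0
# (hi - lo) rounds 1e16 + 1 to 1e16, so adding lo back gives 0.0 instead of hi.
print(shift_form(1.0, lo, hi))      # 0.0
# With t == 1 the second term is (1 - 1)*lo == -0.0, so hi comes back exactly.
print(weighted_form(1.0, lo, hi))   # 1.0
print(weighted_form(0.0, lo, hi))   # -1e+16
```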

For a very detailed explanation, see What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg.

TL;DR

Use

f2 = f2*np.max(f1)-np.min(f1)*(f2-1)  # f2 is now between min(f1) and max(f1)

and make sure you're using double precision; compare floating-point numbers by looking at absolute or relative differences; avoid rounding to adjust (or compare) floating-point numbers; and don't set the underlying components of floating-point numbers manually.

Details

As you have discovered, this isn't a very easy error to reproduce. However, working with floating-point numbers is subject to error. E.g., adding together 1 000 000 000 + 0.000 000 000 1 gives 1 000 000 000.000 000 000 1, but that is too many significant figures even for double precision (which supports around 15 significant figures), so the trailing decimal is dropped. Moreover, some "short" numbers can't be represented exactly, as noted in @Kevin's answer. See, e.g., here, for more. (Search for something like "floating point truncation roundoff error" for even more.)
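You can reproduce the same absorption directly with Python's double-precision floats. sys.float_info.dig reports how many decimal digits a double is guaranteed to preserve. A quick check:

```python
import sys

big, tiny = 1e9, 1e-10
# The spacing between doubles near 1e9 is about 1.2e-7, so adding 1e-10
# rounds straight back to 1e9.
print(big + tiny == big)        # True
print(sys.float_info.dig)       # 15 on IEEE 754 double-precision platforms
```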

Here's an example that does demonstrate a problem:

import numpy as np

np.set_printoptions(precision=16)

dtype=np.float32                     
f1 = np.linspace(-1000, 0.001, 3, dtype=dtype)
f2 = np.linspace(0, 1, 3, dtype=dtype)

f2 = (f2-np.min(f2))/(np.max(f2)-np.min(f2)) # f2 is now between 0.0 and 1.0
f2 = f2*(np.max(f1)-np.min(f1)) + np.min(f1)  # f2 is now between min(f1) and max(f1)

print (f1)
print (f2)

Output:

[ -1.0000000000000000e+03  -4.9999951171875000e+02   1.0000000474974513e-03]
[ -1.0000000000000000e+03  -4.9999951171875000e+02   9.7656250000000000e-04]

Following @Mark Dickinson's comment, I have used 32-bit floating point. This is consistent with the error you reported: a relative error of around 10^-7, i.e. around the 7th significant figure:

In: (5.0230602 - 5.0230593) / 5.0230593
Out: 1.791736760621852e-07
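One way to make the "32-bit" diagnosis concrete is to compare the reported relative error with each dtype's machine epsilon, as given by numpy.finfo. Treat this as a rough rule of thumb, not a proof:

```python
import numpy as np

reported = (5.0230602 - 5.0230593) / 5.0230593
eps32 = float(np.finfo(np.float32).eps)   # 2**-23, about 1.19e-07
eps64 = float(np.finfo(np.float64).eps)   # 2**-52, about 2.22e-16
# A ratio of order 1 means "a few rounding steps" at that precision.
print(reported / eps32)   # roughly 1.5
print(reported / eps64)   # roughly 8e8, far too big for float64 rounding
```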

Going to dtype=np.float64 makes things better, but it still isn't perfect. The program above then gives:

[ -1.0000000000000000e+03  -4.9999950000000001e+02   1.0000000000000000e-03]
[ -1.0000000000000000e+03  -4.9999950000000001e+02   9.9999999997635314e-04]

This isn't perfect, but it is generally close enough. When comparing floating-point numbers you almost never want to use strict equality, because of the possibility of small errors as noted above. Instead, subtract one number from the other and check that the absolute difference is less than some tolerance, and/or look at the relative error. See, e.g., numpy.isclose.
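As an example, take the float64 result for the last element printed above. numpy.isclose tests |a - b| <= atol + rtol * |b| and gives:

```python
import numpy as np

a = 1.0e-3
b = 9.9999999997635314e-04   # float64 result from the run above
print(a == b)                                  # False
print(np.isclose(a, b))                        # True (defaults rtol=1e-05, atol=1e-08)
print(np.isclose(a, b, rtol=1e-10, atol=0.0))  # True: relative gap is about 2.4e-11
print(np.isclose(a, b, rtol=1e-12, atol=0.0))  # False: tolerance now too tight
```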

Returning to your problem, it seems like it should be possible to do better. After all, f2 has the range 0 to 1, so you should be able to reproduce the maximum of f1 exactly. The problem comes in the line

f2 = f2*(np.max(f1)-np.min(f1)) + np.min(f1)  # f2 is now between min(f1) and max(f1)

because when an element of f2 is 1, you're doing a lot more to it than just multiplying 1 by the max of f1, which opens the door to floating-point arithmetic errors. Notice that you can multiply out the brackets, turning f2*(np.max(f1)-np.min(f1)) into f2*np.max(f1) - f2*np.min(f1), and then factorize the resulting - f2*np.min(f1) + np.min(f1) into -np.min(f1)*(f2-1), giving

f2 = f2*np.max(f1)-np.min(f1)*(f2-1)  # f2 is now between min(f1) and max(f1)

So when an element of f2 is 1, we have 1*np.max(f1) - np.min(f1)*0. Conversely, when an element of f2 is 0, we have 0*np.max(f1) - np.min(f1)*(-1). The numbers 1, 0 and -1 can be represented exactly, and multiplying by them is exact, so there should be no errors at the endpoints.

The modified program outputs:

[ -1.0000000000000000e+03  -4.9999950000000001e+02   1.0000000000000000e-03]
[ -1.0000000000000000e+03  -4.9999950000000001e+02   1.0000000000000000e-03]

i.e., as desired.
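To check that the rearranged formula also helps in single precision, here is a sketch that runs both versions on the float32 data from the first example. rescale_shift and rescale_weighted are hypothetical names for the original and rearranged forms:

```python
import numpy as np

def rescale_shift(f2, f1):
    """Original form: scale by the range of f1, then shift by its minimum."""
    f2 = (f2 - np.min(f2)) / (np.max(f2) - np.min(f2))
    return f2 * (np.max(f1) - np.min(f1)) + np.min(f1)

def rescale_weighted(f2, f1):
    """Rearranged form: endpoints only involve multiplications by 0, 1 and -1."""
    f2 = (f2 - np.min(f2)) / (np.max(f2) - np.min(f2))
    return f2 * np.max(f1) - np.min(f1) * (f2 - 1)

f1 = np.linspace(-1000, 0.001, 3, dtype=np.float32)
f2 = np.linspace(0, 1, 3, dtype=np.float32)
print(np.max(rescale_shift(f2, f1)) == np.max(f1))     # False in float32
print(np.max(rescale_weighted(f2, f1)) == np.max(f1))  # True
```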

Nevertheless, I would still strongly recommend only using inexact floating-point comparison (with tight bounds if you need them) unless you have a very good reason not to. There are all sorts of subtle errors that can occur in floating-point arithmetic, and the easiest way to avoid them is never to use exact comparison.

An alternative to the approach above, which might be preferable, would be to rescale both arrays to between 0 and 1. This might be the most suitable form to use within the program. (And both arrays could then be multiplied by a scaling factor, such as the original range of f1, if necessary.)
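Here is a minimal sketch of that alternative. It assumes each array is finite and not constant, since otherwise hi - lo is zero or infinite. The helper name normalize is hypothetical. For the element equal to the maximum, the numerator is exactly the same hi - lo as the denominator, and x / x == 1.0 for any finite nonzero x in IEEE arithmetic. So the endpoints come out as exactly 0.0 and 1.0:

```python
import numpy as np

def normalize(a):
    """Min-max scale an array onto [0, 1]; assumes a is finite and not constant."""
    lo, hi = np.min(a), np.max(a)
    return (a - lo) / (hi - lo)

f1 = np.array([-1000.0, -499.9995, 0.001])
f2 = np.array([3.7, -2.2, 9.1, 0.4])
n1, n2 = normalize(f1), normalize(f2)
print(n1.min() == n2.min() == 0.0, n1.max() == n2.max() == 1.0)   # True True

# If needed, both can then be scaled by a common factor, e.g. the range of f1;
# since both maxima are exactly 1.0, the scaled maxima are identical too.
scale = np.max(f1) - np.min(f1)
print((n1 * scale).max() == (n2 * scale).max())   # True
```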

Regarding using rounding to solve your problem, I would not recommend it. The problem with rounding (apart from the fact that it unnecessarily reduces the accuracy of your data) is that numbers that are very close can round in different directions. E.g.:

import numpy as np

f1 = np.array([1.000049])
f2 = np.array([1.000051])
print (f1)
print (f2)
scale = 10**(-np.floor(np.log10(np.max(f1))) + 4)
f2 = np.round(f2*scale)/scale
f1 = np.round(f1*scale)/scale
print (f1)
print (f2)

Output:

[ 1.000049]
[ 1.000051]
[ 1.]
[ 1.0001]

This is related to the fact that, although it's common to talk about numbers matching to so many significant figures, people don't actually compare them this way on a computer. You calculate the difference and then divide by the correct (reference) value to get a relative error.
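Here is a minimal sketch of that relative comparison. relative_error is a hypothetical helper, and it assumes the reference value is nonzero. The standard library's math.isclose(a, b, rel_tol=...) does a similar check, scaling by the larger of |a| and |b|:

```python
import math

def relative_error(approx: float, exact: float) -> float:
    """|approx - exact| / |exact|; assumes exact != 0."""
    return abs(approx - exact) / abs(exact)

# The pair that rounded in different directions above is in fact very close:
print(relative_error(1.000051, 1.000049))                # about 2e-06
print(math.isclose(1.000051, 1.000049, rel_tol=1e-5))    # True
```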

Regarding mantissas and exponents, see math.frexp and math.ldexp in the documentation of Python's math module. I would not recommend setting these yourself, however (consider, for example, two numbers that are very close but have different exponents: do you really want to set the mantissa?). It's much better to simply set the maximum of f2 explicitly to the maximum of f1 if you want to ensure the numbers are exactly the same (and similarly for the minimum).
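To illustrate the warning, here are two adjacent doubles whose math.frexp decompositions share neither mantissa nor exponent. After that comes a sketch of the recommended alternative, which assigns the endpoints directly. snap_endpoints is a hypothetical helper. It assumes both arrays are non-empty and that rescaled is not constant:

```python
import math
import numpy as np

below_one = 0.9999999999999999   # the largest double below 1.0
print(math.frexp(below_one))     # (0.9999999999999999, 0)
print(math.frexp(1.0))           # (0.5, 1)
# ldexp inverts frexp exactly.
print(math.ldexp(*math.frexp(below_one)) == below_one)   # True

def snap_endpoints(rescaled, target):
    """Copy of `rescaled` whose max/min entries are set to target's max/min."""
    out = np.array(rescaled, dtype=float)   # always a fresh copy
    is_max = out == np.max(out)
    is_min = out == np.min(out)
    out[is_max] = np.max(target)
    out[is_min] = np.min(target)
    return out

f1 = np.array([-1000.0, -499.9995, 0.001])
f2 = np.array([-1000.0, -499.9995, 0.00099999999997635314])
snapped = snap_endpoints(f2, f1)
print(snapped.max() == f1.max(), snapped.min() == f1.min())   # True True
```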

def rescale(val, in_min, in_max, out_min, out_max):
    return out_min + (val - in_min) * ((out_max - out_min) / (in_max - in_min))

value_to_rescale = 5
current_scale_min = 0
current_scale_max = 10
target_scale_min = 100
target_scale_max = 200

new_value = rescale(value_to_rescale, current_scale_min, current_scale_max, target_scale_min, target_scale_max)
print(new_value)

new_value = rescale(10, 0, 10, 0, 100)
print(new_value)

Output:

150.0
100.0

Here is one with decimals:

from decimal import Decimal, ROUND_05UP
num1 = Decimal('{:.5f}'.format(5.0230593))  ## Decimal('5.02306')
num2 = Decimal('{}'.format(5.0230602))  ## Decimal('5.0230602')
print(num2.quantize(num1, rounding=ROUND_05UP))  ## 5.02306

EDIT** I am slightly confused about why I get so much negative feedback, so here is yet another solution that doesn't use decimals:

a = 5.0230593
b = 5.0230602
if abs(a - b) < 1e-6:
    b = a
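Here is a variant of the same idea that uses a relative rather than absolute tolerance (as an earlier answer recommends), plus an array version via numpy.isclose. The 1e-6 tolerance and the snap_to name are assumptions for illustration:

```python
import math
import numpy as np

a = 5.0230593
b = 5.0230602
if math.isclose(a, b, rel_tol=1e-6):   # |a - b| <= 1e-6 * max(|a|, |b|)
    b = a
print(b == a)   # True

def snap_to(values, target, rel_tol=1e-6):
    """Replace entries of `values` within rel_tol of `target` by `target` itself."""
    close = np.isclose(values, target, rtol=rel_tol, atol=0.0)
    return np.where(close, target, values)

print(snap_to(np.array([5.0230602, 1.0]), a))
```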
