Appropriate scale for converting via BigDecimal to floating point

Question

I've written an arbitrary precision rational number class that needs to provide a way to convert to floating-point. This can be done straightforwardly via BigDecimal:

return new BigDecimal(num).divide(new BigDecimal(den), 17, RoundingMode.HALF_EVEN).doubleValue();

but this requires a value for the scale parameter when dividing the decimal numbers. I picked 17 as the initial guess because that is approximately the precision of a double precision floating point number, but I don't know whether that's actually correct.

What would be the correct number to use, defined as the smallest number such that making it any larger would not make the answer any more accurate?

Answer 1

Introduction

No finite precision suffices.

The problem posed in the question is equivalent to:

What precision p guarantees that converting any rational number x to p decimal digits and then to floating-point yields the floating-point number nearest x (or, in case of a tie, either of the two nearest x)?

To see this is equivalent, observe that the BigDecimal divide shown in the question returns num / den to a selected number of decimal places. The question then asks whether increasing that number of decimal places could increase the accuracy of the result. Clearly, if there is a floating-point number nearer x than the result, then the accuracy could be improved. Thus, we are asking how many decimal places are needed to guarantee that the closest floating-point number (or one of the tied two) is obtained.

Since BigDecimal offers a choice of rounding methods, I will consider whether any of them suffices. For the conversion to floating-point, I presume round-to-nearest-ties-to-even is used (which BigDecimal appears to use when converting to Double or Float). I give a proof using the IEEE-754 binary64 format, which Java uses for Double, but the proof applies to any binary floating-point format by changing the 2^52 used below to 2^(w−1), where w is the number of bits in the significand.

Proof

One of the parameters to a BigDecimal division is the rounding method. Java's BigDecimal has several rounding methods. We only need to consider three: ROUND_UP, ROUND_HALF_UP, and ROUND_HALF_EVEN. Arguments for the others are analogous to those below, by using various symmetries.

In the following, suppose we convert to decimal using any large precision p. That is, p is the number of decimal digits in the result of the conversion.

Let m be the rational number 2^52 + 1 + ½ − 10^(−p). The two binary64 numbers neighboring m are 2^52 + 1 and 2^52 + 2. m is closer to the first one, so that is the result we require from converting m first to decimal and then to floating-point.
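To make the neighboring binary64 values concrete, here is a minimal sketch that checks the spacing of doubles at 2^52. The class and method names are illustrative and not part of the original answer; only standard java.lang.Math calls are used.

```java
final class Binary64Neighbors {
    /** Spacing between adjacent doubles at 2^52 (hypothetical helper for illustration). */
    static double spacingAtTwoTo52() {
        return Math.ulp(0x1p52);
    }

    /** The double immediately above 2^52 + 1. */
    static double upperNeighbor() {
        double lower = 0x1p52 + 1.0; // 2^52 + 1 is exactly representable
        return Math.nextUp(lower);
    }

    public static void main(String[] args) {
        // With a spacing of 1, every double in [2^52, 2^53) is an integer,
        // so any m strictly between 2^52 + 1 and 2^52 + 2 has exactly those two neighbors.
        System.out.println(spacingAtTwoTo52());
        System.out.printf("%.1f%n", upperNeighbor());
    }
}
```

Because the spacing is exactly 1 in this range, the midpoint between the two neighbors is 2^52 + 1 + ½, which is the value the proof steers the decimal rounding toward.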

In decimal, m is 4503599627370497.4999…, where there are p−1 trailing 9s. When rounded to p significant digits with ROUND_UP, ROUND_HALF_UP, or ROUND_HALF_EVEN, the result is 4503599627370497.5 = 2^52 + 1 + ½. (Recognize that, at the position where rounding occurs, there are 16 trailing 9s being discarded, effectively a fraction of 0.9999999999999999 relative to the rounding position. In ROUND_UP, any nonzero discarded amount causes rounding up. In ROUND_HALF_UP and ROUND_HALF_EVEN, a discarded amount greater than ½ at that position causes rounding up.)
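The decimal rounding step can be checked directly. The sketch below builds m exactly as a BigDecimal and rounds it to p significant digits with BigDecimal.round and a MathContext. The class and method names are made up for illustration, it uses the RoundingMode enum rather than the legacy ROUND_* constants, and it assumes p ≥ 17 so that at least one digit after the decimal point is kept.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

final class DecimalRoundingDemo {
    /** Builds m = 2^52 + 1 + 1/2 - 10^-p exactly and rounds it to p significant digits. */
    static BigDecimal roundM(int p, RoundingMode mode) {
        BigDecimal half = new BigDecimal("4503599627370497.5");  // 2^52 + 1 + 1/2
        BigDecimal tiny = BigDecimal.ONE.scaleByPowerOfTen(-p);   // 10^-p
        BigDecimal m = half.subtract(tiny);                       // exact: ...497.4999...9
        return m.round(new MathContext(p, mode));
    }

    public static void main(String[] args) {
        RoundingMode[] modes = {RoundingMode.UP, RoundingMode.HALF_UP, RoundingMode.HALF_EVEN};
        for (RoundingMode mode : modes) {
            // Each mode discards sixteen 9s and carries up to ...497.5 (with trailing zeros).
            System.out.println(mode + " -> " + roundM(20, mode).toPlainString());
        }
    }
}
```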

2^52 + 1 + ½ is equally close to the neighboring binary64 numbers 2^52 + 1 and 2^52 + 2, so the round-to-nearest-ties-to-even method produces 2^52 + 2, the neighbor whose significand is even.

Thus, the result is 2^52 + 2, which is not the binary64 value closest to m.

Therefore, no finite precision p suffices to round all rational numbers correctly.
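As an end-to-end check, the sketch below runs the question's conversion on a rational built the same way as m. It is stated in terms of the scale parameter the question uses (digits after the decimal point) rather than the precision p of the proof, and picks the subtracted term 10^(−k) with k one larger than the scale. The class name, method names, and the choice of k are illustrative assumptions.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

final class ScaleCounterexample {
    /** The conversion from the question, with the scale made a parameter. */
    static double viaBigDecimal(BigInteger num, BigInteger den, int scale) {
        return new BigDecimal(num)
                .divide(new BigDecimal(den), scale, RoundingMode.HALF_EVEN)
                .doubleValue();
    }

    /** Returns {num, den} with num/den = 2^52 + 1 + 1/2 - 10^-k. */
    static BigInteger[] counterexample(int k) {
        BigInteger tenToK = BigInteger.TEN.pow(k);
        // 2 * (2^52 + 1 + 1/2) = 2^53 + 3
        BigInteger twiceHalfway = BigInteger.ONE.shiftLeft(53).add(BigInteger.valueOf(3));
        BigInteger num = twiceHalfway.multiply(tenToK).subtract(BigInteger.valueOf(2));
        BigInteger den = tenToK.shiftLeft(1); // 2 * 10^k
        return new BigInteger[] {num, den};
    }

    public static void main(String[] args) {
        int scale = 17;
        BigInteger[] m = counterexample(scale + 1);
        double got = viaBigDecimal(m[0], m[1], scale);
        // m lies below the midpoint, so the nearest double is 2^52 + 1,
        // but the two-step rounding lands on 2^52 + 2.
        System.out.printf("got     = %.1f%n", got);
        System.out.printf("nearest = %.1f%n", 0x1p52 + 1.0);
    }
}
```

Raising the scale does not help: calling counterexample(scale + 1) for the larger scale yields another rational that is misrounded in the same way.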

Answer 1: 2, accepted, 2019-10-11 01:20:25
