Floating-Point Arithmetic = What is the worst precision/difference from Dec to Binary?

As we all know, decimal fractions (like 0.1), when stored as floating point (like double or float), are represented internally in a binary format (IEEE 754). Some decimal fractions cannot be represented exactly in binary format.

What I have not understood is the precision of this "conversion":

1.) A floating-point number itself has a precision (that of the "significand")?

2.) But the conversion from decimal fraction to binary fraction also loses precision?

Question:

What is the worst-case precision loss (for "all" possible decimal fractions) when converting from decimal fractions to floating-point fractions?

(The reason I want to know this is that, when comparing decimal fractions with binary/floating-point fractions, I need to take the precision into account to determine whether both figures are identical. And I want this precision to be as tight/precise as possible: decimal fraction == binary fraction +/- precision.)

Example (only hypothetical)

0,1 dec => 0,10000001212121212121212 (binary fraction double) => precision loss 0,00000001212121212121212
0,3 dec => 0,300000282828282 (binary fraction double) => precision loss  0,000000282828282
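
The real errors are far smaller than in the hypothetical figures above. A minimal sketch that measures them exactly with rational arithmetic, assuming Python purely for illustration (its `float` is an IEEE 754 double on common platforms) and using a hypothetical helper name `conversion_error`:

```python
from fractions import Fraction

def conversion_error(text):
    """Exact signed difference between the double nearest to `text` and `text` itself."""
    exact = Fraction(text)          # the decimal value, held exactly as a rational
    stored = Fraction(float(text))  # the double actually stored, also held exactly
    return stored - exact

for text in ("0.1", "0.3", "0.5"):
    # float() here is only used to print the tiny rational in readable form
    print(text, "conversion error:", float(conversion_error(text)))
```

A value such as 0.5 is a binary fraction and converts with no error at all, while 0.1 and 0.3 end up slightly above and slightly below the decimal value respectively.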

It is not entirely clear to me what you are after, but you may be interested in the following paper, which discusses many of the accuracy issues involved in binary/decimal conversion, including lists of hard cases.

Vern Paxson and William Kahan. A program for testing IEEE decimal-binary conversion. May 22, 1991. http://www.icir.org/vern/papers/testbase-report.pdf

Floating-point values become less and less accurate in absolute terms as their magnitude grows (in both the positive and negative directions). This is because floating point is an exponential format.

However, a decimal becomes more and more exact the more decimal places it uses, regardless of how large it is.

Therefore, the worst precision difference will be towards the numerical limits of whatever floating-point type you're using.
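
How the gap between adjacent doubles grows with magnitude can be inspected directly. A small sketch, assuming Python 3.9 or later for `math.ulp`; `relative_spacing` is a hypothetical helper name:

```python
import math

def relative_spacing(x):
    """Gap from x to the next larger double, divided by x itself."""
    return math.ulp(x) / x

for x in (1.0, 1e3, 1e15, 1e300):
    # absolute gap grows with x; the gap relative to x barely changes
    print(x, math.ulp(x), relative_spacing(x))
```

The absolute gap grows with the value, but for normal doubles the gap relative to the value stays between 2^-53 and 2^-52, which is why it matters whether absolute or relative precision is meant.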

Due to the way we're taught to count as children, it is difficult to fully appreciate the precision characteristics of binary fractions. The problem is that a fraction can only be expressed in terms of powers of the base of the counting system. It seems obvious to say, but the basic problem is that decimal divides things into tenths whilst binary divides things into halves.

Most of the time, there are two cases in which you want a fractional value in computing: when it is a currency value and when it is not. The latter could range from an input from an encoder on a spinning shaft to a position in a virtual space for handing to a graphics engine. There is no problem with the fractional value being in binary because it truly is a fractional value. This is partly why FPUs became popular for 3D graphics years ago.

The problem comes with representing currency, where the fractional part is actually made up of discrete decimal units. You can have 0.01 of a dollar (depending on which dollar it is!) in the real world, but this cannot be represented exactly in binary. This is why you should never use binary floating point for currency.

If you are converting between decimal and binary floating point and trying to make comparisons, I'd be looking at why you're doing conversions and what the comparisons are supposed to achieve.

Provided that the decimal value falls into the range of representable floating-point values, and your language/implementation has correctly-rounded conversions (many do, some don't), the error from such a conversion is bounded by 1/2 of the distance between consecutive floating-point numbers, or "ulp" (Unit in the Last Place).
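
The half-ulp bound can be checked for individual inputs. A sketch under the assumptions that Python 3.9+ is available (for `math.ulp`) and that the interpreter's string-to-float conversion is correctly rounded; `within_half_ulp` and the sample list are hypothetical:

```python
import math
from fractions import Fraction

def within_half_ulp(text):
    """True if converting the decimal string to a double lost at most half an ulp."""
    value = float(text)
    error = abs(Fraction(value) - Fraction(text))  # exact rational arithmetic
    return error <= Fraction(math.ulp(value)) / 2

samples = ["0.1", "0.3", "2.675", "123456789.123456789", "1e-7", "9e105"]
print(all(within_half_ulp(s) for s in samples))
```

For comparing a decimal fraction against a double, half an ulp of the double is therefore the tightest tolerance that a correctly-rounded conversion guarantees.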

The relative size of an ulp is largest between an exact power of two and the next larger floating-point number, so the largest relative error of conversion from decimal to double occurs when the input is just barely smaller than 1 + 1/2 ulp, or that value scaled by a power of two. An example of such a value is:

1.0000000000000001110223024625156540423631668090820312

(That's almost infinitesimally smaller than 1 + 2^-53.)
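
A sketch of what happens to this particular input, again assuming Python with a correctly-rounded string-to-float conversion:

```python
from fractions import Fraction

text = "1.0000000000000001110223024625156540423631668090820312"
exact = Fraction(text)   # the decimal input, held exactly
stored = float(text)     # the nearest double

# relative error of the conversion, computed without further rounding
relative_error = abs(Fraction(stored) - exact) / exact
print(stored)
print(float(relative_error), 2.0 ** -53)
```

The input lies just below the midpoint between 1.0 and the next double, so it rounds down to 1.0, and the relative error comes out a hair under 2^-53.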

Since the conversion error has a relative bound, the absolute error obviously gets bigger as we scale this value up by powers of two.

Of course, if a number falls outside of the range of representable values (either by being too big or too small), then all precision is lost. Converting, say, 1e400 to double yields infinity; no trace of our actual input remains. Similarly, converting 1e-400 to double produces zero.
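
Both out-of-range cases are easy to reproduce. A short sketch, assuming Python, whose `float` parses decimal strings into doubles:

```python
import math

too_big = float("1e400")     # beyond the largest finite double
too_small = float("1e-400")  # below the smallest subnormal double

print(too_big, math.isinf(too_big))
print(too_small, too_small == 0.0)
```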

The bigger the number gets, the higher the precision loss can be (but the stored value might also be precisely the number you specify).

In Java you can store not only very small numbers as float or double, but also very big numbers such as 9*10^105 (which fits in a double, though not in a float).

"And I want this precision to be as tight/precise as possible"

You may choose BigDecimal, where you can specify how precise you would like to be, but of course you're still limited by RAM, by CPU time, and by the limits of the JVM.

Are you interested only in absolute precision, or in relative precision?

Compare the difference in precision of:

a = 100000000000000,0000000000000001 
b = 100000000000000,0000000000000002

layoutHonkyTonkA= 0,0000000000000001 
layoutHonkyTonkB= 0,0000000000000002

The absolute precision difference is the same, but the relative precision difference is very different.
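
The two pairs can be compared numerically. This sketch assumes Python's `decimal.Decimal` as a stand-in for an arbitrary-precision decimal type such as the BigDecimal mentioned above, writes the values with decimal points instead of commas, and uses hypothetical names (`absolute_and_relative`, `big`, `small`):

```python
from decimal import Decimal, getcontext

getcontext().prec = 40  # enough digits to hold the 31-digit inputs without rounding

def absolute_and_relative(x, y):
    """Return (|y - x|, |y - x| / |x|) computed in decimal arithmetic."""
    diff = abs(y - x)
    return diff, diff / abs(x)

big = (Decimal("100000000000000.0000000000000001"),
       Decimal("100000000000000.0000000000000002"))
small = (Decimal("0.0000000000000001"),
         Decimal("0.0000000000000002"))

print(absolute_and_relative(*big))
print(absolute_and_relative(*small))
# As doubles, the two large values are indistinguishable:
print(float(big[0]) == float(big[1]))
```

Both pairs differ by the same absolute amount, yet the relative difference is 1 for the small pair and roughly 10^-30 for the large pair, far below what a double can resolve.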

