
IEEE 754 and machine numbers

Posted 2018-12-08 20:06:11 · Tags: floating-point, binary, rounding, ieee-754

I've been trying to wrap my head around machine numbers such as the unit roundoff (u) and epsilon (e) as they relate to the IEEE 754 standard. My textbook states some things that don't really make sense to me.

According to my textbook, the unit roundoff is:

- for single precision (mantissa is 23 bits): u = 6e-8
- for double precision (mantissa is 52 bits): u = 2e-16

I've been trying to derive a formula for these results from two relations:

- my textbook states: "In binary arithmetic with rounding we usually have e = 2*u"
- e = 2^-n, with n being the number of mantissa bits

Combining these gives u = 2^-(n+1), again with n being the number of mantissa bits. Checking this formula against the textbook's values of u for the two precisions:

- for single: u = 2^-(23+1) = 5.96e-8, which checks out.
- for double: u = 2^-(52+1) = 1.11e-16, which doesn't check out.
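The arithmetic above is easy to reproduce by evaluating u = 2^-(n+1) directly. This is a minimal Python sketch; the function name unit_roundoff is just an illustrative choice, and n is taken to be the stored fraction-field width (23 or 52), as in the question.

```python
# Evaluate the question's formula u = 2^-(n+1), where n is the number of
# stored fraction ("mantissa") bits: 23 for single, 52 for double.
def unit_roundoff(n: int) -> float:
    """Return 2**-(n + 1) for n stored fraction bits."""
    return 2.0 ** -(n + 1)

if __name__ == "__main__":
    for name, n in (("single", 23), ("double", 52)):
        # prints "single: n = 23, u = 5.96e-08" and "double: n = 52, u = 1.11e-16"
        print(f"{name}: n = {n}, u = {unit_roundoff(n):.3g}")
```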

Could someone please help me derive a correct formula for the unit roundoff, or point out the mistakes I've been making? All help is appreciated.

1 Answer

This appears to be an error in your textbook.

The significands of the IEEE-754 basic 32- and 64-bit binary floating-point formats are 24 and 53 bits, respectively.¹ It is sometimes stated that the significands are 23 and 52 bits, but this is a mistake. Those are the sizes of the main fields that encode the significands; the full 24-bit significand is encoded with 23 bits in the main significand field and 1 bit in the exponent field. Similarly, the full 53-bit significand is encoded with 52 bits in the main significand field and 1 bit in the exponent field. (The leading bit of the full significand comes from the exponent field: if the exponent field is zero, the leading significand bit is 0. If the exponent field is neither zero nor all ones, the leading significand bit is 1. If the exponent field is all ones, the floating-point object is a special value, either an infinity or a NaN.)
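To see the 23/52-bit fields and the implied leading bit concretely, the sketch below unpacks a value's bit pattern with Python's struct module and rebuilds the full significand from the main field plus the bit implied by the exponent field. The FORMATS table and the decode helper are hypothetical names chosen for illustration; zeros and subnormals get a leading bit of 0, and infinities/NaNs are rejected, following the rules just described.

```python
import struct

# Per format: (struct float code, struct unsigned-int code,
#              exponent-field bits, main significand-field bits)
FORMATS = {
    "binary32": ("<f", "<I", 8, 23),
    "binary64": ("<d", "<Q", 11, 52),
}

def decode(x: float, fmt: str = "binary64"):
    """Return (sign, exponent field, fraction field, full significand) of x."""
    fcode, icode, ebits, fbits = FORMATS[fmt]
    (bits,) = struct.unpack(icode, struct.pack(fcode, x))
    sign = bits >> (ebits + fbits)
    exp_field = (bits >> fbits) & ((1 << ebits) - 1)
    frac_field = bits & ((1 << fbits) - 1)
    if exp_field == (1 << ebits) - 1:
        raise ValueError("exponent field is all ones: infinity or NaN")
    # The leading significand bit is not stored; it is 0 when the exponent
    # field is zero and 1 otherwise.
    leading = 0 if exp_field == 0 else 1
    significand = (leading << fbits) | frac_field
    return sign, exp_field, frac_field, significand

for fmt in FORMATS:
    significand = decode(1.0, fmt)[3]
    # binary32 stored fraction bits: 23 full significand bits: 24
    # binary64 stored fraction bits: 52 full significand bits: 53
    print(fmt, "stored fraction bits:", FORMATS[fmt][3],
          "full significand bits:", significand.bit_length())
```

For 1.0 the main field is all zeros, yet the rebuilt significand has 24 or 53 bits because the leading 1 comes from the nonzero exponent field.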

When the leading bit of the 24-bit significand represents the value 1, the least significant bit represents the value 2⁻²³. That is the so-called epsilon. When a real number is rounded to the nearest representable floating-point value, the maximum error is half the value of the least significant bit. (If the error were more than half the distance between two adjacent representable numbers, the number in the other direction would be closer, so we would have chosen it instead.) So for single precision, with a leading bit of 1, the maximum rounding error is 2⁻²⁴, about 5.96•10⁻⁸.
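The single-precision claims can be checked empirically. This sketch assumes that packing a Python float with struct's "f" format rounds it to the nearest binary32 value, as it does on typical IEEE-754 platforms; the helpers to_f32, f32_successor and max_abs_error are hypothetical names. It measures the gap just above 1.0 and the largest exact rounding error over random samples in [1, 2).

```python
import random
import struct
from fractions import Fraction

def to_f32(x: float) -> float:
    """Round a binary64 value to binary32 and back (via struct's 'f' format)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def f32_successor(x: float) -> float:
    """Next binary32 value above a positive binary32 value x (bit pattern + 1)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]

# Value of the least significant bit when the leading bit represents 1.
eps32 = f32_successor(1.0) - 1.0

def max_abs_error(samples: int = 20_000, seed: int = 0) -> Fraction:
    """Largest exact |to_f32(x) - x| over random binary64 x in [1, 2)."""
    rng = random.Random(seed)
    worst = Fraction(0)
    for _ in range(samples):
        x = 1.0 + rng.random()
        worst = max(worst, abs(Fraction(to_f32(x)) - Fraction(x)))
    return worst

# A value exactly halfway between 1 and 1 + eps32 is a tie; it rounds to 1.0,
# so the half-LSB bound is actually reached.
midpoint = 1.0 + eps32 / 2

print(eps32 == 2.0**-23)                         # True
print(max_abs_error() <= Fraction(eps32) / 2)    # True
print(to_f32(midpoint) == 1.0)                   # True
```

Fraction is used so the error is measured exactly rather than with another rounded subtraction.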

For a 53-bit significand, the least significant bit represents the value 2⁻⁵² relative to the leading bit, and the maximum error when rounding to nearest is half that. So, for a leading bit of 1, the maximum rounding error should be 2⁻⁵³, which is about 1.11•10⁻¹⁶. If your book says it is 2•10⁻¹⁶, it is incorrect; that value is close to 2⁻⁵² ≈ 2.22•10⁻¹⁶, which is the epsilon rather than the unit roundoff.
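For double precision, Python's own float is binary64, so the same facts can be checked without any packing. This minimal sketch uses sys.float_info.epsilon (which is 2⁻⁵²) and exact Fraction arithmetic; the rel_error helper and the range of test fractions are illustrative choices. It relies on float(Fraction) being correctly rounded, which CPython implements as correctly rounded integer division.

```python
import sys
from fractions import Fraction

eps64 = sys.float_info.epsilon   # 2**-52: LSB value when the leading bit is 1
u64 = eps64 / 2                  # 2**-53: max relative error of round-to-nearest

def rel_error(q: Fraction) -> Fraction:
    """Exact relative error of rounding a nonzero rational q to the nearest double."""
    return abs(Fraction(float(q)) - q) / abs(q)

print(eps64 == 2.0**-52, u64 == 2.0**-53)   # True True
print(f"{u64:.3g} {eps64:.3g}")             # 1.11e-16 2.22e-16
print(1.0 + u64 == 1.0)                     # True: a tie, rounded to even
print(1.0 + eps64 > 1.0)                    # True

# Worst case over many simple fractions p/q stays within u64.
worst = max(rel_error(Fraction(p, q)) for q in range(1, 100) for p in range(1, 100))
print(worst <= Fraction(u64))               # True
```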

Footnote

¹ “Significand” is the preferred term. “Mantissa” is an old term for the fraction portion of a logarithm. Significands are linear; mantissas are logarithmic.