C - floating point rounding

I'm trying to understand how floating point numbers work.

I'd like to test what I know (and what I still need to learn) by working out the following: find the smallest x such that x + 1 = x, where x is a floating point number.

As I understand it, this would happen when x is large enough that x + 1 is closer to x than to the next representable floating point number above x. So intuitively it seems to be the case where I don't have enough digits in the significand. Would this number x then be the number whose significand is all 1's? But then I can't seem to figure out what the exponent would have to be. Obviously it would have to be big (relative to 10^0, anyway).

You just need an expression for the value of the LS bit in the mantissa in terms of the exponent. When this is > 1 then you have met your condition. For a single precision float (24-bit significand, of which 23 bits are stored) the LS bit has a value of 2^-23 * 2^exp, so the condition is met when exp is > 23, i.e. 24. The smallest (normalized) number where this condition is satisfied is therefore 1.0 * 2^24 = 16777216.0f, assuming the default round-to-nearest-even mode: 16777217 falls exactly halfway between two representable values and rounds back to 16777216.

I haven't checked this, so my maths may be off somewhere (e.g. by a factor of 2), and it's also possible that the FP unit does rounding beyond the 24th bit, so there may be a further factor of 2 needed to account for this, but you get the general idea...
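The factor-of-2 doubt can be settled by a direct check rather than by hand. The following is only a sketch, assuming IEEE 754 single precision (FLT_RADIX of 2) and the default round-to-nearest-even mode; the function names absorbing_threshold and absorbs_one are made up for illustration. On such a platform the threshold is expected to be 2^FLT_MANT_DIG = 2^24 = 16777216.

```c
#include <float.h>

/* Candidate threshold 2^FLT_MANT_DIG, built by repeated doubling.
   Assumption: FLT_RADIX == 2, so every doubling is exact. */
float absorbing_threshold(void)
{
    float x = 1.0f;
    for (int i = 0; i < FLT_MANT_DIG; i++)
        x *= 2.0f;
    return x;
}

/* Returns 1 if adding 1.0f leaves x unchanged, 0 otherwise.
   The volatile store forces the sum to be rounded to float even when the
   compiler evaluates expressions in a wider format (FLT_EVAL_METHOD != 0). */
int absorbs_one(float x)
{
    volatile float sum = x + 1.0f;
    return sum == x;
}
```

Calling absorbs_one on absorbing_threshold() and on the float just below it (16777215.0f when FLT_MANT_DIG is 24) shows whether the threshold is tight: the first call should report 1 and the second 0.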

Start with 1.0, and keep doubling it until the test succeeds:

#include <stdio.h>
int main(void) {
    double x;
    for (x = 1.0; x + 1 != x; x *= 2) { }  /* double x until adding 1 is lost */
    printf("%g + 1 = %g\n", x, x + 1);
}
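The loop only visits powers of two, which is enough here because the threshold itself is a power of two. One caveat: a compiler that evaluates x + 1 in a wider format (for example x87 extended precision) may compare an unrounded sum. A hedged variant of the same doubling search, wrapped in a function with the hypothetical name first_absorbing_double and using a volatile temporary to force rounding to double, might look like this:

```c
/* Doubling search: returns the first power of two x for which
   x + 1.0, rounded to double, equals x.
   Assumption: IEEE 754 double with round-to-nearest-even. */
double first_absorbing_double(void)
{
    double x = 1.0;
    for (;;) {
        volatile double sum = x + 1.0;  /* store forces rounding to double */
        if (sum == x)
            return x;
        x *= 2.0;                       /* exact: only the exponent changes */
    }
}
```

With a 53-bit significand this is expected to return 2^53 = 9007199254740992.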

I suggest that while trying to understand fp numbers and fp arithmetic you work in decimal with 5 digits in the significand and 2 in the exponent. (Or, if 5 and 2 don't suit you, 6 and 3 or any other small numbers you like.) The issues of:

  • the limited set of numbers which can be represented;
  • non-associativity and non-distributivity (addition and multiplication remain commutative, but regrouping an expression can change the result);
  • the problems which can arise when treating fp numbers as real numbers;

are all much easier to figure out in decimal, and the lessons you learn are entirely general. Once you've got this figured out, extending your knowledge to IEEE fp arithmetic will be relatively straightforward. You'll also be able to figure out other fp arithmetic systems with relative ease.
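To make the decimal exercise concrete, here is a rough sketch of a toy format with a 5-digit decimal significand, simulated on top of double. It is an illustration under assumptions, not a full decimal fp implementation: the names round5 and toy_add are hypothetical, the 2-digit exponent limit is ignored, and rounding is delegated to the C library's "%.4e" formatting.

```c
#include <stdio.h>
#include <stdlib.h>

/* Round v to 5 significant decimal digits by printing and re-parsing it.
   Only the significand width is modelled; the exponent range is not. */
double round5(double v)
{
    char buf[32];
    snprintf(buf, sizeof buf, "%.4e", v);
    return strtod(buf, NULL);
}

/* Toy addition: the sum is formed in double (exact for the small
   integers used here) and then rounded to the 5-digit toy format. */
double toy_add(double a, double b)
{
    return round5(a + b);
}
```

In this toy system toy_add(99999, 1) gives 100000, but toy_add(100000, 1) gives 100000 again, so 10^5 plays the role that x plays in the original question. Grouping also matters: toy_add(toy_add(100000, 4), 4) stays at 100000, while toy_add(100000, toy_add(4, 4)) rounds up to 100010.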
