
Difference in floating point precision behavior between C and C#

This is an academic question and so answers such as "just don't do that" miss the point.

I'm not trying to solve a problem - I'm trying to understand an observed behavior, namely a difference in how floating point math appears to function when comparing C and C#.

Assumption: float precision in C

It is my assumption that in C floats are implemented using a 23 bit mantissa and 8 bit exponent (https://en.wikipedia.org/wiki/Single-precision_floating-point_format).

For a given number, we can compute its smallest precision step - the smallest increment below which an addition purely structurally cannot be stored anymore - by computing the value of the last bit of the mantissa.

If the floating point number is evaluated as:

[sign] * 1.[mantissa] * 2^[exponent]

Then, because we have 23 bits in the mantissa, the precision is 2^(exponent-23), where the exponent for a given number is:

floor(log2(number))

So the precision of a fairly large number like 10^9 is computed as follows:

exponent  = floor(log2(10^9))
          = 29

precision = 2^(exponent-23)
          = 2^(29-23)
          = 2^6
          = 64

This is the bare-metal, lowest theoretically possible value that can be added to 10^9 when stored as a float, because we're literally flipping the least significant bit of the mantissa, as visualized by the IEEE-754 Floating Point Converter.
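
The same single-bit flip can be made visible without the converter by dumping the raw bit patterns of the two values. A minimal sketch in C#, assuming BitConverter.SingleToInt32Bits is available (.NET Core 2.0 or later):

using System;

class MantissaBitFlip
{
    static void Main()
    {
        // Raw IEEE-754 bit patterns of 1e9f and of 1e9f plus one precision step (64).
        int a = BitConverter.SingleToInt32Bits(1e9f);
        int b = BitConverter.SingleToInt32Bits(1e9f + 64f);

        Console.WriteLine(Convert.ToString(a, 2).PadLeft(32, '0'));
        Console.WriteLine(Convert.ToString(b, 2).PadLeft(32, '0'));
        Console.WriteLine(b - a);    // 1: only the least significant mantissa bit differs
    }
}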

I can also validate this with a quick C program (run online):

#include <stdio.h>

int main(void)
{
  float number = 1e9f;          // exponent: 29, precision: 64
  printf("%'.0f\n", number);    // prints: 1000000000

  number += 30;                 // 30 rounded to nearest multiple of 64 is 0
  printf("%'.0f\n", number);    // prints: 1000000000

  number += 40;                 // 40 rounded to nearest multiple of 64 is 64
  printf("%'.0f\n", number);    // prints: 1000000064

  return 0;
}

It is my assumption that the general 32 bit floating point format (1 bit sign, 8 bit exponent, 23 bit mantissa) is so universal that it's something intrinsic to modern CPUs, and so behavior would generally be the same across programming languages.
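
If that assumption holds, the spacing between adjacent floats should come out identically in C#. As a sanity check, a minimal sketch of measuring that spacing directly, assuming MathF.BitIncrement is available (.NET Core 3.0 or later):

using System;

class SpacingCheck
{
    static void Main()
    {
        // MathF.BitIncrement returns the next representable float above its argument,
        // so the difference is the spacing (ULP) at that magnitude.
        Console.WriteLine(MathF.BitIncrement(1e9f) - 1e9f);   // 64
        Console.WriteLine(MathF.BitIncrement(1e8f) - 1e8f);   // 8
    }
}

Both values match the hand-computed 2^(exponent-23).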

Question: float precision in C#

So with that stated, when I try the same validation test in C#, the value of the number does not change.

If I use a smaller value, 10^8, which would have an exponent of 26 and therefore a precision of 2^(26-23) = 8 given my above assumptions about how the bits of the floating point format represent the number internally, I notice the following behavior:

float number = 1e8f;                 // exponent: 26, precision: 8
Console.WriteLine($"{number,1:0}");  // prints: 100000000 

number += 30;                        // 30 rounded to multiple of 8 -should- be 32
Console.WriteLine($"{number,1:0}");  // prints: 100000000 

number += 40;                        // 40 rounded to multiple of 8 -should- be 40
Console.WriteLine($"{number,1:0}");  // prints: 100000100

And that... confuses me somewhat. Where did that 100 come from? That's not even a multiple of 2!

With a value of 1e8f, C also behaves as expected and supports the precision being a value of 8: cpp.sh/6qesv

Looking at the C# documentation for floating point values, nothing jumps out at me that would suggest C# should handle float addition any differently here than C does, or differently from what I would expect given how floating point values are implemented.

The docs do mention that the approximate precision of floats is ~6-9 digits, which is frustratingly vague. I suppose that could be an answer: "you're dealing with digits past the guaranteed limit, it's undefined behavior" - and while true, that is unsatisfying.

I would like to know, ideally broken down step by step, what actually happened in C#'s implementation there that makes it behave so differently than C here.

Promoting my comment to an answer:

The problem here isn't floating point, it's differences in string formatting. I'm not familiar with what, exactly, a format specifier of "0" means or does (and can't seem to find it documented anywhere), but it's responsible for the unusual rounding you're seeing.

Using the format specifier "G9" is recommended for formatting a single precision float in such a way that it will round-trip correctly (meaning parsing the string back into a single precision float will reproduce the original value exactly). If you change your code to use {number:G9} in the interpolated strings, you should see the expected result.
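
For example, a minimal sketch of your 1e8f test with the "G9" change applied (same additions as in your snippet) shows that the stored float values follow the bit-level reasoning exactly:

using System;

class RoundTripFormat
{
    static void Main()
    {
        float number = 1e8f;
        Console.WriteLine(number.ToString("G9"));   // 100000000

        number += 30;                               // 100000030 rounds to 100000032, the nearest multiple of 8
        Console.WriteLine(number.ToString("G9"));   // 100000032

        number += 40;                               // 100000072 is a multiple of 8, so it is stored exactly
        Console.WriteLine(number.ToString("G9"));   // 100000072
    }
}

The 100000000 and 100000100 you saw are these same values rounded to 7 significant digits, which matches the default float formatting on runtimes before .NET Core 3.0 (newer runtimes default to the shortest round-trippable representation).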
