How does printf extract digits from a floating point number?

Originally posted 2018-06-26 22:59:00. Tags: c++, c, floating-point, printf

How do functions such as printf extract digits from a floating point number? I understand how this could be done in principle. Given a number x, of which you want the first n digits, scale x by a power of 10 so that x is between pow(10, n-1) and pow(10, n). Then convert x into an integer, and take the digits of the integer.

I tried this, and it worked. Sort of. My answer was identical to the answer given by printf for the first 16 decimal digits, but tended to differ on the ones after that. How does printf do it?

3 answers

The classic implementation is David Gay's dtoa. The exact details are somewhat arcane (see Why does "dtoa.c" contain so much code?), but in general it works by doing the base conversion with more precision than you can get from a 32-bit, 64-bit, or even 80-bit floating-point number. To do this, it uses so-called "bigints", or arbitrary-precision numbers, which can hold as many digits as you can fit in memory. Gay's code has been copied, with modifications, into countless other libraries, including common implementations of the C standard library (so it might power your printf), Java, Python, PHP, JavaScript, etc.

(As a side note: not all of these copies of Gay's dtoa code were kept up to date. Because PHP used an old version of strtod, it hung when parsing 2.2250738585072011e-308.)

In general, if you do things the "obvious" and simple way, like multiplying by a power of 10 and then converting to an integer, you will lose a small amount of precision and some of the results will be inaccurate, but maybe you will get the first 14 or 15 digits correct. Gay's implementation of dtoa() claims to get all the digits correct, but as a result, the code is quite difficult to follow. If you skip to the bottom to see strtod itself (the decimal-to-binary direction), you can see that it starts with a "fast path" that just uses ordinary floating-point arithmetic, but then it detects whether that result is incorrect and, if so, uses a more reliable algorithm based on bigints, which works in all cases (but is slower).
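The "fast path" idea can be illustrated with the classic case where both the decimal significand and the power of ten are exactly representable as doubles, so a single multiplication or division yields a correctly rounded result. This is a hypothetical helper (fast_path), not code taken from dtoa.c, and it assumes round-to-nearest arithmetic without extended-precision intermediates (FLT_EVAL_METHOD == 0).

```c
#include <stdint.h>

/* Sketch of a strtod-style fast path (hypothetical helper): converts
   digits * 10^p to double when one correctly rounded operation suffices.
   Returns 1 on success, 0 when a slower exact (bigint) method is needed. */
static int fast_path(uint64_t digits, int p, double *out)
{
    /* 10^0 .. 10^22 are all exactly representable in binary64. */
    static const double pow10[] = {
        1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,
        1e8,  1e9,  1e10, 1e11, 1e12, 1e13, 1e14, 1e15,
        1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22
    };

    /* digits must convert to double exactly, i.e. fit in 53 bits. */
    if (digits > (UINT64_C(1) << 53) || p < -22 || p > 22)
        return 0;

    double d = (double)digits;  /* exact conversion */
    /* Both operands are exact, so this one rounding is the only one
       and the result is correctly rounded. */
    *out = (p >= 0) ? d * pow10[p] : d / pow10[-p];
    return 1;
}
```

For input such as 2.2250738585072011e-308, the significand 22250738585072011 exceeds 2^53 and the exponent -324 is far outside the table, so this path declines and an exact algorithm has to take over.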

The implementation has the following citation, which you may find interesting:

* Inspired by "How to Print Floating-Point Numbers Accurately" by
 * Guy L. Steele, Jr. and Jon L. White [Proc. ACM SIGPLAN '90, pp. 112-126].

The algorithm works by calculating a range of decimal numbers that produce the given binary number; by using more digits, the range gets smaller and smaller until you either have an exact result or can correctly round to the requested number of digits.

In particular, from section 2.2, "Algorithm":

The algorithm uses exact rational arithmetic to perform its computations so that there is no loss of accuracy. In order to generate digits, the algorithm scales the number so that it is of the form 0.d₁d₂..., where d₁, d₂, ... are base-B digits. The first digit is computed by multiplying the scaled number by the output base, B, and taking the integer part. The remainder is used to compute the rest of the digits using the same approach.

The algorithm can then continue until it has the exact result (which is always possible: floating-point numbers are base 2, and 2 is a factor of 10, so every binary fraction has a terminating decimal expansion) or until it has as many digits as requested. The paper goes on to prove the algorithm's correctness.
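The digit-generation loop quoted from section 2.2 can be sketched with exact arithmetic when the number is small enough to keep as a fraction r / 2^k in a 64-bit integer. This sketch assumes 2^-8 <= x < 1 (so k <= 60 and r * 10 cannot overflow) and uses base B = 10; fraction_digits is a hypothetical name. It shows only the multiply-by-B, take-the-integer-part loop, not the paper's range tracking for the shortest output.

```c
#include <math.h>
#include <stdint.h>

/* Generates the decimal digits of x (hypothetical helper). x is held
   exactly as the fraction r / 2^k; each step multiplies by the output
   base B = 10, takes the integer part as the next digit, and keeps the
   remainder. Assumes 2^-8 <= x < 1, so k <= 60 and r * 10 < 2^64.
   out must hold max_digits + 1 chars. Returns the digit count, or -1
   if x is outside the supported range. */
static int fraction_digits(double x, char *out, int max_digits)
{
    if (!(x >= 0x1p-8 && x < 1.0)) {
        out[0] = '\0';
        return -1;
    }
    int exp;
    double f = frexp(x, &exp);             /* x = f * 2^exp, 0.5 <= f < 1 */
    uint64_t r = (uint64_t)ldexp(f, 53);   /* numerator, < 2^53 */
    int k = 53 - exp;                      /* x == r / 2^k exactly */
    uint64_t mask = (UINT64_C(1) << k) - 1;

    int n = 0;
    while (r != 0 && n < max_digits) {     /* r == 0: exact result reached */
        r *= 10;                           /* multiply by the base */
        out[n++] = (char)('0' + (r >> k)); /* integer part is the digit */
        r &= mask;                         /* remainder stays exact */
    }
    out[n] = '\0';
    return n;
}
```

For 0.1 the loop stops on its own after 55 digits (1000000000000000055511151231257827021181583404541015625), because the double nearest 0.1 is 3602879701896397 / 2^55 and each multiplication by 10 removes one factor of 2 from the denominator.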

Also note that not all implementations of printf are based on Gay's dtoa; this is just a particularly common implementation that has been copied a lot.

There are various ways to convert floating-point numbers to decimal numerals without error (either exactly or with rounding to a desired precision).

One method is to use arithmetic as taught in elementary school. C provides functions to work with floating-point numbers, such as frexp, which separates the fraction (also called the significand, often mistakenly called a mantissa) and the exponent. Given a floating-point number, you could create a large array to store decimal digits in and then compute the digits. Each bit in the fraction part of a floating-point number represents some power of two, as determined by the exponent in the floating-point number. So you can simply put a “1” in an array of digits and then use elementary school arithmetic to multiply or divide it by two the required number of times. You can do that for each bit and then add all the results, and the sum is the decimal numeral that equals the floating-point number.
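A sketch of this elementary-school approach in C follows. As a simplification (an assumption, not exactly the per-bit procedure described above), it treats the whole 53-bit significand from frexp as one integer, then multiplies the digit array by 2 for a non-negative exponent, or by 5 for a negative one, since dividing by 2^k equals multiplying by 5^k and moving the decimal point k places left. The names exact_decimal and mul_small are hypothetical, and x must be finite and non-negative.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* The worst case (smallest subnormal) needs 1127 digits here. */
#define MAX_DIGITS 1200
#define EXACT_BUF  (MAX_DIGITS + 2)   /* digits + '.' + '\0' */

/* Multiplies a little-endian array of decimal digits by a small factor,
   carrying exactly as done by hand. */
static void mul_small(unsigned char *d, int *len, unsigned factor)
{
    unsigned carry = 0;
    for (int i = 0; i < *len; i++) {
        unsigned t = d[i] * factor + carry;
        d[i] = (unsigned char)(t % 10);
        carry = t / 10;
    }
    while (carry != 0) {
        d[(*len)++] = (unsigned char)(carry % 10);
        carry /= 10;
    }
}

/* Writes the exact decimal value of x (finite, x >= 0) into out,
   which must hold at least EXACT_BUF chars. */
static void exact_decimal(double x, char *out)
{
    if (x == 0.0) {
        strcpy(out, "0");
        return;
    }
    int exp;
    double f = frexp(x, &exp);            /* x = f * 2^exp, 0.5 <= f < 1 */
    uint64_t m = (uint64_t)ldexp(f, 53);  /* integer significand */
    int e = exp - 53;                     /* now x == m * 2^e exactly */

    unsigned char d[MAX_DIGITS];          /* d[0] is the least significant */
    int len = 0;
    do {
        d[len++] = (unsigned char)(m % 10);
        m /= 10;
    } while (m != 0);

    int frac = 0;                         /* digits after the decimal point */
    if (e >= 0) {
        for (int i = 0; i < e; i++)       /* multiply by 2, e times */
            mul_small(d, &len, 2);
    } else {
        frac = -e;                        /* m / 2^k == m * 5^k / 10^k */
        for (int i = 0; i < frac; i++)
            mul_small(d, &len, 5);
        while (len <= frac)               /* pad so there is a leading 0 */
            d[len++] = 0;
    }

    char *p = out;
    for (int i = len - 1; i >= 0; i--) {
        *p++ = (char)('0' + d[i]);
        if (i == frac && frac > 0)
            *p++ = '.';
    }
    *p = '\0';
    if (frac > 0) {                       /* drop trailing zeros and '.' */
        while (p[-1] == '0')
            *--p = '\0';
        if (p[-1] == '.')
            *--p = '\0';
    }
}
```

With this sketch, exact_decimal(0.1, buf) produces 0.1000000000000000055511151231257827021181583404541015625 and exact_decimal(1e23, buf) produces 99999999999999991611392, the exact values of those doubles.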

Commercial printf implementations will use more sophisticated algorithms. Discussing them is beyond the scope of a Stack Overflow question-and-answer. The seminal paper on this is Correctly Rounded Binary-Decimal and Decimal-Binary Conversions by David M. Gay. (A copy appears to be available here, but that seems to be hosted by a third party; I am not sure how official or durable it is. A web search may turn up other sources.) A more recent paper, with an algorithm for converting a binary floating-point number to decimal using the shortest number of digits needed to uniquely distinguish the value, is Printing Floating-Point Numbers: An Always Correct Method by Marc Andrysco, Ranjit Jhala, and Sorin Lerner.

One key to how it is done is that printf will not just use the floating-point format and its operations to do the work. It will use some form of extended-precision arithmetic, either by working with parts of the floating-point number in an integer format with more bits, by separating the floating-point number into pieces and using multiple floating-point numbers to work with it, or by using a floating-point format with more precision.
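As an illustration of the "integer format with more bits" option, the sketch below computes round(x * 10^j) exactly by multiplying the integer significand of x by 10^j in a 128-bit integer and rounding the final shift half to even. It assumes x is finite and positive, 0 <= j <= 22, and a result that fits in 64 bits; unsigned __int128 is a GCC/Clang extension rather than standard C, and scaled_round is a hypothetical name.

```c
#include <math.h>
#include <stdint.h>

/* Computes round(x * 10^j) exactly using integer arithmetic on the
   significand of x (hypothetical helper). Assumes x finite and positive,
   0 <= j <= 22 (so m * 10^j < 2^127), and a result below 2^64. */
static uint64_t scaled_round(double x, int j)
{
    int exp;
    double f = frexp(x, &exp);                     /* x = f * 2^exp */
    unsigned __int128 m = (uint64_t)ldexp(f, 53);  /* x == m * 2^(exp-53) */
    int e = exp - 53;

    for (int i = 0; i < j; i++)
        m *= 10;                                   /* exact, no rounding */

    if (e >= 0)
        return (uint64_t)(m << e);                 /* already an integer */

    int s = -e;                                    /* divide by 2^s */
    if (s >= 128)
        return 0;                                  /* m < 2^127: below 0.5 */
    unsigned __int128 q = m >> s;
    unsigned __int128 rem = m - (q << s);
    unsigned __int128 half = (unsigned __int128)1 << (s - 1);
    if (rem > half || (rem == half && (q & 1)))    /* round half to even */
        q++;
    return (uint64_t)q;
}
```

For instance, scaled_round(0.1, 17) returns 10000000000000001, the same 17 significant digits that printf("%.16e", 0.1) shows, whereas computing 0.1 * 1e17 in double precision rounds to 10000000000000000.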

Note that the first step in your question, multiplying x by a power of ten, already has two rounding errors. First, not all powers of ten are exactly representable in binary floating-point (in the common 64-bit format, 10^0 through 10^22 are exact, but negative powers such as 10^-1 and powers above 10^22 are not), so just producing such a power of ten necessarily has some representation error. Then, multiplying x by another number often produces a mathematical result that is not exactly representable, so it must be rounded to the floating-point format.

Neither the C nor the C++ standard dictates a particular algorithm for such things. Therefore it is impossible to say in general how printf does this; it depends on the implementation.

If you want to see an example of a printf implementation, you can have a look here: http://sourceware.org/git/?p=glibc.git;a=blob;f=stdio-common/vfprintf.c and here: http://sourceware.org/git/?p=glibc.git;a=blob;f=stdio-common/printf_fp.c