How do you convert from binary to IEEE754 when the last decimal isn't 5?

Everything I can find on this says to simply multiply by 2 until the decimal resolves to zero, but this only works if the last decimal is 5.

In my particular case the number to convert is 98765.4321. How would I convert this (or any other decimal that doesn't resolve) to IEEE754?

I will assume the question is as follows: We are given a number specified as a sequence of decimal digits that possibly includes a decimal fraction, and possibly makes use of scientific notation. How do we correctly convert this number into one of the binary floating-point formats specified by the IEEE 754 floating-point standard, i.e., binary16 (half precision), binary32 (single precision), binary64 (double precision), or binary128 (quadruple precision)?

As you noted, most decimal numbers cannot be represented exactly in a binary floating-point format. That means we need to choose which of the IEEE-754 rounding modes should determine the final result: round towards positive infinity ("up"), round towards negative infinity ("down"), round towards zero (truncate), or round to nearest, ties to even ("nearest"). Decimal-to-binary conversion typically uses the last mode listed, round to nearest with ties to even, as this minimizes the overall error in the conversion.
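To make the four modes concrete, here is a small Python sketch that rounds the exact value of a decimal string onto the binary32 grid in each mode, using exact rational arithmetic from the standard `fractions` module. The function name `round_to_binary32_grid` is made up for this illustration, and the sketch assumes a nonzero input in the normal range of binary32 (no subnormals, no overflow).

```python
from fractions import Fraction
import math

def round_to_binary32_grid(text):
    """Round a nonzero decimal string to binary32 in all four IEEE-754 modes.

    Illustrative sketch: assumes the value lies in the normal range of
    binary32, so subnormals and overflow are not handled.
    """
    x = Fraction(text)                      # exact value of the decimal string
    mag = abs(x)
    # Find e with 2**e <= |x| < 2**(e + 1).
    e = mag.numerator.bit_length() - mag.denominator.bit_length()
    if Fraction(2) ** e > mag:
        e -= 1
    ulp = Fraction(2) ** (e - 23)           # spacing of binary32 values near x
    k = x / ulp                             # x measured in ulps (exact)
    down, up = math.floor(k), math.ceil(k)
    frac = k - down
    if frac != Fraction(1, 2):
        nearest = down if frac < Fraction(1, 2) else up
    else:
        nearest = down if down % 2 == 0 else up   # tie: pick the even neighbor
    return {
        "up": up * ulp,
        "down": down * ulp,
        "toward_zero": math.trunc(k) * ulp,
        "nearest": nearest * ulp,
    }

for mode, value in round_to_binary32_grid("98765.4321").items():
    print(mode, float(value))
```

For 98765.4321 the two binary32 neighbors are 98765.4296875 and 98765.4375. The exact value lies closer to the first, so "nearest", "down", and "toward_zero" all select 98765.4296875, while "up" selects 98765.4375.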

Conceptually, our task is simple. Carry out the conversion process until we have generated enough bits to make a correct rounding decision. Clearly, we will often need more bits than provided by the target format. However, we cannot tell a priori exactly how many bits we will need, as some hard-to-round cases will generate results very close to a tie case. The take-home message is that some parts of our algorithm will require the use of some sort of extended precision (or multi-precision) arithmetic, and we need to develop a criterion for determining when we have generated enough bits for correct rounding.
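For a single value such as 98765.4321, arbitrary-precision integers make the rounding decision exact, because the remainder of an integer division carries all the bits that were not kept. The sketch below is a brute-force illustration of that idea, not one of the optimized algorithms from the literature. The name `decimal_to_binary32_bits` is an assumption of this example, and the function handles only positive values in the normal range of binary32, rounding to nearest with ties to even.

```python
from fractions import Fraction
import struct

def decimal_to_binary32_bits(text):
    """Return the binary32 bit pattern of a positive decimal string.

    Sketch only: positive normal values, round to nearest, ties to even.
    """
    x = Fraction(text)
    if x <= 0:
        raise ValueError("sketch handles positive values only")
    n, d = x.numerator, x.denominator
    # Find e with 2**e <= x < 2**(e + 1), using integers only.
    e = n.bit_length() - d.bit_length()
    if (n << max(-e, 0)) < (d << max(e, 0)):
        e -= 1
    # Scale so that the integer quotient is the 24-bit significand.
    shift = 23 - e
    num, den = (n << shift, d) if shift >= 0 else (n, d << -shift)
    q, r = divmod(num, den)          # r holds everything beyond 24 bits
    # Compare the discarded part with half an ulp; on a tie pick even q.
    if 2 * r > den or (2 * r == den and q & 1):
        q += 1
        if q == 1 << 24:             # rounding carried into the next binade
            q >>= 1
            e += 1
    if not -126 <= e <= 127:
        raise ValueError("outside the normal range of binary32")
    return ((e + 127) << 23) | (q - (1 << 23))

bits = decimal_to_binary32_bits("98765.4321")
print(hex(bits))                                        # 0x47c0e6b7
# Cross-check against the platform's own conversion.
print(hex(struct.unpack(">I", struct.pack(">f", 98765.4321))[0]))
```

The cost of this approach grows with the size of the decimal exponent, since the integers involved can become very large. Avoiding that cost in the common case is what the published algorithms are about.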

The fundamental algorithms for correct conversions were developed over a couple of decades in the past century, and are described in the following publications:

David W. Matula, "In-and-out conversions". Communications of the ACM, Vol. 11, No. 1 (Jan. 1968), pp. 47-50

David W. Matula, "A Formalization of Floating-Point Numeric Base Conversion". IEEE Transactions on Computers, Vol. C-19, No. 8 (Aug. 1970), pp. 681-692 (online)

William D. Clinger, "How to Read Floating Point Numbers Accurately". SIGPLAN Notices, Vol. 25, No. 6 (June 1990), pp. 92-101 (online)

David M. Gay, "Correctly rounded binary-decimal and decimal-binary conversions". Technical Report 90-10, AT&T Bell Laboratories, November 1990. (online)

A fresh look at this research area is provided by the following publications:

Michel Hack, "On Intermediate Precision Required for Correctly-Rounding Decimal-to-Binary Floating-Point Conversion." In Proceedings of Real Numbers and Computers (RNC'6), Nov. 2004, pp. 113-133 (online)

Aubrey Jaffer, "Easy Accurate Reading and Writing of Floating-Point Numbers". arXiv:1310.8121, draft v6 (Jan. 2015) (online)

Although the fundamental algorithms have been around for twenty-five years, they are of considerable complexity, and the "devil is in the details". Correct implementations of decimal-to-binary conversions continue to prove elusive. Over the past five years, Rick Regan's blog "Exploring Binary" has chronicled a number of defects in the decimal-to-binary conversion functionality of widely used software, such as Microsoft Visual C/C++, glibc, and PHP, where the last item would cause an infinite loop that might be exploited for denial-of-service attacks.

A paper by Vern Paxson and William Kahan addresses the issue of hard-to-round cases in decimal-to-binary conversion, and gives some examples that demonstrate how many additional bits beyond target precision may be required for correct rounding:

V. Paxson and W. Kahan, "A Program for Testing IEEE Decimal–Binary Conversion". Manuscript, May 1991 ( online )
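One way to see why such cases are hard is to measure how close the exact value lies to the midpoint between two adjacent floating-point numbers. The sketch below reports that closeness in bits for binary64 by default. The function name `tie_closeness_bits` and the test input are constructed for this illustration and are not taken from the paper.

```python
from fractions import Fraction
import math

def tie_closeness_bits(text, precision=53):
    """Return the smallest m with 2**-m <= gap, where gap is the distance
    (in ulps) from the value to the nearest rounding midpoint.
    Returns None when the value is an exact tie.
    """
    x = abs(Fraction(text))
    if x == 0:
        raise ValueError("zero is exactly representable")
    # Find e with 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    # Scale so that the integer part holds `precision` significand bits.
    k = x * Fraction(2) ** (precision - 1 - e)
    gap = abs(k - math.floor(k) - Fraction(1, 2))
    if gap == 0:
        return None                  # exact tie: ties-to-even decides
    m = 1
    while Fraction(1, 2 ** m) > gap:
        m += 1
    return m

# Constructed input: 2**53 + 1 is an exact tie in binary64, and adding
# 10**-30 moves the value just above the midpoint.
near_tie = "9007199254740993." + "0" * 29 + "1"
print(tie_closeness_bits("98765.4321"))
print(tie_closeness_bits("9007199254740993"))
print(tie_closeness_bits(near_tie))
```

For the constructed near-tie the function reports 101, meaning the value agrees with an exact tie for about a hundred bits beyond the 53-bit significand. An algorithm that works with approximate intermediate values needs roughly that much extra precision to round this input correctly.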

Additional hard-to-round cases for IEEE-754 binary64 were listed in a 1996 posting to the newsgroup comp.arch.arithmetic by Fred Tydeman.

The following paper describes a framework for testing conversions; however, the files containing the test vectors were no longer accessible online the last time I checked:

Brigitte Verdonk, Annie Cuyt, and Dennis Verschaeren. "A precision- and range-independent tool for testing floating-point arithmetic II: conversions." ACM Transactions on Mathematical Software, Vol. 27, No. 1 (Mar. 2001), pp. 119-140. (draft online)
