
Does IEEE-754 float, double and quad guarantee exact representation of -2, -1, -0, 0, 1, 2?

Original 2013-11-17 10:06:22 9 3 c++ / c / floating-point / ieee-754 / floating-point-precision

All in the title: do IEEE-754 float, double and quad guarantee exact representation of -2, -1, -0, 0, 1, 2?

3 answers

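It guarantees exact representation of all integers until the number of significant binary digits exceeds the width of the significand (mantissa).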
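As a rough illustration of that rule, here is a minimal sketch that round-trips an integer through a floating-point type and checks whether it comes back unchanged. The helper name `is_exact_integer` is hypothetical, and the sketch assumes the input stays well inside the `std::int64_t` range (say, a magnitude of at most 2^62) so that the cast back to an integer is well defined.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: returns true if the integer n survives a round trip
// through the floating-point type T unchanged.
// Assumption: |n| <= 2^62, so converting the rounded value back to
// std::int64_t cannot overflow.
template <typename T>
bool is_exact_integer(std::int64_t n) {
    static_assert(std::numeric_limits<T>::is_iec559,
                  "T is expected to be an IEEE-754 type");
    const T as_float = static_cast<T>(n);            // may round
    return static_cast<std::int64_t>(as_float) == n; // exact only if nothing was lost
}
```

Under these assumptions, `is_exact_integer<float>(2)` and `is_exact_integer<float>(-2)` hold, while `is_exact_integer<float>(16777217)` (that is, 2^24 + 1) does not, because it needs 25 significant bits.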

A simple way to get the answer for any decimal number: convert the absolute value to binary (24 bits for float, 53 bits for double, 113 bits for quad), then back to decimal, and see if you get the same value back.

For integers, the answer is obvious: you don't lose anything unless the value is too big to fit into the given number of bits.

Conversion of rational values with a non-integer part is more interesting. There you may lose precision when converting to binary with some fixed width, because many decimal fractions (0.1, for example) have a periodic binary expansion. When you convert the rounded binary value back to decimal, you get a value that differs from the one you started with (or you lose precision again if you round it).
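The precision loss for non-integer values can be observed with a small heuristic, sketched below: parse the same decimal text once as float and once as double and compare the results. If the value is exactly representable in float, both parses must agree. Disagreement shows that at least the float conversion rounded. The function name `same_in_float_and_double` is made up for this example, and agreement is only a hint of exactness, not a proof.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: parses the decimal text at both precisions and
// reports whether the two results are the same number.
// A value that is exact in float always yields true. A false result means
// the float conversion had to round.
bool same_in_float_and_double(const char* decimal_text) {
    const float  narrow = std::strtof(decimal_text, nullptr);
    const double wide   = std::strtod(decimal_text, nullptr);
    // narrow is widened to double exactly before the comparison.
    return narrow == wide;
}
```

For instance, `same_in_float_and_double("0.5")` and `same_in_float_and_double("-2")` hold, while `same_in_float_and_double("0.1")` does not, since 0.1 has no finite binary expansion.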

Since you're dabbling with IEEE floats, first read the Wikipedia page, then, when you feel you're ready for more, proceed with the first external link there, "What Every Computer Scientist Should Know About Floating-Point Arithmetic".

IEEE 754 floating-point numbers can store integers within certain ranges exactly. For example:

binary32, implemented in C/C++ as float, provides 24 bits of precision and can therefore represent every integer up to 2^24 in magnitude exactly, which covers 16-bit integers, e.g. short int;
binary64, implemented in C/C++ as double, provides 53 bits of precision and can represent 32-bit integers exactly, e.g. int;
the non-standard Intel 80-bit extended precision, implemented as long double by some x86/x64 compilers, provides 64 significant bits and can represent 64-bit integers, e.g. long int (on LP64 systems, e.g. Unix) or long long int (on LLP64 systems, e.g. Windows);
binary128, implemented as compiler-specific types such as __float128 (GCC) or _Quad (Intel C/C++), provides 113 bits in the significand and can therefore represent 64-bit integers exactly.
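The precisions listed above can be checked at compile time through `std::numeric_limits`. The following is a minimal sketch, assuming a platform where float and double are binary32 and binary64. The helper `fits_all_values` is a name invented for this example. It compares the number of value bits of an integer type with the significand width of a floating-point type.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true if every value of the integer type Int is
// exactly representable in the floating-point type Float.
// For a binary format this holds when Int has no more value bits than
// Float has significand bits.
template <typename Float, typename Int>
constexpr bool fits_all_values() {
    using F = std::numeric_limits<Float>;
    using I = std::numeric_limits<Int>;
    return F::radix == 2 && I::is_integer && I::digits <= F::digits;
}

// Assumption: float is binary32 and double is binary64 on this platform.
static_assert(std::numeric_limits<float>::digits == 24, "binary32 has a 24-bit significand");
static_assert(std::numeric_limits<double>::digits == 53, "binary64 has a 53-bit significand");
static_assert(fits_all_values<float, std::int16_t>(), "16-bit integers fit in float");
static_assert(fits_all_values<double, std::int32_t>(), "32-bit integers fit in double");
static_assert(!fits_all_values<double, std::int64_t>(), "64-bit integers do not all fit in double");
```

The same check applied to long double gives platform-dependent results, since long double may be the 80-bit extended format, binary128, or simply an alias for binary64.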

The fact that double covers an extended range of integers (every integer up to 2^53 in magnitude), surpassing the range of 32-bit integers, is used in JavaScript, which doesn't have a special integer numeric type and instead uses double-precision floating point to represent integers.

One quirk of floating-point numbers is that they have a separate sign bit, so both positive and negative zero exist, which is not possible in two's complement signed integer representation.
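The two zeros can be told apart only by inspecting the sign bit, because they compare equal. Here is a small sketch using `std::signbit` from `<cmath>`. The function names are illustrative, and the sketch assumes the code is compiled without options such as `-ffast-math` that allow the compiler to ignore signed zeros.

```cpp
#include <cassert>
#include <cmath>

// +0.0 and -0.0 are distinct bit patterns but compare equal.
bool zeros_compare_equal() {
    const double positive_zero = 0.0;
    const double negative_zero = -0.0;
    return positive_zero == negative_zero;
}

// Hypothetical helper: detects negative zero by combining the equality
// test with the sign bit.
bool is_negative_zero(double x) {
    return x == 0.0 && std::signbit(x);
}
```

With these definitions, `zeros_compare_equal()` is true, `is_negative_zero(-0.0)` is true, and `is_negative_zero(0.0)` is false.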

How to convert float to double(both stored in IEEE-754 representation) without losing precision?

To convert a float / double to its IEEE754 representation in C++

What is the fastest way to write a IEEE-754 compliant double/float division in C++?

How to output IEEE-754 format integer as a float

Is checking the location of the sign bit enough to determine endianness of IEEE-754 float with respect to integer endianness?

Converting 2 uint16_t to 32-float IEEE-754 fomat

Range of representable values of 32-bit, 64-bit and 80-bit float IEEE-754?

Double - IEEE 754 alternatives

Hexadecimal to float IEEE 754 double precision c++

significant digits with IEEE 754 float


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this page, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.


Answer 1: 8, accepted, 2013-11-17 10:09:39
Answer 2: 3, 2013-11-17 10:29:17
Answer 3: 3, 2013-11-17 10:33:19
