
Hex float representation in C

While reading about hexadecimal notation for floats in C, I came across the number "0xa.1fp10" in Stephen Prata's book. When I assigned this number to a float or double variable and printed it using the "%a" format specifier in printf, the result was 0x1.43e000p+13, which does not match the original. But both are the same value, 10364 in decimal. What is going on? Why has the output changed? How can I get the original number as output?

Unfortunately, you cannot portably get the same format 0xa.1fp10 out of printf. The C standard specifies that, for a normal non-zero double, the output of %a has one non-zero hex digit before the . and, when no precision is given, as many digits after the . as are needed to represent the value exactly. The implementation can choose how many of the leading bits go into that first digit!

However, the C11 standard has footnote 278, which says:

Binary implementations can choose the hexadecimal digit to the left of the decimal-point character so that subsequent digits align to nibble (4-bit) boundaries.

And here's the problem. IEEE 754 doubles have 53-bit mantissas. The first bit is always 1 for normal numbers, and the remaining 52 bits divide evenly into 13 nibbles. An implementation following that footnote (Glibc on my machine seems to be one) will therefore always output any normal floating-point number so that it starts with 0x1. !

Try for example this minimal program:

#include <stdio.h>

int main(void) {
    for (double i = 1; i < 1024 * 1024; i *= 2) {
        printf("%a %a %a\n", 1.0 * i, 0.7 * i, 0.67 * i);
    }
}

The output of which on my computer is

0x1p+0 0x1.6666666666666p-1 0x1.570a3d70a3d71p-1
0x1p+1 0x1.6666666666666p+0 0x1.570a3d70a3d71p+0
0x1p+2 0x1.6666666666666p+1 0x1.570a3d70a3d71p+1
0x1p+3 0x1.6666666666666p+2 0x1.570a3d70a3d71p+2
0x1p+4 0x1.6666666666666p+3 0x1.570a3d70a3d71p+3
0x1p+5 0x1.6666666666666p+4 0x1.570a3d70a3d71p+4
0x1p+6 0x1.6666666666666p+5 0x1.570a3d70a3d71p+5
0x1p+7 0x1.6666666666666p+6 0x1.570a3d70a3d71p+6
0x1p+8 0x1.6666666666666p+7 0x1.570a3d70a3d71p+7
0x1p+9 0x1.6666666666666p+8 0x1.570a3d70a3d71p+8
0x1p+10 0x1.6666666666666p+9 0x1.570a3d70a3d71p+9
0x1p+11 0x1.6666666666666p+10 0x1.570a3d70a3d71p+10
0x1p+12 0x1.6666666666666p+11 0x1.570a3d70a3d71p+11
0x1p+13 0x1.6666666666666p+12 0x1.570a3d70a3d71p+12
0x1p+14 0x1.6666666666666p+13 0x1.570a3d70a3d71p+13
0x1p+15 0x1.6666666666666p+14 0x1.570a3d70a3d71p+14
0x1p+16 0x1.6666666666666p+15 0x1.570a3d70a3d71p+15
0x1p+17 0x1.6666666666666p+16 0x1.570a3d70a3d71p+16
0x1p+18 0x1.6666666666666p+17 0x1.570a3d70a3d71p+17
0x1p+19 0x1.6666666666666p+18 0x1.570a3d70a3d71p+18

This output is efficient: for each normal number the code needs to output just 0x1. followed by the nibbles of the mantissa converted to hex, strip trailing 0 characters, and append p followed by the signed exponent.


For long doubles, the x86 format has a 64-bit mantissa. Since 64 bits divide exactly into nibbles, a sensible implementation will put a full nibble before the . for normal numbers, with values varying from 0x8 to 0xF (the first bit is always 1), and up to 15 nibbles after the point.

Try your implementation with

#include <stdio.h>
int main(void) {
    for (long double i = 1; i < 32; i ++) {
        printf("%La\n", i);
    }
}

to see if it conforms to this expectation...


Between positive normal numbers and zero there can be subnormal numbers. My Glibc represents these double values with 0x0. followed by the actual nibbles of the mantissa, with trailing zeroes removed, and the fixed exponent -1022. Again, this is the representation that is easiest to implement and fastest to compute.

This is a hexadecimal floating-point format. The digits (and period) after 0x and before p are a hexadecimal numeral. That part is called the significand. The digits after p are a decimal numeral that indicates the power of 2 by which to multiply the significand.

In 0xa.1fp10, the significand is a.1f. This represents the number $10 \cdot 16^{0} + 1 \cdot 16^{-1} + 15 \cdot 16^{-2}$, which equals 10 + 31/256, or 2591/256.

Then p10 says to multiply this by $2^{10} = 1024$, so the result is 2591/256 · 1024 = 10,364.

The result is only a number. 0xa.1fp10, 10364, and 0x1.43ep13 are three different numerals that represent the same number. When you store this value in a float or double, the object contains only the number. There is no record of its original format. When you print it with %a, the implementation chooses the leading digit 1 (see the footnote). Because there is no record of the original numeral, there is no way to cause printf to produce the original string, unless you keep some separate record of this information and write your own software to print the numeral.

Floating-point formats often use a binary base, and it is difficult to write good software that correctly converts decimal scientific notation to binary floating-point. (It is a solved problem with published papers, but good software has not always been used.) Using a hexadecimal format instead of decimal makes it easy to exactly specify the value the author wants in the floating-point number and easy for the compiler to interpret it. The hexadecimal format is designed for this purpose: Ease and exactness of reading and writing floating-point numbers. It is not designed to facilitate aesthetic concerns such as reproducing a particular scaling or normalization.

Footnote

1 When %a is used, the C standard leaves it up to the implementation to choose the scaling used except that there is exactly one digit before the decimal-point character, it is non-zero if the number is in the normal range of the floating-point format, and the number of digits after the point equals the precision.

But both are same value 10364 in decimal.

Indeed.

What is going on? Why has output value changed?

Why shouldn't it change? The representation of a double in memory does not carry any formatting information. And as you yourself observed, the output represents the same number that the input did, so the value did not change. It's just represented differently.

Roughly analogous behavior could be made to happen with decimal numbers, too, using %e directives.

How can I get the original number as output?

Chances are good that you cannot get your particular printf() implementation to emit the particular representation the program read from its input. However, if there is something systematic about that representation, such as having the least exponent that affords a single hex digit before the radix point, then you could, in principle, write your own output function that produces that representation.

In comments you add,

But what is the standard representation?

There isn't one in the sense of a representation demanded by the C language standard. The language requires only that the representation have exactly one hex digit before the radix point, and that it be nonzero if the number is normalized and itself nonzero. That leaves four possibilities for most normalized floating-point numbers.
