
How to calculate number of mantissa bits?

I want to calculate the number of mantissa bits in float and double. I know those numbers should be 23 and 52, but I have to calculate them in my program.

The header <cfloat> defines constants you can use for this.

See FLT_MANT_DIG for example.

The phrase "number of mantissa bits" is ambiguous: it could mean

  • the number of bits needed to represent the floating point value.
  • the number of bits stored into the floating point representation.

Typically, the mantissa as stored in the IEEE floating point format does not include the initial 1 that is implied for all normal nonzero numbers. Therefore the number of bits stored in the representation is one less than the true number of bits. For float, 24 bits are represented but only 23 are stored, which is why the question expects 23 and 52.

You can compute this number for the binary floating point formats in different ways:

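  • some systems define manifest constants FLT_MANT_BITS, DBL_MANT_BITS and LDBL_MANT_BITS. The value is the true number of mantissa bits. The standard macros FLT_MANT_DIG, DBL_MANT_DIG and LDBL_MANT_DIG from <float.h> give the number of base-FLT_RADIX significand digits, which is the number of bits when FLT_RADIX is 2.
  • you can derive the number of bits from a direct computation involving FLT_EPSILON defined in <float.h>: FLT_EPSILON is the difference between 1.0f and the next larger representable float value. For a binary format it equals 2^(1-p), where p is the true number of mantissa bits, so p is 1 - log(FLT_EPSILON) / log(2). The same formula can be used for other floating point formats.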
  • you can compute the values with a loop, as illustrated in the code below.

Here is a test utility:

#include <float.h>
#include <math.h>
#include <stdio.h>

int main(void) {
    int n;

    /* Halve f until adding it to 1 no longer changes the sum. With the default
       rounding mode, n ends up as the number of mantissa bits including the
       implied leading 1. The cast asks for the sum to be rounded to float in
       case expressions are evaluated in a wider format (FLT_EVAL_METHOD != 0). */
    float f = 1.0f;
    for (n = 0; (float)(1.0f + f) != 1.0f; n++) {
        f /= 2;
    }
#ifdef FLT_MANT_BITS
    /* Non-standard macro, printed only where the system defines it. */
    printf("#define FLT_MANT_BITS       %d\n", FLT_MANT_BITS);
#endif
#ifdef FLT_EPSILON
    printf("1 - log(FLT_EPSILON)/log(2) =  %g\n", 1 - log(FLT_EPSILON) / log(2));
#endif
    printf("Mantissa bits for float:       %d\n", n);

    /* Same loop for double. */
    double d = 1.0;
    for (n = 0; (double)(1.0 + d) != 1.0; n++) {
        d /= 2;
    }
#ifdef DBL_MANT_BITS
    printf("#define DBL_MANT_BITS       %d\n", DBL_MANT_BITS);
#endif
#ifdef DBL_EPSILON
    printf("1 - log(DBL_EPSILON)/log(2) =  %g\n", 1 - log(DBL_EPSILON) / log(2));
#endif
    printf("Mantissa bits for double:      %d\n", n);

    /* Same loop for long double, using long double constants. */
    long double ld = 1.0L;
    for (n = 0; 1.0L + ld != 1.0L; n++) {
        ld /= 2;
    }
#ifdef LDBL_MANT_BITS
    printf("#define LDBL_MANT_BITS      %d\n", LDBL_MANT_BITS);
#endif
#ifdef LDBL_EPSILON
    printf("1 - log(LDBL_EPSILON)/log(2) = %g\n", 1 - log(LDBL_EPSILON) / log(2));
#endif
    printf("Mantissa bits for long double: %d\n", n);
    return 0;
}

Output on my laptop:

1 - log(FLT_EPSILON)/log(2) =  24
Mantissa bits for float:       24
1 - log(DBL_EPSILON)/log(2) =  53
Mantissa bits for double:      53
1 - log(LDBL_EPSILON)/log(2) = 64
Mantissa bits for long double: 64


 