
Value difference when converting from double(float64) to float(float32) in numpy

When I run this simple code:

import numpy as np
a = np.float64([20269207])
b = np.float32(a)

The output turns out to be

a = array([20269207.])
b = array([20269208.], dtype=float32)

What causes the value to change in this conversion? And under what conditions will the outputs differ?

The value 20269207 cannot be stored exactly in the float32 (IEEE 754 single-precision) format.

Here is why:

The values 20269206 and 20269208 can be stored exactly; their float32 bit patterns (1 sign bit, 8 exponent bits, 23 fraction bits; see the IEEE-754 Floating Point Converter) are:

  • 01001011100110101010010001001011 for 20269206
  • 01001011100110101010010001001100 for 20269208

These bit patterns differ by exactly 1 in the last place, so 20269206 and 20269208 are adjacent float32 values: no float32 number exists between them.
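The bit patterns and the gap between the two neighbours can be checked directly in NumPy. This is a minimal sketch; the helper name `float32_bits` is made up for illustration and is not part of NumPy.

```python
import numpy as np

def float32_bits(value):
    """Return the 32-bit IEEE 754 pattern of `value` as a string of 0s and 1s."""
    # Reinterpret the 4 bytes of the float32 as an unsigned 32-bit integer.
    raw = np.array([value], dtype=np.float32).view(np.uint32)[0]
    return format(int(raw), "032b")

lower = np.float32(20269206)
# Next representable float32 above `lower`.
upper = np.nextafter(lower, np.float32(np.inf))

print(float32_bits(lower))   # 01001011100110101010010001001011
print(float32_bits(upper))   # 01001011100110101010010001001100
print(int(upper))            # 20269208
print(np.spacing(lower))     # 2.0, the gap to the next float32
```

`np.nextafter` and `np.spacing` both report that the next float32 after 20269206 is 2 away, which leaves no room for 20269207.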

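20269207 lies exactly halfway between these two neighbours, so it is a tie. The default IEEE 754 rule, “Round to nearest, ties to even”, resolves a tie by choosing the neighbour whose last significand bit is 0, which is 20269208 (its bit pattern ends in 0). The alternative rule “Round to nearest, ties away from zero” gives the same result here, because 20269208 is also the neighbour farther from zero.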
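A tie is not always rounded upward. The sketch below compares two odd integers in the same range, assuming the platform uses the default IEEE 754 rounding mode (round to nearest, ties to even), which is the usual case for NumPy casts.

```python
import numpy as np

# 20269207 is exactly halfway between 20269206 and 20269208.
tie_up = np.float32(np.float64(20269207))
# 20269205 is exactly halfway between 20269204 and 20269206.
tie_down = np.float32(np.float64(20269205))

print(int(tie_up))    # 20269208: the neighbour with an even significand is above
print(int(tie_down))  # 20269204: the neighbour with an even significand is below
```

Both inputs move by 1, but in opposite directions, which is what "ties to even" predicts.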


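For integers, the float32 result differs from the float64 input:

  • for odd numbers with absolute value greater than 16,777,216,
  • for almost all numbers with absolute value greater than 33,554,432 (every integer that is not a multiple of 4, and the required multiple doubles at each further power of 2).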

Notes:

  1. The first number is 2^24, the second one is 2^25. float32 has a 24-bit significand, so the gap between adjacent values is 2 from 2^24 to 2^25 and 4 from 2^25 to 2^26.
  2. "almost all": some "nice" numbers, such as powers of 2, have exact representations even when they are very large.
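These thresholds can be checked empirically by round-tripping ranges of integers through float32. This is a rough sketch; `changed_by_float32` is a hypothetical helper name, and the sample ranges are arbitrary.

```python
import numpy as np

def changed_by_float32(values):
    """Boolean mask: True where a float64 value changes after a float32 round trip."""
    values = np.asarray(values, dtype=np.float64)
    return values.astype(np.float32).astype(np.float64) != values

below = np.arange(2**24 - 1000, 2**24 + 1)   # up to and including 2^24
mid = np.arange(2**24 + 1, 2**24 + 1001)     # just above 2^24
high = np.arange(2**25 + 1, 2**25 + 1001)    # just above 2^25

print(changed_by_float32(below).any())   # False: every integer up to 2^24 is exact
print(changed_by_float32(mid).sum())     # 500: exactly the odd numbers change
print(changed_by_float32(high).sum())    # 750: everything except multiples of 4

# Powers of 2 stay exact even when they are very large.
print(float(np.float32(2.0**100)) == 2.0**100)   # True
```

Negative integers behave the same way, since only the absolute value determines the spacing.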
