Value difference when converting from double(float64) to float(float32) in numpy

Question

When I run the simple code:

import numpy as np

a = np.float64([20269207])
b = np.float32(a)

The output turns out to be

a = array([20269207.])
b = array([20269208.], dtype=float32)

What causes the difference before and after this conversion? And under what conditions will the outputs differ?

Answer 1

The value 20269207 cannot be stored exactly in the float32 (IEEE 754 single-precision) format.

Here is why:

The neighbouring values 20269206 and 20269208 can be stored; their representations in binary form are (see IEEE-754 Floating Point Converter):

01001011100110101010010001001011 for 20269206
01001011100110101010010001001100 for 20269208

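Their binary forms differ by 1 in the last bit, so they are adjacent float32 values: float32 has a 24-bit significand, and between 2^24 and 2^25 representable values are 2 apart. There is no room for any number between 20269206 and 20269208.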
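The two bit patterns above can be reproduced without an online converter. The following is a minimal sketch using only the standard library; the helper name float32_bits is made up for this illustration.

```python
import struct

def float32_bits(x):
    """Return the IEEE 754 single-precision bit pattern of x as a 32-character string."""
    # Pack as a big-endian float32, then reinterpret the same 4 bytes as an unsigned int.
    (as_int,) = struct.unpack(">I", struct.pack(">f", x))
    return format(as_int, "032b")

print(float32_bits(20269206.0))  # 01001011100110101010010001001011
print(float32_bits(20269208.0))  # 01001011100110101010010001001100
```

Read as unsigned integers, the two patterns differ by exactly 1, which is what "adjacent float32 values" means here.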

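20269207 lies exactly halfway between these two values, so it is a tie. Under the default IEEE 754 rule "round to nearest, ties to even", a tie goes to the neighbour whose significand ends in a 0 bit, which is 20269208 (compare the last bits of the two patterns above). The alternative rule "round to nearest, ties away from zero" gives the same result in this case.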
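Ties do not always go upward. As a small sketch (the helper to_float32_int is a name invented for this example, and it assumes the default round-to-nearest-even mode that NumPy's float64 to float32 cast normally runs under), compare 20269207 with 20269205, which is also a tie but whose "even" neighbour is the lower one:

```python
import numpy as np

def to_float32_int(n):
    """Convert integer n to float64, cast to float32, and return the result as an int."""
    return int(np.float32(np.float64(n)))

print(to_float32_int(20269207))  # 20269208 (tie, rounded up to the even significand)
print(to_float32_int(20269205))  # 20269204 (tie, rounded down to the even significand)
```

So "even" refers to the last bit of the significand, not to the integer itself; all representable integers in this range are even numbers anyway.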

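For integers, the float32 result will differ from the original value:

for odd numbers with absolute value greater than 16,777,216 (float32 values there are at least 2 apart),
for almost all numbers with absolute value greater than 33,554,432 (float32 values there are at least 4 apart, and the gap doubles at every further power of 2).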

Notes:

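The first number is 2^24, the second one is 2^25.
"almost all": there are "nice" numbers, such as powers of 2, which have exact representations even when they are very large (up to the float32 maximum of about 3.4e38). In general, an integer in that range is exactly representable when it fits in 24 significant bits.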

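The thresholds above can be checked directly. This is a sketch under the assumption that np.spacing on a float32 value gives the distance to the next larger float32; the helper names float32_gap and is_exact_in_float32 are hypothetical, chosen for this example.

```python
import numpy as np

def float32_gap(x):
    """Distance from float32(x) to the next larger float32 value."""
    return float(np.spacing(np.float32(x)))

def is_exact_in_float32(n):
    """True if integer n survives float64 -> float32 unchanged.

    Only meaningful for n that float64 itself represents exactly
    (for example |n| < 2**53, or a power of 2).
    """
    return float(np.float32(np.float64(n))) == float(n)

print(float32_gap(2**23))  # 1.0: every integer is still representable
print(float32_gap(2**24))  # 2.0: odd integers are lost
print(float32_gap(2**25))  # 4.0: only multiples of 4 survive

print(is_exact_in_float32(2**24 + 1))  # False
print(is_exact_in_float32(2**24 + 2))  # True
print(is_exact_in_float32(2**25 + 2))  # False
print(is_exact_in_float32(2**100))     # True: a power of 2 needs one significant bit
```
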
solution1
0 2020-04-30 04:45:51