
convert unsigned integer to float in python

I wrote a socket server that reads data from some devices. After reading the data, a binary shift is applied to the bytes, which gives me an integer value, for instance 1108304047, and I want to convert this number to the IEEE 754 float 35.844417572021484. I found some solutions with struct.unpack, but they don't seem rational to me: first we convert the number to a string, then convert that to a float.

Is there any short way to do this, like Float.intBitsToFloat(1108304047) in Java?

The solution that I found with struct.unpack is quite long; it involves string conversion, substring fetching, zero filling, etc.:

import struct

def convert_to_float(value):
    # hex() -> strip "0x" -> pad to 8 hex digits -> decode to raw bytes (Python 2) -> big-endian float
    return struct.unpack("!f", hex(value)[2:].zfill(8).decode('hex'))[0]

As you can see, the Java implementation uses a C union to do the trick.

/*
 * Find the float corresponding to a given bit pattern
 */
JNIEXPORT jfloat JNICALL
Java_java_lang_Float_intBitsToFloat(JNIEnv *env, jclass unused, jint v)
{
    union {
        int i;
        float f;
    } u;
    u.i = (long)v;
    return (jfloat)u.f;
}

In Python it is not possible to do it this way, so you need to use the struct module:

This module performs conversions between Python values and C structs represented as Python strings

First the number is packed into the byte representation of a long:

packed_v = struct.pack('>l', b)

and then it is unpacked as a float:

f = struct.unpack('>f', packed_v)[0]

That's similar to the Java approach:

import struct

def intBitsToFloat(b):
    # pack as a big-endian signed 32-bit int ('>l', matching Java's int),
    # then reinterpret the same bytes as a big-endian float
    s = struct.pack('>l', b)
    return struct.unpack('>f', s)[0]
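
As a quick check with the value from the question:

print(intBitsToFloat(1108304047))  # 35.844417572021484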

Please correct me if I'm wrong.

ldexp and frexp decompositions for positive numbers.

If you are okay with up to 2^-16 relative error, you can express both sides of the transformation using just basic arithmetic and the ldexp/frexp decomposition.

Note that this is much slower than the struct hack, which can be more succinctly represented as struct.unpack('f', struct.pack('I', value)).
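
Wrapped into a helper for readability (a minimal sketch; the name float_from_bits is just one I'm using here):

import struct

def float_from_bits(value):
    # pack the 32-bit unsigned integer and reinterpret its bytes as a float
    # (both formats use native byte order, so the reinterpretation is consistent)
    return struct.unpack('f', struct.pack('I', value))[0]

print(float_from_bits(1108304047))  # 35.844417572021484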

Here is the decomposition method.

import math

def to_bits(x):
    # map a positive float to its IEEE 754 single-precision bit pattern
    man, exp = math.frexp(x)
    return int((2 * man + (exp + 125)) * 0x800000)

def from_bits(y):
    # map an IEEE 754 single-precision bit pattern back to a positive float
    y -= 0x3e800000
    return math.ldexp(
        float(0x800000 + (y & 0x7fffff)) / 0x1000000,
        (y - 0x800000) >> 23)

While the from_bits function looks scarier, it is actually nothing more than the inverse of to_bits, modified so that we only perform a single floating-point division (not for speed, but because that should be the sort of mindset we have when we need to work with machine representations of floats). Therefore, I'll focus on explaining the forward transformation.
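
As a quick round trip on the value from the original question (assuming the two definitions above):

print(to_bits(35.844417572021484))   # 1108304047 (0x420f60af)
print(from_bits(1108304047))         # 35.844417572021484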

Derivation

Recall that a (positive) IEEE 754 floating point number is represented as a tuple of a biased exponent and its mantissa. The lower 23 bits m are the mantissa, and the upper 8 bits e (minus the most significant bit, which we assume to always be zero) represent the exponent, so that

x = (1 + m / 2^23) * 2^(e - 127)

Let man' = m / 2^23 and exp' = e - 127; then 0 <= man' < 1 and exp' is an integer. Therefore

(man' + exp' + 127) * 2^23

gives the IEEE 754 representation.

On the other hand, the frexp decomposition computes a pair man, exp = frexp(x) such that man * 2^exp = x and 0.5 <= man < 1.

A moment of thought will show that man' = 2 * man - 1 and exp' = exp - 1, therefore its IEEE machine representation is

(man' + exp' + 127) * 0x800000 = (2 * man + exp + 125) * 0x800000
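
A quick numeric check of that formula against the struct round trip, using the value from the question (this is exactly what to_bits above computes):

import math
import struct

x = 35.844417572021484
man, exp = math.frexp(x)                                   # man ~ 0.560069, exp = 6
bits = int((2 * man + (exp + 125)) * 0x800000)
print(hex(bits))                                           # 0x420f60af (= 1108304047)
print(hex(struct.unpack('>I', struct.pack('>f', x))[0]))   # 0x420f60af as well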

Error Analysis

How much roundoff error do we expect? Well, let's assume that frexp introduces no error within its decomposition. This is unfortunately impossible, but we can relax this down the line.

The main feature is the computation 2 * man + (exp + 125). Why? 0x800000 is a perfect power of two, and therefore a floating-point multiplication by a power of two will nearly always be lossless (unless we overflow), since the FPU is just adding 23 << 23 to the machine representation (without touching the mantissa, which is where errors arise). Similarly, the multiplication 2 * man is also lossless (akin to just adding 1 << 23 to the machine representation). Furthermore, exp and 125 are integers, so (exp + 125) is also computed to exact precision.

Therefore, we are left to analyze the error behavior of m + e, where 1 <= m < 2 and |e| < 127. In the worst case, m has all 23 bits filled (corresponding to m = 2 - 2^-22) and e = +/- 127. Here, this addition will unfortunately clobber the 8 least significant bits of m, since it has to renormalize m (which is at the exponential range of 2^0) to the exponential range of 2^8, which means losing 8 bits. However, since a mantissa has 24 significant bits, we effectively lose 2^-(24 - 8) = 2^-16 of precision, which upper-bounds the error.

In a similar line of reasoning for from_bits, you can show that float(0x800000 + (y & 0x7fffff)) is basically computing the operation (1.0f + m), where m may have up to 23 bits of precision and is strictly less than 1. Therefore, we're adding a precise number at the scale of 2^0 to another number at the scale of 2^-1, so we expect a loss of one bit. This then suggests that we would incur up to 2^-22 relative error in the backwards transformation.

Both of these transformations incur very little roundoff, and if you throw in an extra multiplication into to_bits, you can also bring its error down to just 2^-22.
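
A rough way to observe those bounds empirically (a sketch that assumes the to_bits/from_bits definitions above; it round-trips random doubles that lie within single-precision range):

import random

worst = 0.0
for _ in range(100_000):
    x = random.uniform(1e-6, 1e6)   # positive values well inside single-precision range
    err = abs(from_bits(to_bits(x)) - x) / x
    worst = max(worst, err)
print(worst)                        # stays below the 2^-16 bound derived above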

Final Words

Do not do this in production.

  1. You should never have to explicitly manipulate the machine representation of a number.
  2. Even if for some ungodly reason you need to do this, you should not do something this hacky.

This is just a clever float-hack that seems fun. It's not meant to be anything more than that.
