
Convert unsigned integer to float in Python

I wrote a socket server that reads data from some devices. After the data is read, a binary shift is applied to the bytes. That leaves me with an integer value, for instance 1108304047, and I want to convert this number to the IEEE 754 float 35.844417572021484. I found some solutions with struct.unpack, but they don't seem rational to me: first we convert the number to a string, then convert that to a float.

Is there any short way to do this, like Float.intBitsToFloat(1108304047) in Java?

The solution I found with struct.unpack is quite long: it involves string conversion, substring fetching, zero filling, and so on.

import struct

def convert_to_float(value):
    # Python 2 only: str.decode('hex') was removed in Python 3
    return struct.unpack("!f", hex(value)[2:].zfill(8).decode('hex'))[0]
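The same approach can be written without the hex-string detour in Python 3, e.g. with int.to_bytes (a sketch; the function name is mine):

```python
import struct

def convert_to_float_py3(value):
    # value is the 32-bit pattern as an unsigned int; pack it big-endian
    # and reinterpret the 4 bytes as an IEEE 754 single.
    return struct.unpack("!f", value.to_bytes(4, "big"))[0]

print(convert_to_float_py3(1108304047))  # → 35.844417572021484
```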

As you can see, in Java they use a C union to do the trick:

/*
 * Find the float corresponding to a given bit pattern
 */
JNIEXPORT jfloat JNICALL
Java_java_lang_Float_intBitsToFloat(JNIEnv *env, jclass unused, jint v)
{
    union {
        int i;
        float f;
    } u;
    u.i = (long)v;
    return (jfloat)u.f;
}

In Python it is not possible to do it this way, so you need to use the struct module:

This module performs conversions between Python values and C structs represented as Python strings.

First the number is packed into the byte representation of a long:

packed_v = struct.pack('>l', b)

and then it is unpacked as a float:

f = struct.unpack('>f', packed_v)[0]

That's similar to the Java approach:

import struct

def intBitsToFloat(b):
    # '>l' packs a signed 32-bit big-endian int; use '>L' if b can be >= 2**31
    s = struct.pack('>l', b)
    return struct.unpack('>f', s)[0]
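For example, with the value from the question (restated inline so the snippet runs on its own):

```python
import struct

# Reinterpret the integer's big-endian bytes as an IEEE 754 single,
# exactly as intBitsToFloat above does.
bits = 1108304047
f = struct.unpack('>f', struct.pack('>l', bits))[0]
print(f)  # → 35.844417572021484
```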

Please correct me if I'm wrong.

ldexp and frexp decompositions for positive numbers

If you are okay with up to 2^-16 relative error, you can express both sides of the transformation using just basic arithmetic and the ldexp/frexp decomposition.

Note that this is much slower than the struct hack, which can be written more succinctly as struct.unpack('f', struct.pack('I', value)).
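As a sketch, the succinct form works in both directions, since pack and unpack both use native byte order, so endianness cancels out (the helper names are mine):

```python
import struct

def bits_to_float(b):
    # native order on both pack and unpack, so the result is
    # independent of the machine's endianness
    return struct.unpack('f', struct.pack('I', b))[0]

def float_to_bits(f):
    return struct.unpack('I', struct.pack('f', f))[0]

print(float_to_bits(1.0) == 0x3f800000)  # → True
```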

Here is the decomposition method.

import math

def to_bits(x):
  man, exp = math.frexp(x)
  return int((2 * man + (exp + 125)) * 0x800000)

def from_bits(y):
  y -= 0x3e800000
  # the parentheses around the mask matter: & binds more loosely than + in Python
  return math.ldexp(
      float(0x800000 + (y & 0x7fffff)) / 0x1000000,
      (y - 0x800000) >> 23)

While the from_bits function looks scarier, it is actually nothing more than the inverse of to_bits, modified so that we only perform a single floating point division (not because of speed considerations, just because that should be the sort of mindset we have when we do need to work with machine representations of floats). Therefore, I'll focus on explaining the forward transformation.

Derivation

Recall that a (positive) IEEE 754 floating point number is represented as a tuple of a biased exponent and its mantissa. The lower 23 bits m are the mantissa, and the upper 8 bits e (minus the most significant bit, which we assume to always be zero) represent the exponent, so that

x = (1 + m / 2^23) * 2^(e - 127)

Let man' = m / 2^23 and exp' = e - 127; then 0 <= man' < 1 and exp' is an integer. Therefore

(man' + exp' + 127) * 2^23

gives the IEEE 754 representation.

On the other hand, the frexp decomposition computes a pair man, exp = frexp(x) such that man * 2^exp = x and 0.5 <= man < 1.

A moment of thought will show that man' = 2 * man - 1 and exp' = exp - 1, and therefore its IEEE machine representation is

(man' + exp' + 127) * 0x800000 = (2 * man + exp + 125) * 0x800000
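A quick numeric check of this identity, using struct only as ground truth:

```python
import math
import struct

x = 35.844417572021484  # the value from the question; any positive float32 value works
man, exp = math.frexp(x)                               # x = man * 2^exp, 0.5 <= man < 1
derived = int((2 * man + exp + 125) * 0x800000)        # the formula above
actual = struct.unpack('>I', struct.pack('>f', x))[0]  # true IEEE 754 bit pattern
print(derived == actual)  # → True
```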

Error Analysis

How much roundoff error do we expect? Well, let's assume that frexp introduces no error within its decomposition. This is unfortunately impossible, but we can relax this assumption down the line.

The main feature is the computation 2 * man + (exp + 125). Why? 0x800000 is a perfect power of two, and a floating point multiplication by a power of two will nearly always be lossless (unless we overflow), since the FPU is just adding 23 << 23 to the machine representation (without touching the mantissa, which is where errors arise). Similarly, the multiplication 2 * man is also lossless (akin to just adding 1 << 23 to the machine representation). Furthermore, exp and 125 are integers, so (exp + 125) is also computed to exact precision.

Therefore, we are left to analyze the error behavior of m + e, where 1 <= m < 2 and |e| <= 127. In the worst case, m has all 23 bits filled (corresponding to m = 2 - 2^-22) and e = +/- 127. Here, this addition will unfortunately clobber the 8 least significant bits of m, since it has to renormalize m (which is at the exponential range of 2^0) to the exponential range of 2^8, which means losing 8 bits. However, since a mantissa has 24 significant bits, we effectively lose 2^-(24 - 8) amount of precision, which upper-bounds the error.

In a similar line of reasoning for from_bits, you can show that float(0x800000 + (y & 0x7fffff)) is basically computing the operation (1.0f + m), where m may have up to 23 bits of precision and is strictly less than 1. Therefore, we're adding a precise number at the scale of 2^0 to another number at the scale of 2^-1, so we expect a loss of one bit. This then suggests that we would incur up to 2^-22 relative error in the backwards transformation.

Both of these transformations incur very little roundoff, and if you throw in an extra multiplication into to_bits, you can also bring its error down to just 2^-22.
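To sanity-check these bounds, both directions can be compared against the struct-based ground truth (the helpers are restated here so the snippet runs on its own):

```python
import math
import struct

def to_bits(x):
    man, exp = math.frexp(x)
    return int((2 * man + (exp + 125)) * 0x800000)

def from_bits(y):
    y -= 0x3e800000
    # parentheses around the mask: & binds more loosely than + in Python
    return math.ldexp(float(0x800000 + (y & 0x7fffff)) / 0x1000000,
                      (y - 0x800000) >> 23)

for x in (1.0, 3.141592653589793, 35.844417572021484, 1e-30, 6.02e23):
    bits = struct.unpack('>I', struct.pack('>f', x))[0]   # ground-truth bits
    xf = struct.unpack('>f', struct.pack('>I', bits))[0]  # x rounded to float32
    assert abs(to_bits(x) - bits) <= 0x100                # within the 2^-16 bound
    assert abs(from_bits(bits) - xf) <= abs(xf) * 2 ** -22
print("bounds hold")
```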

Final Words

Do not do this in production.

  1. You should never have to explicitly manipulate the machine representation of a number.
  2. Even if for some ungodly reason you need to do this, you should not do something this hacky.

This is just a clever float hack that seems fun. It's not meant to be anything more than that.
