Convert from binary to floating point

I'm doing some exercises for a university computer science course, and one of them is about converting an int array of 64 bits into its double-precision floating-point value.

Understanding the first bit, the sign +/-, is quite easy. The same goes for the exponent, since we know that the bias is 1023.

We are having problems with the significand. How can I calculate it?

In the end, I would like to obtain the real number that the bits represent.

You could just load the bits into an unsigned integer of the same size as a double, take the address of that and cast it to a void*, which you then cast to a double* and dereference.

Of course, this might be "cheating" if you really are supposed to parse the floating-point format yourself, but this is how I would have solved the problem given the parameters you've stated so far.
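The first step, loading the bits into an unsigned integer, could look roughly like the sketch below. It assumes the array holds one bit per int with the sign bit at index 0 (most significant bit first), which the question does not state explicitly; pack_bits is a hypothetical helper name.

```c
#include <stdint.h>

/* Pack 64 ints (each 0 or 1) into one 64-bit word.
 * Assumption: bits[0] is the sign bit, i.e. the array is stored
 * most significant bit first. */
uint64_t pack_bits(const int bits[64])
{
    uint64_t word = 0;
    for (int i = 0; i < 64; i++) {
        /* shift what we have so far and append the next bit */
        word = (word << 1) | (uint64_t)(bits[i] & 1);
    }
    return word;
}
```

With bits 2 to 11 set and everything else zero, the result is 0x3FF0000000000000, which is the bit pattern of 1.0.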

Computing the significand of the given 64 bits is quite easy.

According to the Wikipedia article on IEEE 754, the significand has 53 bits (bit 0 to bit 52, where bit 0 is the leading bit). If you convert a number with more bits than fit, say 67 bits into a 64-bit value, it is rounded: the extra bits are dropped and the last remaining bit is rounded up where needed. For example:

11110000 11110010 11111 becomes 11110000 11110011 after rounding off the trailing bits.

The leading bit of the significand does not need to be stored, because for a normalized number it is always 1. That's why only 52 bits of the significand are stored instead of 53.

Now to compute it, you just need to target the bit range of the significand, bit(1) to bit(52) (bit(0) is always 1), and use it.

#include <math.h> // for pow

int index_signf = 1; // starting at 1, not 0: bit 0 is the implicit leading 1
int significand_length = 52;
int byteArray[53]; // array containing the bits of the significand, one bit per element

double significand_endValue = 0;
// add 2^-i for every stored bit i that is set
for( ; index_signf <= significand_length ; index_signf++)
{
    significand_endValue += byteArray[index_signf] * pow(2, -index_signf);
}

significand_endValue += 1; // add the implicit leading bit
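Written as a formula, the loop above computes the significand m from the stored bits b_1 to b_52. The second equation is the usual way to combine it with the sign bit s and the exponent field e, using the bias of 1023 mentioned in the question. It only holds for normalized numbers, that is, for 1 <= e <= 2046.

```latex
m = 1 + \sum_{i=1}^{52} b_i \, 2^{-i},
\qquad
x = (-1)^{s} \cdot m \cdot 2^{\,e - 1023}
```

As a worked case, take s = 0, e = 1024 and only b_2 set: then m = 1 + 2^{-2} = 1.25 and x = 1.25 * 2^1 = 2.5.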

Now you just have to fill byteArray accordingly before computing it, using a function like this:

int* getSignificandBits(int* array64bits){

    // returned array; static so that it is still valid after the function returns
    static int significandBitsArray[53];
    // indexes
    int i_array64bits;
    int i_significandBitsArray;

    // set the first bit = 1 (implicit leading bit)
    significandBitsArray[0] = 1;

    // fill it, assuming array64bits[0] is the sign bit, [1..11] the exponent
    // and [12..63] the 52 stored significand bits, most significant first
    for(i_significandBitsArray = 1, i_array64bits = 12; i_array64bits <= 63; i_array64bits++, i_significandBitsArray++)
        significandBitsArray[i_significandBitsArray] = array64bits[i_array64bits];

    return significandBitsArray;
}
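Putting the sign, exponent and significand together, a minimal sketch of the whole decoding might look like the following. It assumes the same layout as in the question's description (sign bit first, then 11 exponent bits, then 52 significand bits, most significant bit first) and only handles normalized numbers. The name decode_normalized is made up for illustration.

```c
/* Decode a normalized double from 64 ints (each 0 or 1).
 * Assumption: bits[0] is the sign, bits[1..11] the exponent and
 * bits[12..63] the stored significand bits, most significant first.
 * Zero, subnormals, infinities and NaN (exponent field 0 or 2047)
 * are not handled in this sketch. */
double decode_normalized(const int bits[64])
{
    /* read the 11-bit biased exponent */
    int exponent = 0;
    for (int i = 1; i <= 11; i++) {
        exponent = (exponent << 1) | (bits[i] & 1);
    }

    /* 1 + sum of b_i * 2^-i over the 52 stored bits */
    double value = 1.0;   /* implicit leading bit */
    double weight = 0.5;  /* 2^-1 for the first stored bit */
    for (int i = 12; i <= 63; i++) {
        if (bits[i]) {
            value += weight;
        }
        weight /= 2.0;
    }

    /* scale by 2^(exponent - 1023); each step is exact for normalized values */
    int shift = exponent - 1023;
    for (; shift > 0; shift--) value *= 2.0;
    for (; shift < 0; shift++) value /= 2.0;

    return bits[0] ? -value : value;
}
```

For example, an array with only bits[1] and bits[13] set has exponent field 1024 and significand 1.25, so it decodes to 2.5.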

If you have a byte representation of an object, you can copy the bytes into the storage of a variable of the right type to convert it.

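#include <stdint.h> // uint64_t
#include <string.h> // memcpy

double convert_to_double(uint64_t x) {
    double result;
    memcpy(&result, &x, sizeof(x)); // copy the raw bytes, no numeric conversion
    return result;
}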

You will often see code like *(double *)&x to do the conversion, but although this usually works in practice, it's undefined behavior in C.
