
8 byte double as binary string to uint64_t

I'm looking for a way to convert an 8-byte double to uint64_t. I can't use any standard library for this, because my platform only has a 4-byte double.

The conversion should truncate, so 10987789.5 becomes the integer 10987789.

The conversion that I use right now:

uint64_t binDoubleToUint64_t(char *bit){
    uint8_t  i, j;
    uint64_t fraction;
    for(i=0; i<64; i++)
       bit[i]-='0';

    uint16_t exponent = bit[1] ? 1 : 0; 

    j = 0;
    for(i=9; i>0;i--)
       exponent += bit[i+2] * int_pow(2, j++);

    bit[11] = bit[1];
    fraction = 0;
    j=0;

    for(i=0; i < exponent; i++){
        fraction = fraction << 1;
        if(bit[11+i])
        fraction |= 1 << 1;
    }
    return fraction;
}

But this gives me wrong answers. When I try to convert the double 10225203.0 (0x416380c660000000) it returns 10225202 (it should be 10225203).
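For reference, the expected result can be checked by hand from the three IEEE 754 binary64 fields. The sketch below is not part of the code above; the struct and function names are made up for illustration. For 0x416380c660000000 it yields sign 0, biased exponent 0x416 (1046, i.e. 2^23 once the bias of 1023 is removed) and fraction 0x380c660000000. Putting the implicit leading 1 back at bit 52 and shifting right by 52 - 23 = 29 bits gives 10225203.

```c
#include <stdint.h>

/* Hypothetical helper type: the three fields of an IEEE 754 binary64 value. */
struct Binary64Fields {
    unsigned sign;       /* 1 bit */
    unsigned biasedExp;  /* 11 bits, bias 1023 */
    uint64_t fraction;   /* 52 bits, without the implicit leading 1 */
};

/* Split a raw 64-bit pattern into sign, biased exponent and fraction. */
struct Binary64Fields splitBinary64(uint64_t bits)
{
    struct Binary64Fields f;
    f.sign = (unsigned)(bits >> 63);
    f.biasedExp = (unsigned)((bits >> 52) & 0x7FF);
    f.fraction = bits & 0xFFFFFFFFFFFFFULL;
    return f;
}
```

In a 64-character bit string that starts with the most significant bit, these fields sit at index 0 (sign), indices 1 to 11 (exponent) and indices 12 to 63 (fraction).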

Can you read the bit values straight in as a uint64_t? Then the code might look something like this:

uint64_t binDoubleToUint64_t (uint64_t in) {
  uint32_t biasedExponent = (uint32_t)((in >> 52) & 0x7FF);
  if (biasedExponent < 1023 || (in & 0x8000000000000000ULL)) {
    /* If the exponent isn't big enough to give a value of at least 1
     * or our number is negative return 0. 
     */
    return 0;
  }

  uint32_t exponent = ((in & 0x7FF0000000000000) >> 52) - 1023;

  // get the mantissa including the imagined bit.
  uint64_t mantissa = (in & 0xFFFFFFFFFFFFF) | 0x10000000000000;

  // Now we just need to work out how much to shift the mantissa by.
  /* You may notice that the top bit of the mantissa is actually at 53 once 
     you put the imagined bit back in, mantissaTopBit is really 
     floor(log2(mantissa)) which is 52 (i.e. the power of 2 of the position 
     that the top bit is in). I couldn't think of a good name for this, so just
     imagine that you started counting from 0 instead of 1 if you like!
  */
  uint32_t mantissaTopBit = 52;

  if (mantissaTopBit > exponent)
    return mantissa >> (mantissaTopBit - exponent);
  else {
    if (exponent - mantissaTopBit > 11) {
       // Your double doesn't fit into a uint64_t (infinity also ends up
       // here). Saturating is one option; use whatever error handling
       // suits you.
       return UINT64_MAX;
    }

    return mantissa << (exponent - mantissaTopBit);
  }
}

This was written from my memory of the floating point spec (I haven't checked all the values), so you may want to verify the constants. It works for your examples, but you may want to check that I've put the right number of '0's in everywhere.
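Since the question starts from a string of 64 '0'/'1' characters, that string has to be packed into a uint64_t before a function like the one above can be used. A minimal sketch follows, assuming the string is most significant bit first (sign bit at index 0). The name bitStringToUint64 is hypothetical, and unlike the code in the question it leaves the input string unmodified.

```c
#include <stdint.h>

/* Hypothetical helper: parse exactly 64 characters of '0'/'1', most
 * significant bit first, into *out. Returns 0 on success and -1 if a
 * character is neither '0' nor '1' (which includes an early end of string). */
int bitStringToUint64(const char *bits, uint64_t *out)
{
    uint64_t value = 0;
    int i;

    for (i = 0; i < 64; i++) {
        if (bits[i] != '0' && bits[i] != '1')
            return -1;
        /* Shift the bits collected so far and append the new one. */
        value = (value << 1) | (uint64_t)(bits[i] - '0');
    }
    *out = value;
    return 0;
}
```

The resulting value can then be passed to binDoubleToUint64_t; for the bit string of 10225203.0 it should be 0x416380c660000000.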

#include <float.h>  /* DBL_MAX */
#include <stdio.h>  /* FILE, fputc, ferror */

/*
*  write a double to a stream in ieee754 format regardless of host
*  encoding.
*  x - number to write
*  fp - the stream
*  bigendian - set to write big bytes first, else write little bytes first
*  Returns: 0 or EOF on error
*  Notes: different NaN types and negative zero not preserved.
*         if the number is too big to represent it will become infinity
*         if it is too small to represent it will become zero.
*/
int fwriteieee754(double x, FILE *fp, int bigendian)
{
    int shift;
    unsigned long sign, exp, hibits, hilong, lowlong;
    double fnorm, significand;
    int expbits = 11;
    int significandbits = 52;

    /* zero (can't handle signed zero) */
    if (x == 0)
    {
        hilong = 0;
        lowlong = 0;
        goto writedata;
    }
    /* infinity */
    if (x > DBL_MAX)
    {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        lowlong = 0;
        goto writedata;
    }
    /* -infinity */
    if (x < -DBL_MAX)
    {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        hilong |= (1UL << 31);
        lowlong = 0;
        goto writedata;
    }
    /* NaN - dodgy because many compilers optimise out this test, but
    *there is no portable isnan() */
    if (x != x)
    {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        lowlong = 1234;
        goto writedata;
    }

    /* get the sign */
    if (x < 0) { sign = 1; fnorm = -x; }
    else { sign = 0; fnorm = x; }

    /* get the normalized form of f and track the exponent */
    shift = 0;
    while (fnorm >= 2.0) { fnorm /= 2.0; shift++; }
    while (fnorm < 1.0) { fnorm *= 2.0; shift--; }

    /* check for denormalized numbers */
    if (shift < -1022)
    {
        while (shift < -1022) { fnorm /= 2.0; shift++; }
        shift = -1023;
    }
    /* out of range. Set to infinity */
    else if (shift > 1023)
    {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        hilong |= (sign << 31);
        lowlong = 0;
        goto writedata;
    }
    else
        fnorm = fnorm - 1.0; /* take the significant bit off mantissa */

    /* calculate the integer form of the significand */
    /* hold it in a  double for now */

    significand = fnorm * ((1LL << significandbits) + 0.5f);


    /* get the biased exponent */
    exp = shift + ((1 << (expbits - 1)) - 1); /* shift + bias */

    /* put the data into two longs (for convenience) */
    hibits = (long)(significand / 4294967296);
    hilong = (sign << 31) | (exp << (31 - expbits)) | hibits;
    lowlong = (unsigned long)(significand - hibits * 4294967296);

writedata:
    /* write the bytes out to the stream */
    if (bigendian)
    {
        fputc((hilong >> 24) & 0xFF, fp);
        fputc((hilong >> 16) & 0xFF, fp);
        fputc((hilong >> 8) & 0xFF, fp);
        fputc(hilong & 0xFF, fp);

        fputc((lowlong >> 24) & 0xFF, fp);
        fputc((lowlong >> 16) & 0xFF, fp);
        fputc((lowlong >> 8) & 0xFF, fp);
        fputc(lowlong & 0xFF, fp);
    }
    else
    {
        fputc(lowlong & 0xFF, fp);
        fputc((lowlong >> 8) & 0xFF, fp);
        fputc((lowlong >> 16) & 0xFF, fp);
        fputc((lowlong >> 24) & 0xFF, fp);

        fputc(hilong & 0xFF, fp);
        fputc((hilong >> 8) & 0xFF, fp);
        fputc((hilong >> 16) & 0xFF, fp);
        fputc((hilong >> 24) & 0xFF, fp);
    }
    return ferror(fp);
}

You can trivially modify this function to do what you want.

https://github.com/MalcolmMcLean/ieee754
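Going the other way, the eight bytes that this function writes can be reassembled into a raw 64-bit pattern without any floating point arithmetic. The following is only a sketch of that idea, not code taken from the linked repository; bytesToBits64 is a hypothetical name and its bigendian flag is assumed to mean the same as in fwriteieee754.

```c
#include <stdint.h>

/* Hypothetical helper: assemble the 64-bit pattern from 8 bytes.
 * bigendian != 0 means buf[0] is the most significant byte,
 * otherwise buf[7] is. */
uint64_t bytesToBits64(const unsigned char buf[8], int bigendian)
{
    uint64_t bits = 0;
    int i;

    for (i = 0; i < 8; i++) {
        /* Always consume bytes from most to least significant. */
        unsigned char b = bigendian ? buf[i] : buf[7 - i];
        bits = (bits << 8) | b;
    }
    return bits;
}
```

The returned pattern can then be split into sign, exponent and fraction with integer shifts and masks only.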
