
Rational to floating point

Consider a rational number represented by the structure below.

struct rational {
    uint64_t n;
    uint64_t d;
    unsigned char sign : 1;
};

Assuming an IEEE-754 binary64 representation of double, how can the structure be converted to the nearest double with correct rounding? The naive method of converting n and d to double and dividing them compounds rounding error: each conversion may round (a uint64_t can hold more than 53 significant bits), and the division then rounds again.
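To make the double-rounding problem concrete, here is a minimal sketch of the naive conversion. The helper name naive_ratio is hypothetical and not part of the original post, and the behaviour described assumes that double arithmetic is evaluated in binary64 (FLT_EVAL_METHOD == 0), as is typical on x86-64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the original post): the naive conversion.
   It rounds up to three times: once per integer-to-double conversion and
   once in the division. */
static double naive_ratio (uint64_t n, uint64_t d)
{
    assert (d != 0);
    return (double)n / (double)d;
}
```

For n = 27021597764222979 (that is, 3 * 2**53 + 3) and d = 3 the exact quotient is 2**53 + 1, a tie that round-to-nearest-even resolves to 9007199254740992.0. The naive version first rounds n up to 3 * 2**53 + 4, and the division then yields 9007199254740994.0, which is off by one unit in the last place.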

One way of achieving the desired result is to perform the division in integer space. Standard C/C++ does not offer a 128-bit integer type (although some tool chains provide one as an extension), so this is not very efficient, but it produces correct results.

The code below generates 54 quotient bits and a remainder, one bit at a time. The most significant 53 quotient bits form the mantissa (significand) of the double result, while the least significant quotient bit and the remainder are needed for rounding to "nearest or even" according to IEEE-754.
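The rounding decision on its own can be sketched as follows. The function name round_nearest_even and its parameters are hypothetical: q54 stands for the 54 quotient bits and inexact is nonzero when the remainder is nonzero.

```c
#include <stdint.h>

/* Hypothetical helper: round a 54-bit quotient to 53 bits using
   "round to nearest, ties to even". */
static uint64_t round_nearest_even (uint64_t q54, int inexact)
{
    uint64_t round = q54 & 1;   /* first bit below the kept 53 bits */
    uint64_t mant = q54 >> 1;   /* the 53 bits that are kept */
    uint64_t odd = mant & 1;    /* is the kept value odd? */
    /* round up when above the halfway point, or exactly halfway and odd */
    if (round && (inexact || odd)) {
        mant++;
    }
    return mant;
}
```

Note that the increment may carry out of bit 52 (the result can be 2**53). When the mantissa is then added to the exponent field, as the full code does, that carry bumps the exponent by one, which is exactly what is wanted.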

The code below can be compiled as either a C or a C++ program (at least it does with my tool chain). It has been lightly tested. Due to the bit-wise processing, this isn't very fast, and various optimizations are possible, especially if machine-specific data types and intrinsics are employed.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <stdint.h>

struct rational {
    uint64_t n;
    uint64_t d;
    unsigned char sign : 1;
};

double uint64_as_double (uint64_t a)
{
    double res;
#if defined (__cplusplus)
    memcpy (&res, &a, sizeof (res));
#else /* __cplusplus */
    volatile union {
        double f;
        uint64_t i;
    } cvt;
    cvt.i = a;
    res = cvt.f;
#endif /* __cplusplus */
    return res;
}

double rational2double (struct rational a)
{
    uint64_t dividend, divisor, quot, rem, res;
    int sticky, round, odd, sign, expo, ovf, i;

    dividend = a.n;
    divisor = a.d;
    sign = a.sign;

    /* handle special cases */
    if ((dividend == 0) && (divisor == 0)) {
        res = 0xFFF8000000000000ULL; /* NaN INDEFINITE */
    } else if (dividend == 0) {
        res = (uint64_t)sign << 63; /* zero */
    } else if (divisor == 0) {
        res = ((uint64_t)sign << 63) | 0x7ff0000000000000ULL; /* Inf */
    }
    /* handle normal cases */
    else {
        rem = 0;
        expo = 0;
        ovf = 0;
        /* normalize operands using 128-bit shifts of rem:dividend; ovf
           records a bit shifted out of rem, which implies rem >= divisor */
        while (!ovf && (rem < divisor)) {
            ovf = (int)(rem >> 63);
            rem = (rem << 1) | (dividend >> 63);
            dividend = dividend << 1;
            expo--;
        }
        /* integer bit of quotient is known to be 1 */
        rem = rem - divisor; /* correct modulo 2**64 even if ovf is set */
        quot = 1;
        /* generate 53 more quotient bits */
        for (i = 0; i < 53; i++) {
            ovf = (int)(rem >> 63);
            rem = (rem << 1) | (dividend >> 63);
            dividend = dividend << 1;
            quot = quot << 1;
            if (ovf || (rem >= divisor)) {
                rem = rem - divisor;
                quot = quot + 1;
            }
        }
        /* round to nearest or even; dividend bits not yet consumed also
           make the result inexact */
        sticky = (rem != 0) || (dividend != 0);
        round = quot & 1;
        quot = quot >> 1;
        odd = quot & 1;
        if (round && (sticky || odd)) {
            quot++;
        }
        /* compose normalized IEEE-754 double-precision number; the integer
           bit of quot (bit 52) adds 1 to the exponent field, hence "- 1" */
        res = ((uint64_t)sign << 63) + ((uint64_t)(expo + 64 + 1023 - 1) << 52) + quot;
    }
    return uint64_as_double (res);
}
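
As one possible optimization of the kind mentioned above, the bit-wise loop can be replaced by a single wide division where the tool chain provides a 128-bit integer type. The following is only a sketch under stated assumptions: unsigned __int128 and __builtin_clzll are GCC/Clang extensions rather than standard C, the function name ratio_to_double_u128 is hypothetical, both n and d must be nonzero, and the sign and the special cases are left to the caller.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: nearest double to n/d for n != 0 and d != 0.
   Relies on the GCC/Clang extensions unsigned __int128 and __builtin_clzll. */
static double ratio_to_double_u128 (uint64_t n, uint64_t d)
{
    int zn = __builtin_clzll (n);       /* leading zeros of n */
    int zd = __builtin_clzll (d);       /* leading zeros of d */
    uint64_t nn = n << zn;              /* normalized: 2**63 <= nn < 2**64 */
    uint64_t dd = d << zd;              /* normalized: 2**63 <= dd < 2**64 */
    unsigned __int128 num = (unsigned __int128)nn << 63;
    uint64_t q = (uint64_t)(num / dd);  /* 2**62 <= q < 2**64 */
    uint64_t r = (uint64_t)(num % dd);
    int shift = (q >> 63) ? 11 : 10;    /* keep the top 53 bits of q */
    uint64_t mant = q >> shift;
    int round = (int)((q >> (shift - 1)) & 1);
    int sticky = ((q & ((1ULL << (shift - 1)) - 1)) != 0) || (r != 0);
    int expo = shift - 11 + zd - zn;    /* unbiased exponent of the result */
    uint64_t bits;
    double res;

    /* round to nearest or even */
    if (round && (sticky || (mant & 1))) {
        mant++;
    }
    /* bit 52 of mant adds 1 to the exponent field, hence the "- 1" */
    bits = ((uint64_t)(expo + 1023 - 1) << 52) + mant;
    memcpy (&res, &bits, sizeof res);
    return res;
}
```

Normalizing both operands first guarantees that the quotient has 63 or 64 significant bits, so the 53 mantissa bits, the round bit and the sticky information are always available regardless of the magnitudes of n and d.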
