
Round to IEEE 754 precision but keep binary format

If I convert the decimal number 3120.0005 to float (32-bit) representation, the number gets rounded down to 3120.00048828125.

Assuming we're using a fixed-point number with a scale of 10^12, then 1000000000000 = 1.0 and 3120000500000000 = 3120.0005.

What would the formula/algorithm be to round down to the nearest IEEE 754 precision to get 3120000488281250? I would also need a way to get the result of rounding up (3120000732421875).

If you divide by the decimal scaling factor and convert the quotient to float, you get the nearest representable float. To round in the other direction, std::nextafter (nextafterf in C) can be used:

#include <float.h>
#include <math.h>
#include <stdio.h>

long long scale_to_fixed(float f)
{
    float intf = truncf(f);
    long long result = 1000000000000LL;
    result *= (long long)intf;
    result += llround((f - intf) * 1.0e12);  /* llround keeps the sum in integer arithmetic */
    return result;
}

/* not needed, always good enough to use (float)(n / 1.0e12) */
float scale_from_fixed(long long n)
{
    float result = (n % 1000000000000LL) / 1.0e12;
    result += n / 1000000000000LL;
    return result;
}

int main()
{
    long long x = 3120000500000000;
    float x_reduced = scale_from_fixed(x);
    long long y1 = scale_to_fixed(x_reduced);
    long long yfloor = y1, yceil = y1;
    if (y1 < x) {
        yceil = scale_to_fixed(nextafterf(x_reduced, FLT_MAX));
    }
    else if (y1 > x) {
        yfloor = scale_to_fixed(nextafterf(x_reduced, -FLT_MAX));
    }

    printf("%lld\n%lld\n%lld\n", yfloor, x, yceil);
}

Results:

3120000488281250
3120000500000000
3120000732421875

To handle the values as float scaled by 1e12 and compute the next larger representable value, e.g. "rounding up (3120000732421875)", the key is understanding that you are looking for the float that immediately follows the 32-bit representation of x / 1.0e12. That value is not a power of two; for a positive, finite float it is the value whose stored bit pattern is one greater. While you can arrive at this value mathematically, a union between float and unsigned (or uint32_t) provides a direct way to interpret the stored 32-bit value of the floating-point number as an unsigned value. [1]

A simple example, using the union prev to hold the reduced value of x and a separate instance next to hold the unsigned value plus one, can be:

#include <stdio.h>
#include <inttypes.h>

int main (void) {

    uint64_t x = 3120000500000000;
    union {                         /* union between float and uint32_t */
        float f;
        uint32_t u;
    } prev = { .f = x / 1.0e12 },   /* x reduced to float, bit pattern readable as .u */
      next = { .u = prev.u + 1u };  /* 2nd union, bit pattern + 1 = next float up */

    printf ("prev : %" PRIu64 "\n   x : %" PRIu64 "\nnext : %" PRIu64 "\n", 
            (uint64_t)(prev.f * 1e12), x, (uint64_t)(next.f * 1e12));
}

Example Use/Output

$ ./bin/pwr2_prev_next
prev : 3120000488281250
   x : 3120000500000000
next : 3120000732421875

Footnotes:

1. As an alternative, you can use a pointer to char to hold the address of the floating-point object and interpret the 4 bytes stored at that location as an unsigned value without running afoul of C11 Standard - §6.5 Expressions (p6,7) (the "Strict Aliasing Rule"), but the use of a union is preferred.
