Rounding to IEEE 754 precision but keeping binary format

Question

If I convert the decimal number 3120.0005 to a 32-bit float, the number gets rounded down to 3120.00048828125.

Assuming we're using a fixed-point number with a scale of 10^12, then 1000000000000 = 1.0 and 3120000500000000 = 3120.0005.

What would the formula/algorithm be to round down to the nearest IEEE 754 single-precision value, giving 3120000488281250? I would also need a way to get the result of rounding up (3120000732421875).

Answer 1

If you divide by the decimal scaling factor and convert the result to float, you get the nearest representable float, which can lie either below or above the original value. For rounding in the other direction, nextafterf from <math.h> (std::nextafter in C++) gives the adjacent float:

#include <float.h>
#include <math.h>
#include <stdio.h>

/* Convert a float to the 10^12 fixed-point scale. The integer part is
   scaled separately so only the fractional part is multiplied and rounded. */
long long scale_to_fixed(float f)
{
    float intf = truncf(f);
    long long result = 1000000000000LL;
    result *= (long long)intf;
    result += round((f - intf) * 1.0e12);
    return result;
}

/* not needed, always good enough to use (float)(n / 1.0e12) */
float scale_from_fixed(long long n)
{
    float result = (n % 1000000000000LL) / 1.0e12;
    result += n / 1000000000000LL;
    return result;
}

int main(void)
{
    long long x = 3120000500000000LL;
    /* nearest float to x / 10^12; it may lie above or below x */
    float x_reduced = scale_from_fixed(x);
    long long y1 = scale_to_fixed(x_reduced);
    long long yfloor = y1, yceil = y1;
    if (y1 < x) {
        /* conversion rounded down: the next float up is the ceiling */
        yceil = scale_to_fixed(nextafterf(x_reduced, FLT_MAX));
    }
    else if (y1 > x) {
        /* conversion rounded up: the next float down is the floor */
        yfloor = scale_to_fixed(nextafterf(x_reduced, -FLT_MAX));
    }

    printf("%lld\n%lld\n%lld\n", yfloor, x, yceil);
    return 0;
}

Results:

3120000488281250
3120000500000000
3120000732421875

Answer 2

To handle the values as float scaled by 1e12 and compute the next larger representable value, e.g. "rounding up (3120000732421875)", the key is understanding that you are looking for the next larger float after the 32-bit representation of x / 1.0e12. For positive, finite IEEE 754 floats the bit patterns are ordered the same way as the values they encode, so adding 1 to the pattern (read as an unsigned integer) yields the next larger float. While you can arrive at this value mathematically, a union between float and unsigned (or uint32_t) provides a direct way to interpret the stored 32-bit value of the floating-point number as an unsigned value. ¹

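A simple example, using the union prev to hold the reduced value of x and a separate instance next holding its unsigned value plus 1, could be written as follows. It assumes x / 1.0e12 rounds down, as it does for this value; if the conversion rounds up, prev is already the upper neighbour, so compare against x (as Answer 1 does) to tell which case applies: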

#include <stdio.h>
#include <inttypes.h>

int main (void) {

    uint64_t x = 3120000500000000;
    union {                         /* union between float and uint32_t */
        float f;
        uint32_t u;
    } prev = { .f = x / 1.0e12 },   /* x reduced to float, bits readable as .u */
      next = { .u = prev.u + 1u };  /* bit pattern + 1: next larger float */

    printf ("prev : %" PRIu64 "\n   x : %" PRIu64 "\nnext : %" PRIu64 "\n", 
            (uint64_t)(prev.f * 1e12), x, (uint64_t)(next.f * 1e12));
}

Example Use/Output

$ ./bin/pwr2_prev_next
prev : 3120000488281250
   x : 3120000500000000
next : 3120000732421875

Footnotes:

1. As an alternative, you can use a pointer to char to hold the address of the floating-point object and read the 4 bytes stored at that location through it, then interpret them as an unsigned value, without running afoul of C11 Standard - §6.5 Expressions (p6,7) (the "Strict Aliasing Rule"), but the use of a union is preferred.

Answer 1: 2 votes, accepted, 2019-04-21 04:07:02

Answer 2: 1 vote, 2019-04-21 08:42:29
