Conversion from double to unsigned long long fails

Question

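Building on the question about converting a floating-point number to a custom numeric type, I came up with a portable, safe way to convert a floating-point number into an array of integers, and the code works. For some values, however, converting from double to unsigned long long goes wrong even though the magnitude can safely be represented by unsigned long long. The conversion does not fail with a compile-time error; it silently produces an invalid value, which is either zero or the minimum representable value of signed long long. This happens on Visual C++ 2008, Intel XE 2013 and GCC 4.7.2.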

Here is the code (note the first statement inside the while loop in main):

#ifndef CHAR_BIT
#include <limits.h>
#endif

#include <float.h>
#include <math.h>

typedef signed int          int32;
typedef signed long long    int64;
typedef unsigned int       uint32;
typedef unsigned long long uint64;

typedef float  float32;
typedef double float64;

// get size of type in bits corresponding to CHAR_BIT.
template<typename t>
struct sizeof_ex
{
    static const uint32 value = sizeof(t) * CHAR_BIT;
};

// factorial function
float64 fct(int32 i)
{
    float64 r = 1;
    do r *= i; while(--i > 1);
    return r;
}

int main()
{
    // maximum 2 to power that can be stored in uint32
    const uint32 power_2  = uint32(~0);
    // number of binary digits in power_2
    const uint32 digit_cnt = sizeof_ex<uint32>::value;
    // number of array elements that will store expanded value
    const uint32 comp_count = DBL_MAX_EXP / digit_cnt + uint32((DBL_MAX_EXP / digit_cnt) * digit_cnt < DBL_MAX_EXP);
    // array elements
    uint32 value[comp_count];

    // get factorial for 23
    float64 f = fct(23);
    // save sign for later correction
    bool sign = f < 0;
    // remove sign from float-point if exists
    if (sign) f *= -1;

    // get number of binary digits in f
    uint32 actual_digits = 0;
    frexp(f, (int32*)&actual_digits);

    // get start index in array for little-endian format
    uint32 start_index = (actual_digits / digit_cnt) + uint32((actual_digits / digit_cnt) * digit_cnt < actual_digits) - 1;

    // get all parts but the last
    while (start_index > 0)
    {
        // store current part
        // in this line the compiler fails
        value[start_index] = uint64(f / power_2);
        // exclude it from f
        f -= power_2 * float64(value[start_index]);
        // decrement index
        --start_index;
    }
    // get last part
    value[0] = uint32(f);
}

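The conversion code above gives different results on different compilers. When the argument of the factorial function is 20, all compilers return a valid result. For values greater than 20, some compilers produce part of the result and others do not, and for larger values such as 35 the result becomes zero.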

Could you tell me why these errors occur?

Thanks.

Answer 1

I don't think your conversion logic makes any sense.

Despite the comment, you have a value called "power_2" that is not actually a power of 2.
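To make the mismatch concrete: `uint32(~0)` sets all 32 bits, which gives 2^32 - 1, while the radix needed to peel off 32-bit digits is 2^32. The following is only a minimal sketch, assuming `unsigned int` is 32 bits wide; the helper names `all_ones`, `radix` and `is_power_of_two` are made up for illustration and do not appear in the question.

```cpp
#include <cassert>

typedef unsigned int uint32;

// The value the question stores in power_2: every bit set, i.e. 2^32 - 1.
inline double all_ones()
{
    return double(uint32(~0));
}

// The radix for 32-bit digits: 2^32, which a double represents exactly.
inline double radix()
{
    return 4294967296.0;
}

// True if x is an exact power of two.
// Precondition (assumption of this sketch): 1 <= x < 2^63.
inline bool is_power_of_two(double x)
{
    assert(x >= 1.0 && x < 9223372036854775808.0);
    unsigned long long n = (unsigned long long)x;
    return double(n) == x && (n & (n - 1)) == 0;
}
```

Under that assumption, `is_power_of_two(all_ones())` is false and `is_power_of_two(radix())` is true, so dividing by `power_2` does not shift the value by a whole number of bits.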

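You are extracting bits from a very large (> 64-bit) number by dividing by something a little less than 32 bits wide. The result is obviously wider than 32 bits, but you store it in a 32-bit value, which truncates it. You then multiply it by the original divisor again and subtract that from the floating-point value. Because the number was truncated, you subtract far less than you should, which is almost certainly not what you expect.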
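The truncation can be checked numerically. The sketch below is an illustration, not the asker's code: `factorial`, `first_quotient` and `fits_in_uint64` are hypothetical helpers, and `unsigned int` is assumed to be 32 bits wide. It also covers a related point that probably explains the zero and minimum-`signed long long` results: in C++, converting a floating-point value to an integer type that cannot represent the truncated value is undefined behavior, so compilers are free to disagree.

```cpp
#include <cassert>

typedef unsigned int       uint32;
typedef unsigned long long uint64;

// Factorial as a double, same loop shape as fct() in the question.
inline double factorial(int i)
{
    assert(i >= 1);
    double r = 1;
    do r *= i; while (--i > 1);
    return r;
}

// True if truncating x toward zero gives a value representable as uint64.
// Outside this range the conversion uint64(x) has undefined behavior.
inline bool fits_in_uint64(double x)
{
    return x > -1.0 && x < 18446744073709551616.0; // 2^64
}

// The quotient the question computes for the first stored part.
inline double first_quotient(double f)
{
    return f / double(uint32(~0));
}
```

For 20! the quotient fits in 32 bits, which matches the report that 20 works everywhere. For 23! it fits in `uint64` but not in `uint32`, so storing it in `value[start_index]` drops the high bits. For 35! it does not even fit in `uint64`, so the cast itself is undefined.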

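I think there is more wrong than that: you don't really always want the top 32 bits. For a number whose length is not a multiple of 32 bits, you want the top (actual length mod 32) bits.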
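As a rough sketch of that bookkeeping, the exponent returned by `frexp` gives the bit length of the integer part, and the size of the most significant part follows from it. The function names below are hypothetical, and the input is assumed to be finite and at least 1.

```cpp
#include <cassert>
#include <cmath>

// Number of bits in the integer part of f: the smallest e with f < 2^e.
// Assumption of this sketch: f is finite and f >= 1.
inline int bit_length(double f)
{
    assert(f >= 1.0);
    int e = 0;
    std::frexp(f, &e); // f = m * 2^e with 0.5 <= m < 1
    return e;
}

// Number of 32-bit parts needed to hold `bits` bits, rounded up.
inline int part_count(int bits)
{
    return (bits + 31) / 32;
}

// Significant bits in the most significant part: bits mod 32,
// except that an exact multiple of 32 fills the whole part.
inline int top_part_bits(int bits)
{
    return (bits - 1) % 32 + 1;
}
```

For 23!, which lies between 2^74 and 2^75, this gives 75 bits, three parts, and only 11 significant bits in the top part.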

Here is a somewhat lazy hack of your code that does what I think you are trying to do. Note that the pow() call could be optimized away.

while (start_index > 0)
{
    float64 fpow = pow(2., 32. * start_index);
    // store current part
    // in this line the compiler fails

    value[start_index] = f / fpow;
    // exclude it from f

    f -= fpow * float64(value[start_index]);
    // decrement index
    --start_index;
}

This is pretty much untested, but hopefully it illustrates what I mean.
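For comparison, here is one way the same idea could be written as a self-contained function. It is a sketch under stated assumptions rather than a drop-in fix: the name `to_limbs` is invented here, it returns a `std::vector` instead of filling the fixed-size array from the question, it uses `ldexp` in place of `pow` to build the exact powers of two, and it assumes a 32-bit `unsigned int` and a non-negative, finite, integer-valued input.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>
#include <vector>

typedef unsigned int uint32;

// Split f into base-2^32 digits, least significant digit first.
// Assumption of this sketch: f is non-negative, finite and integer-valued.
inline std::vector<uint32> to_limbs(double f)
{
    assert(f >= 0.0 && f <= DBL_MAX && f == std::floor(f));

    int exponent = 0;
    std::frexp(f, &exponent);                      // f < 2^exponent
    int count = exponent > 0 ? (exponent + 31) / 32 : 1;

    std::vector<uint32> limbs(count, 0u);
    for (int i = count - 1; i >= 0; --i)
    {
        double weight = std::ldexp(1.0, 32 * i);   // 2^(32*i), exact
        double digit  = std::floor(f / weight);    // always below 2^32 here
        limbs[i] = uint32(digit);                  // in range, so well defined
        f -= digit * weight;                       // remove the stored digit
    }
    return limbs;
}
```

Because every divisor is an exact power of two, each quotient stays below 2^32 and the cast to `uint32` never leaves the representable range. Calling it on 23! yields three digits, since that value needs 75 bits.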

Answer 1, accepted 2013-02-28 17:08:45
