How do I compare an integer and a floating-point value correctly?

Question

How do I compare an integer and a floating-point value the right way™?

The built-in comparison operators give incorrect results in some edge cases, for example:

#include <iomanip>
#include <iostream>

int main()
{
    long long a = 999999984306749439;
    float     b = 999999984306749440.f; // This number can be represented exactly by a `float`.

    std::cout << std::setprecision(1000);
    std::cout << a << " < " << b << " = " << (a < b) << '\n';
    // Prints `999999984306749439 < 999999984306749440 = 0`, but it should be `1`.
}

Apparently, the comparison operators convert both operands to the same type (via the usual arithmetic conversions) before actually comparing them. Here lhs gets converted to float, which causes a loss of precision and leads to an incorrect result.
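To make the conversion visible, here is a small sketch that spells out what the built-in a < b does for these two types. It assumes IEEE-754 binary32 float and the default round-to-nearest mode; the function name is hypothetical.

```cpp
// Spells out the built-in `a < b` for long long vs float: the usual arithmetic
// conversions turn the integer operand into a float before comparing.
// `builtin_less_spelled_out` is a hypothetical name used only for illustration.
bool builtin_less_spelled_out(long long a, float b)
{
    // float has only 24 significant bits, so this conversion may round.
    // For a = 999999984306749439 it rounds up to exactly 999999984306749440.
    float a_as_float = static_cast<float>(a);
    return a_as_float < b;  // same result as `a < b`
}
```

Since the rounded lhs lands exactly on b, the strict comparison reports false even though the exact integer is smaller.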

Even though I understand what's going on, I'm not sure how to work around this issue.

(Disclaimer: The example uses a float and a long long, but I'm looking for a generic solution that works for every combination of an integral type and a floating-point type.)

Answer 1

(Restricting this answer to positive numbers; generalisation is trivial.)

1. Get the number of bits in the significand (mantissa) of the float on your platform, along with the radix. If you have an IEEE754 32-bit float, this is a trivial step.
2. Use (1) to compute the largest non-integer value that can be stored in your float. Annoyingly, std::numeric_limits doesn't specify this value, so you need to compute it yourself. For 32-bit IEEE754 you could take the easy option: 8388607.5 is the largest non-integral float.
3. If your float is less than or equal to (2), check whether it's an integer. If it's not an integer, round it appropriately so as not to invalidate the <.
4. At this point, the float is an integer. Check whether it's within the range of your long long. If it's out of range, the result of < is known.
5. If you get this far, you can safely cast your float to a long long and make the comparison.
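The five steps above can be sketched for i < f with long long and float as follows. This is an illustration, not the answer's own code: it assumes IEEE-754 binary32 float and 64-bit two's complement long long, rounds up with std::ceil (one way to "round appropriately" for <), and adds the negative bound and a NaN check as the generalisation the answer leaves out. The function name is hypothetical.

```cpp
#include <cmath>
#include <limits>

// Sketch of the five steps for `i < f`. Assumes binary32 float, 64-bit
// two's complement long long; NaN is treated as "not less" (an assumption).
bool int_less_than_float(long long i, float f)
{
    if (std::isnan(f))
        return false;

    // Steps 1-2: with p significand bits (24 for binary32), every float of
    // magnitude >= 2^(p-1) is an integer, so the largest non-integral value
    // is 2^(p-1) - 0.5, i.e. 8388607.5f.
    constexpr int p = std::numeric_limits<float>::digits;
    const float largest_non_integer = std::ldexp(1.0f, p - 1) - 0.5f;

    // Step 3: for an integer i, `i < f` is equivalent to `i < ceil(f)`,
    // so rounding up keeps the answer unchanged.
    if (f <= largest_non_integer)
        f = std::ceil(f);

    // Step 4: f is now integral (or infinite). The bounds of long long are
    // powers of two, hence exactly representable as float.
    const float ll_min = std::ldexp(-1.0f, 63);        // -2^63
    const float ll_max_plus_1 = std::ldexp(1.0f, 63);  // 2^63
    if (f >= ll_max_plus_1)
        return true;   // every long long is below f (covers +infinity)
    if (f < ll_min)
        return false;  // every long long is above f (covers -infinity)

    // Step 5: f is an integer inside the range of long long; the cast is exact.
    return i < static_cast<long long>(f);
}
```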

Answer 2

Here's what I ended up with.

Credit for the algorithm goes to @chux; his approach appears to outperform the other suggestions. You can find some alternative implementations in the edit history.

If you can think of any improvements, suggestions are welcome.

#include <compare>
#include <cmath>
#include <limits>
#include <type_traits>

template <typename I, typename F>
std::partial_ordering compare_int_float(I i, F f)
{
    if constexpr (std::is_integral_v<F> && std::is_floating_point_v<I>)
    {
        return 0 <=> compare_int_float(f, i);
    }
    else
    {
        static_assert(std::is_integral_v<I> && std::is_floating_point_v<F>);
        static_assert(std::numeric_limits<F>::radix == 2);
        
        // This should be exactly representable as F due to being a power of two.
        constexpr F I_min_as_F = std::numeric_limits<I>::min();
        
        // The `numeric_limits<I>::max()` itself might not be representable as F, so we use this instead.
        constexpr F I_max_as_F_plus_1 = F(std::numeric_limits<I>::max()/2+1) * 2;

        // Check if the constants above overflowed to infinity. Normally this shouldn't happen.
        // (For unsigned I, I_min_as_F is 0 and 0 * 2 == 0, so zero must be excluded from the check.)
        constexpr bool min_overflow = I_min_as_F != 0 && I_min_as_F * 2 == I_min_as_F;
        constexpr bool max_overflow = I_max_as_F_plus_1 * 2 == I_max_as_F_plus_1;
        constexpr bool limits_overflow = min_overflow || max_overflow;
        if constexpr (limits_overflow)
        {
            // Manually check for special floating-point values.
            if (std::isinf(f))
                return f > 0 ? std::partial_ordering::less : std::partial_ordering::greater;
            if (std::isnan(f))
                return std::partial_ordering::unordered;
        }

        if (min_overflow || f >= I_min_as_F)
        {
            // `f <= I_max_as_F_plus_1 - 1` would be problematic due to rounding, so we use this instead.
            if (max_overflow || f - I_max_as_F_plus_1 <= -1)
            {
                I f_trunc = f;
                if (f_trunc < i)
                    return std::partial_ordering::greater;
                if (f_trunc > i)
                    return std::partial_ordering::less;
            
                F f_frac = f - f_trunc;
                if (f_frac < 0)
                    return std::partial_ordering::greater;
                if (f_frac > 0)
                    return std::partial_ordering::less;
                    
                return std::partial_ordering::equivalent;
            }

            return std::partial_ordering::less;
        }
        
        if (f < 0)
            return std::partial_ordering::greater;
        
        return std::partial_ordering::unordered;
    }
}

If you want to experiment with it, here are a few test cases:

#include <algorithm>
#include <cmath>
#include <iomanip>
#include <iostream> 

void compare_print(long long a, float b, int n = 0)
{
    if (n == 0)
    {
        auto result = compare_int_float(a,b);
        static constexpr std::partial_ordering values[] = {std::partial_ordering::less, std::partial_ordering::equivalent, std::partial_ordering::greater, std::partial_ordering::unordered};
        std::cout << a << ' ' << "<=>?"[std::find(values, values+4, result) - values] << ' ' << b << '\n';
    }
    else
    {
        for (int i = 0; i < n; i++)
            b = std::nextafter(b, -INFINITY);
            
        for (int i = 0; i <= n*2; i++)
        {
            compare_print(a, b);
            b = std::nextafter(b, INFINITY);
        }
        
        std::cout << '\n';
    }
}

int main()
{    
    std::cout << std::setprecision(1000);
    
    compare_print(999999984306749440,
                  999999984306749440.f, 2);
                  
    compare_print(999999984306749439,
                  999999984306749440.f, 2);
                  
    compare_print(100,
                  100.f, 2);
    
    compare_print(-100,
                  -100.f, 2);
                  
    compare_print(0,
                  0.f, 2);
                                    
    compare_print((long long)0x8000'0000'0000'0000,
                  (long long)0x8000'0000'0000'0000, 2);
                                    
    compare_print(42, INFINITY);
    compare_print(42, -INFINITY);
    compare_print(42, NAN);
    std::cout << '\n';

    compare_print(1388608,
                  1388608.f, 2);
    
    compare_print(12388608,
                  12388608.f, 2);
}


Answer 3

To compare an FP f and an integer i for equality:

(The code is representative and uses a comparison of float and long long as an example.)

If f is a NaN, infinity, or has a fractional part (perhaps use modf()), f is not equal to i. (modf() reports a NaN fraction for NaN; infinity has a zero fraction and is rejected by the range checks below.)
```
float ipart;
if (modf(f, &ipart) != 0) return not_equal;   // C++
if (modff(f, &ipart) != 0) return not_equal;  // C
```
Convert the numeric limits of i into exactly representable FP values (powers of 2) near those limits (**). This is easy to do if we assume FP is not a rare base-10 encoding and the range of double exceeds the range of i. Take advantage of the fact that integer limit magnitudes are at or near a Mersenne number. (Sorry, the example code is C-ish.)
```
#define FP_INT_MAX_PLUS1 ((LLONG_MAX/2 + 1)*2.0)
#define FP_INT_MIN (LLONG_MIN*1.0)
```

Compare f to its limits
```
if (f >= FP_INT_MAX_PLUS1) return not_equal;
if (f < FP_INT_MIN) return not_equal;
```

Convert f to integer and compare
```
return (long long) f == i;
```
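Put together, the equality steps above might look like this in C++. This is a sketch under assumptions not stated in the answer: binary floating point and a 64-bit two's complement long long. The function name is hypothetical, and the macros are replaced by local constants with the same values.

```cpp
#include <climits>
#include <cmath>

// Sketch of the equality test for float vs long long.
// Assumes binary floating point and 64-bit two's complement long long.
bool fp_equals_int(float f, long long i)
{
    // A non-zero fractional part means f cannot equal any integer;
    // NaN yields a NaN fraction, which also compares != 0.
    // (std::modf returns a zero fraction for infinities.)
    float ipart;
    if (std::modf(f, &ipart) != 0)
        return false;

    // Powers of two at the limits of long long, exact in double.
    const double fp_int_max_plus1 = (LLONG_MAX / 2 + 1) * 2.0;  // 2^63
    const double fp_int_min = LLONG_MIN * 1.0;                  // -2^63

    // Out of range, including +/-infinity.
    if (f >= fp_int_max_plus1) return false;
    if (f < fp_int_min) return false;

    // f is an integer within range, so the conversion is exact and defined.
    return static_cast<long long>(f) == i;
}
```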

To compare an FP f and an integer i for <, >, == or not comparable:

(Using the limits above)

Test f >= lower limit
```
 if (f >= FP_INT_MIN) {
```

Test f <= upper limit
```
  // reform below to cope with effects of rounding
  // if (f <= FP_INT_MAX_PLUS1 - 1)
  if (f - FP_INT_MAX_PLUS1 <= -1.0) {
```

Convert f to integer/fraction and compare
```
    // at this point `f` is in the range of `i`
    long long ipart = (long long) f;
    if (ipart < i) return f_less_than_i;
    if (ipart > i) return f_more_than_i;
    float frac = f - ipart;
    if (frac < 0) return f_less_than_i;
    if (frac > 0) return f_more_than_i;
    return equal;
  }
```

Handle edge cases
```
  else return f_more_than_i;
}
if (f < 0.0) return f_less_than_i;
return not_comparable;
```

Simplifications are possible, yet I wanted to convey the algorithm.

(**) Additional conditional code is needed to cope with non-2's-complement integer encodings. It is quite similar to the MAX code.
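For reference, the three-way fragments above can be assembled into one C++ function. This is a sketch, not the answer's exact code: it assumes binary floating point and a 64-bit two's complement long long, and the enum and function names are hypothetical stand-ins for the answer's placeholder return values.

```cpp
#include <climits>

// Hypothetical result type mirroring the placeholders used in the answer.
enum class FpIntOrder { f_less_than_i, equal, f_more_than_i, not_comparable };

// Limits as exactly representable powers of two (assumes 64-bit long long).
constexpr double FP_INT_MAX_PLUS1 = (LLONG_MAX / 2 + 1) * 2.0;  // 2^63
constexpr double FP_INT_MIN = LLONG_MIN * 1.0;                  // -2^63

FpIntOrder compare_fp_int(float f, long long i)
{
    if (f >= FP_INT_MIN) {
        // Same as f <= FP_INT_MAX_PLUS1 - 1, but immune to rounding.
        if (f - FP_INT_MAX_PLUS1 <= -1.0) {
            // f is within the range of long long, so truncation is defined.
            long long ipart = (long long) f;
            if (ipart < i) return FpIntOrder::f_less_than_i;
            if (ipart > i) return FpIntOrder::f_more_than_i;
            float frac = f - ipart;  // exact fractional part, same sign as f
            if (frac < 0) return FpIntOrder::f_less_than_i;
            if (frac > 0) return FpIntOrder::f_more_than_i;
            return FpIntOrder::equal;
        }
        return FpIntOrder::f_more_than_i;  // f >= 2^63, including +infinity
    }
    if (f < 0.0) return FpIntOrder::f_less_than_i;  // f < -2^63, including -infinity
    return FpIntOrder::not_comparable;             // only NaN reaches here
}
```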

Answer 4

The code below works with integer data types of at most 64 bits and floating-point data types of at most IEEE-754 double precision. For wider data types the same idea can be used, but you'll have to adapt the code. Since I'm not very familiar with C++, the code is written in C. It shouldn't be too difficult to convert it to C++-style code. The code is branchless, which might be a performance benefit.

#include <stdio.h>
// gcc -O3 -march=haswell cmp.c
// Assume long long int is 64 bits.
// Assume ieee-754 double precision.
int long_long_less_than_double(long long int i, double y) {
    long long i_lo = i & 0x00000000FFFFFFFF;   // Extract lower 32 bits.
    long long i_hi = i & 0xFFFFFFFF00000000;   // Extract upper 32 bits.
    double x_lo = (double)i_lo;                // Exact conversion to double, no rounding errors!
    double x_hi = (double)i_hi;                // 
    return ( x_lo < (y - x_hi) );              // If i is close to y then y - x_hi is exact,
                                               // due to Sterbenz' lemma.
    // i < y
    // i_lo +i_hi < y      
    // i_lo < (y - i_hi)
    // x_lo < (y - x_hi)
}

int long_long_equals_double(long long int i, double y) {
    long long i_lo = i & 0x00000000FFFFFFFF;   
    long long i_hi = i & 0xFFFFFFFF00000000;   
    double x_lo = (double)i_lo;                    
    double x_hi = (double)i_hi;                    
    return ( x_lo == (y - x_hi) );                  
}                                                  


int main()
{
    long long a0 = 999999984306749439;
    long long a1 = 999999984306749440;    // Hex number: 0x0DE0B6B000000000
    long long a2 = 999999984306749441;
    float     b = 999999984306749440.f;   // This number can be represented exactly by a `float`.

    printf("%lli less_than %20.1f = %i\n", a0, b, long_long_less_than_double(a0, b));  // Implicit conversion from float to double
    printf("%lli less_than %20.1f = %i\n", a1, b, long_long_less_than_double(a1, b));

    printf("%lli equals    %20.1f = %i\n", a0, b, long_long_equals_double(a0, b));
    printf("%lli equals    %20.1f = %i\n", a1, b, long_long_equals_double(a1, b));
    printf("%lli equals    %20.1f = %i\n\n", a2, b, long_long_equals_double(a2, b));


    long long c0 = 1311693406324658687;
    long long c1 = 1311693406324658688;   // Hex number: 0x1234123412341200
    long long c2 = 1311693406324658689; 
    double     d = 1311693406324658688.0; // This number can be represented exactly by a `double`.

    printf("%lli less_than %20.1f = %i\n", c0, d, long_long_less_than_double(c0, d));
    printf("%lli less_than %20.1f = %i\n", c1, d, long_long_less_than_double(c1, d));

    printf("%lli equals    %20.1f = %i\n", c0, d, long_long_equals_double(c0, d));
    printf("%lli equals    %20.1f = %i\n", c1, d, long_long_equals_double(c1, d));
    printf("%lli equals    %20.1f = %i\n", c2, d, long_long_equals_double(c2, d));


    return 0;
}

The idea is to split the 64-bit integer i into the 32 upper bits i_hi and the 32 lower bits i_lo, which are converted to the doubles x_hi and x_lo without any rounding errors. If double y is close to x_hi, then the floating-point subtraction y - x_hi is exact, due to Sterbenz' lemma. So, instead of x_lo + x_hi < y, we can test x_lo < (y - x_hi), which is more accurate! If double y is not close to x_hi, then y - x_hi is inaccurate, but in that case we don't need the accuracy, because then |y - x_hi| is much larger than |x_lo|. In other words: if i and y differ a lot, we don't have to worry about the value of the lower 32 bits.
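For reference, Sterbenz' lemma in its usual form for two floating-point numbers a and b of the same binary format, and how it applies to the variables above. This form covers positive values (the inequality forces a and b to be non-negative); the negative case is symmetric.

```latex
% Sterbenz' lemma: the subtraction is exact when a and b are within a factor of two
\frac{b}{2} \le a \le 2b \;\Longrightarrow\; \mathrm{fl}(a - b) = a - b

% With a = y, b = x_{hi}, and i = x_{hi} + x_{lo} holding exactly:
\frac{x_{hi}}{2} \le y \le 2x_{hi} \;\Longrightarrow\;
\bigl(\, i < y \iff x_{lo} < y - x_{hi} = \mathrm{fl}(y - x_{hi}) \,\bigr)
```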

Output:

    999999984306749439 less_than 999999984306749440.0 = 1
    999999984306749440 less_than 999999984306749440.0 = 0
    999999984306749439 equals    999999984306749440.0 = 0
    999999984306749440 equals    999999984306749440.0 = 1
    999999984306749441 equals    999999984306749440.0 = 0

    1311693406324658687 less_than 1311693406324658688.0 = 1
    1311693406324658688 less_than 1311693406324658688.0 = 0
    1311693406324658687 equals    1311693406324658688.0 = 0
    1311693406324658688 equals    1311693406324658688.0 = 1
    1311693406324658689 equals    1311693406324658688.0 = 0
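Since the answer suggests the C code is easy to port, here is one possible C++ sketch of the same split-and-subtract idea. It assumes a 64-bit two's complement long long and IEEE-754 double; unlike the C version, it derives i_hi by subtraction so that no unsigned-to-signed conversion is involved. The function names are hypothetical.

```cpp
// C++ sketch of the split-and-subtract comparison.
// Assumes 64-bit two's complement long long and IEEE-754 double.
bool ll_less_than_double(long long i, double y)
{
    const long long i_lo = i & 0xFFFFFFFFLL;        // lower 32 bits, 0 <= i_lo < 2^32
    const long long i_hi = i - i_lo;                // i with the lower 32 bits cleared
    const double x_lo = static_cast<double>(i_lo);  // exact: fits in 32 bits
    const double x_hi = static_cast<double>(i_hi);  // exact: at most 32 significant bits
    // y - x_hi is exact when y is close to x_hi (Sterbenz' lemma);
    // otherwise the low 32 bits cannot change the outcome.
    return x_lo < (y - x_hi);
}

bool ll_equals_double(long long i, double y)
{
    const long long i_lo = i & 0xFFFFFFFFLL;
    const long long i_hi = i - i_lo;
    return static_cast<double>(i_lo) == (y - static_cast<double>(i_hi));
}
```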

Answer 5

This is how I recently solved it in the OpenSmalltalk VM for comparing bounded integers:

1. Convert the integer to floating point (the value is rounded, thus maybe inexact).
2. Compare whether both float values are equal.
3. If they are not, there is no ambiguity whatever the rounding error, so perform the comparison of the floating-point values and return the result.
4. If they are equal, convert the floating-point value to an integer and compare the integer values.

The last point may lead to a difficulty: the floating-point-to-integer conversion might lead to an integer overflow. You must thus make sure that you use a larger integer type for that edge case, or fall back to Bathseba's algorithm.

In the OpenSmalltalk VM, that's not a problem because SmallInteger is at most 61 bits, so I did not attempt to solve it.

I have a Smallissimo blog entry giving additional pointers:

How to compare exact value of SmallInteger and Float in Smalltalk

For unbounded (arbitrarily large) integers, the comparison is performed in Integer, but there are a few tricks to accelerate it. This is not done in the VM but in Smalltalk code (Squeak is a good example).

Answer 6

Use double, not float. Take the double value + 0.5. Truncate it with a static_cast to long long. Now compare the two long longs.

6 answers:

Answer 1: score 4, 2019-11-06 16:05:06
Answer 2: score 4, accepted, 2019-11-06 20:17:36
Answer 3: score 3, 2019-11-08 00:32:27
Answer 4: score 2, 2019-11-10 11:31:01
Answer 5: score 1, 2019-11-07 06:56:02
Answer 6: score -3, 2019-11-06 16:06:39
