How to properly compare an integer and a floating-point value?
How do I compare an integer and a floating-point value the right way™?
The built-in comparison operators give incorrect results in some edge cases, for example:
#include <iomanip>
#include <iostream>

int main()
{
    long long a = 999999984306749439;
    float b = 999999984306749440.f; // This number can be represented exactly by a `float`.
    std::cout << std::setprecision(1000);
    std::cout << a << " < " << b << " = " << (a < b) << '\n';
    // Prints `999999984306749439 < 999999984306749440 = 0`, but it should be `1`.
}
Apparently, the comparison operators convert both operands to the same type before actually comparing them. Here the lhs gets converted to float, which causes a loss of precision and leads to an incorrect result.
Even though I understand what's going on, I'm not sure how to work around this issue.
Disclaimer: The example uses a float and a long long, but I'm looking for a generic solution that works for every combination of an integral type and a floating-point type.
(Restricting this answer to positive numbers; generalisation is trivial.)
1. Get the number of bits in your exponent for the float on your platform along with the radix. If you have an IEEE754 32 bit float then this is a trivial step.
2. Use (1) to compute the largest non-integer value that can be stored in your float. std::numeric_limits doesn't specify this value, annoyingly, so you need to do this yourself. For 32 bit IEEE754 you could take the easy option: 8388607.5 is the largest non-integral float.
3. If your float is less than or equal to (2), then check if it's an integer or not. If it's not an integer then you can round it appropriately so as not to invalidate the < comparison.
4. At this point, the float is an integer. Check if it's within the range of your long long. If it's out of range then the result of < is known.
5. If you get this far, then you can safely cast your float to a long long, and make the comparison.
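The steps above can be sketched roughly as follows for float and long long. This is only an illustration, not the answerer's code: the name int_less_than_float is made up here, it assumes an IEEE 754 binary32 float and a 64-bit long long, and it only handles non-negative i, matching the restriction stated at the top of the answer. A non-integer f is rounded up with std::ceil, which leaves the result of i < f unchanged because i is an integer.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper (not from the answer): returns true when i < f.
// Assumptions: i >= 0, float is IEEE 754 binary32, long long is 64 bits.
bool int_less_than_float(long long i, float f)
{
    static_assert(std::numeric_limits<float>::is_iec559, "IEEE 754 float assumed");
    static_assert(std::numeric_limits<float>::digits == 24, "binary32 assumed");
    static_assert(std::numeric_limits<long long>::digits == 63, "64-bit long long assumed");

    if (std::isnan(f))
        return false; // NaN compares false with everything.
    if (f < 0)
        return false; // i is non-negative, so it cannot be below a negative f.

    // The largest float that still has a fractional part (2^23 - 0.5).
    constexpr float largest_non_integer = 8388607.5f;

    // Round a non-integer f up; for an integer i, i < f exactly when i < ceil(f).
    if (f <= largest_non_integer)
        f = std::ceil(f);

    // f is now an integer. 2^63 is exactly representable as a float.
    constexpr float two_pow_63 = 9223372036854775808.0f;
    if (f >= two_pow_63)
        return true; // Also covers +infinity: every long long is smaller.

    // The conversion is exact and in range, so compare as integers.
    return i < static_cast<long long>(f);
}

// A few self-checks, including the value from the question.
void run_examples()
{
    assert(int_less_than_float(999999984306749439LL, 999999984306749440.f));
    assert(!int_less_than_float(999999984306749440LL, 999999984306749440.f));
    assert(int_less_than_float(2, 2.5f));
    assert(!int_less_than_float(3, 2.5f));
    assert(int_less_than_float(42, std::numeric_limits<float>::infinity()));
}
```

Extending the sketch to negative values would need a symmetric lower-bound check against -2^63 before the final cast.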
Here's what I ended up with.
Credit for the algorithm goes to @chux; his approach appears to outperform the other suggestions. You can find some alternative implementations in the edit history.
If you can think of any improvements, suggestions are welcome.
#include <compare>
#include <cmath>
#include <limits>
#include <type_traits>

template <typename I, typename F>
std::partial_ordering compare_int_float(I i, F f)
{
    if constexpr (std::is_integral_v<F> && std::is_floating_point_v<I>)
    {
        return 0 <=> compare_int_float(f, i);
    }
    else
    {
        static_assert(std::is_integral_v<I> && std::is_floating_point_v<F>);
        static_assert(std::numeric_limits<F>::radix == 2);

        // This should be exactly representable as F due to being a power of two.
        constexpr F I_min_as_F = std::numeric_limits<I>::min();

        // The `numeric_limits<I>::max()` itself might not be representable as F, so we use this instead.
        constexpr F I_max_as_F_plus_1 = F(std::numeric_limits<I>::max()/2+1) * 2;

        // Check if the constants above overflowed to infinity. Normally this shouldn't happen.
        // A zero minimum (unsigned `I`) also satisfies `x * 2 == x`, so it is excluded explicitly.
        constexpr bool limits_overflow = (I_min_as_F != 0 && I_min_as_F * 2 == I_min_as_F) || I_max_as_F_plus_1 * 2 == I_max_as_F_plus_1;
        if constexpr (limits_overflow)
        {
            // Manually check for special floating-point values.
            if (std::isinf(f))
                return f > 0 ? std::partial_ordering::less : std::partial_ordering::greater;
            if (std::isnan(f))
                return std::partial_ordering::unordered;
        }

        if (limits_overflow || f >= I_min_as_F)
        {
            // `f <= I_max_as_F_plus_1 - 1` would be problematic due to rounding, so we use this instead.
            if (limits_overflow || f - I_max_as_F_plus_1 <= -1)
            {
                I f_trunc = f;
                if (f_trunc < i)
                    return std::partial_ordering::greater;
                if (f_trunc > i)
                    return std::partial_ordering::less;

                F f_frac = f - f_trunc;
                if (f_frac < 0)
                    return std::partial_ordering::greater;
                if (f_frac > 0)
                    return std::partial_ordering::less;

                return std::partial_ordering::equivalent;
            }

            return std::partial_ordering::less;
        }

        if (f < 0)
            return std::partial_ordering::greater;

        return std::partial_ordering::unordered;
    }
}
If you want to experiment with it, here are a few test cases:
#include <algorithm>
#include <cmath>
#include <iomanip>
#include <iostream>

void compare_print(long long a, float b, int n = 0)
{
    if (n == 0)
    {
        auto result = compare_int_float(a,b);
        static constexpr std::partial_ordering values[] = {std::partial_ordering::less, std::partial_ordering::equivalent, std::partial_ordering::greater, std::partial_ordering::unordered};
        std::cout << a << ' ' << "<=>?"[std::find(values, values+4, result) - values] << ' ' << b << '\n';
    }
    else
    {
        for (int i = 0; i < n; i++)
            b = std::nextafter(b, -INFINITY);

        for (int i = 0; i <= n*2; i++)
        {
            compare_print(a, b);
            b = std::nextafter(b, INFINITY);
        }

        std::cout << '\n';
    }
}

int main()
{
    std::cout << std::setprecision(1000);

    compare_print(999999984306749440,
                  999999984306749440.f, 2);
    compare_print(999999984306749439,
                  999999984306749440.f, 2);
    compare_print(100,
                  100.f, 2);
    compare_print(-100,
                  -100.f, 2);
    compare_print(0,
                  0.f, 2);
    compare_print((long long)0x8000'0000'0000'0000,
                  (long long)0x8000'0000'0000'0000, 2);

    compare_print(42, INFINITY);
    compare_print(42, -INFINITY);
    compare_print(42, NAN);
    std::cout << '\n';

    compare_print(1388608,
                  1388608.f, 2);
    compare_print(12388608,
                  12388608.f, 2);
}
To compare a FP f and integer i for equality:
(Code is representative and uses comparison of float and long long as an example)
If f is a NaN, infinity, or has a fractional part (perhaps use modf()), f is not equal to i.
float ipart;
// C++
if (std::modf(f, &ipart) != 0) return not_equal;
// C
if (modff(f, &ipart) != 0) return not_equal;
Convert the numeric limits of i into exactly representable FP values (powers of 2) near those limits. ** Easy to do if we assume FP is not a rare base 10 encoding and the range of double exceeds the range of i. Take advantage of the fact that the magnitudes of integer limits are at or near a Mersenne number. (Sorry, the example code is C-ish.)
#define FP_INT_MAX_PLUS1 ((LLONG_MAX/2 + 1)*2.0)
#define FP_INT_MIN (LLONG_MIN*1.0)
Compare f to its limits
if (f >= FP_INT_MAX_PLUS1) return not_equal;
if (f < FP_INT_MIN) return not_equal;
Convert f to integer and compare
return (long long) f == i;
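Put together, the equality check described above might look like the following in C++. This is a sketch under assumptions rather than the answerer's exact code: the function name float_equals_int is invented for illustration, and it assumes a binary floating-point format and a two's complement long long. It uses std::modf to detect a fractional part.

```cpp
#include <cassert>
#include <climits>
#include <cmath>

// Hypothetical helper (name invented here): true when f and i hold the same value.
// Assumptions: binary floating point and two's complement long long.
bool float_equals_int(float f, long long i)
{
    // NaN, infinity, or a fractional part means "not equal".
    if (!std::isfinite(f))
        return false;
    float ipart;
    if (std::modf(f, &ipart) != 0)
        return false;

    // Limits as exactly representable powers of two (2^63 and -2^63).
    constexpr float fp_int_max_plus1 = (LLONG_MAX / 2 + 1) * 2.0f;
    constexpr float fp_int_min = LLONG_MIN * 1.0f;

    // Out-of-range values cannot equal any long long.
    if (f >= fp_int_max_plus1)
        return false;
    if (f < fp_int_min)
        return false;

    // The conversion is now exact and cannot overflow.
    return static_cast<long long>(f) == i;
}

// A few self-checks, including the value from the question.
void run_equality_examples()
{
    assert(float_equals_int(999999984306749440.f, 999999984306749440LL));
    assert(!float_equals_int(999999984306749440.f, 999999984306749439LL));
    assert(!float_equals_int(2.5f, 2));
    assert(float_equals_int(-100.f, -100));
    assert(!float_equals_int(INFINITY, 42));
    assert(!float_equals_int(NAN, 42));
    assert(float_equals_int(-9223372036854775808.f, LLONG_MIN));
    assert(!float_equals_int(9223372036854775808.f, LLONG_MAX));
}
```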
To compare a FP f and integer i for <, >, == or not comparable:
(Using above limits)
Test f >= lower limit
if (f >= FP_INT_MIN) {
Test f <= upper limit
  // reform below to cope with effects of rounding
  // if (f <= FP_INT_MAX_PLUS1 - 1)
  if (f - FP_INT_MAX_PLUS1 <= -1.0) {
Convert f to integer/fraction and compare
    // at this point `f` is in the range of `i`
    long long ipart = (long long) f;
    if (ipart < i) return f_less_than_i;
    if (ipart > i) return f_more_than_i;
    float frac = f - ipart;
    if (frac < 0) return f_less_than_i;
    if (frac > 0) return f_more_than_i;
    return equal;
  }
Handle edge cases
  else return f_more_than_i;
}
if (f < 0.0) return f_less_than_i;
return not_comparable;
Simplifications possible, yet I wanted to convey the algorithm.
** Additional conditional code is needed to cope with non 2's complement integer encoding. It is quite similar to the MAX code.
The code below works with integer data types of at most 64 bits and floating point data types of at most IEEE 754 double precision accuracy. For wider data types the same idea can be used, but you'll have to adapt the code. Since I'm not very familiar with C++, the code is written in C. It shouldn't be too difficult to convert it to C++-style code. The code is branchless, which might be a performance benefit.
#include <stdio.h>
// gcc -O3 -march=haswell cmp.c
// Assume long long int is 64 bits.
// Assume ieee-754 double precision.

int long_long_less_than_double(long long int i, double y) {
    long long i_lo = i & 0x00000000FFFFFFFF;   // Extract lower 32 bits.
    long long i_hi = i & 0xFFFFFFFF00000000;   // Extract upper 32 bits.
    double x_lo = (double)i_lo;                // Exact conversion to double, no rounding errors!
    double x_hi = (double)i_hi;                //
    return ( x_lo < (y - x_hi) );              // If i is close to y then y - x_hi is exact,
                                               // due to Sterbenz' lemma.
    // i < y
    // i_lo +i_hi < y
    // i_lo < (y - i_hi)
    // x_lo < (y - x_hi)
}

int long_long_equals_double(long long int i, double y) {
    long long i_lo = i & 0x00000000FFFFFFFF;
    long long i_hi = i & 0xFFFFFFFF00000000;
    double x_lo = (double)i_lo;
    double x_hi = (double)i_hi;
    return ( x_lo == (y - x_hi) );
}

int main()
{
    long long a0 = 999999984306749439;
    long long a1 = 999999984306749440;    // Hex number: 0x0DE0B6B000000000
    long long a2 = 999999984306749441;
    float b = 999999984306749440.f;       // This number can be represented exactly by a `float`.

    printf("%lli less_than %20.1f = %i\n", a0, b, long_long_less_than_double(a0, b));  // Implicit conversion from float to double
    printf("%lli less_than %20.1f = %i\n", a1, b, long_long_less_than_double(a1, b));

    printf("%lli equals %20.1f = %i\n", a0, b, long_long_equals_double(a0, b));
    printf("%lli equals %20.1f = %i\n", a1, b, long_long_equals_double(a1, b));
    printf("%lli equals %20.1f = %i\n\n", a2, b, long_long_equals_double(a2, b));

    long long c0 = 1311693406324658687;
    long long c1 = 1311693406324658688;   // Hex number: 0x1234123412341200
    long long c2 = 1311693406324658689;
    double d = 1311693406324658688.0;     // This number can be represented exactly by a `double`.

    printf("%lli less_than %20.1f = %i\n", c0, d, long_long_less_than_double(c0, d));
    printf("%lli less_than %20.1f = %i\n", c1, d, long_long_less_than_double(c1, d));

    printf("%lli equals %20.1f = %i\n", c0, d, long_long_equals_double(c0, d));
    printf("%lli equals %20.1f = %i\n", c1, d, long_long_equals_double(c1, d));
    printf("%lli equals %20.1f = %i\n", c2, d, long_long_equals_double(c2, d));

    return 0;
}
The idea is to split the 64-bit integer i into 32 upper bits i_hi and 32 lower bits i_lo, which are converted to doubles x_hi and x_lo without any rounding errors. If double y is close to x_hi, then the floating point subtraction y - x_hi is exact, due to Sterbenz' lemma. So, instead of x_lo + x_hi < y, we can test for x_lo < (y - x_hi), which is more accurate! If double y is not close to x_hi then y - x_hi is inaccurate, but in that case we don't need the accuracy because then |y - x_hi| is much larger than |x_lo|. In other words: if i and y differ by a lot, then we don't have to worry about the value of the lower 32 bits.
Output:
999999984306749439 less_than 999999984306749440.0 = 1
999999984306749440 less_than 999999984306749440.0 = 0
999999984306749439 equals 999999984306749440.0 = 0
999999984306749440 equals 999999984306749440.0 = 1
999999984306749441 equals 999999984306749440.0 = 0
1311693406324658687 less_than 1311693406324658688.0 = 1
1311693406324658688 less_than 1311693406324658688.0 = 0
1311693406324658687 equals 1311693406324658688.0 = 0
1311693406324658688 equals 1311693406324658688.0 = 1
1311693406324658689 equals 1311693406324658688.0 = 0
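To see why the split matters, the small C++ sketch below contrasts a direct conversion with the split comparison on the two value pairs used in the test program. The names naive_less and split_less are chosen for this illustration only, and the code assumes a 64-bit two's complement long long and an IEEE 754 double. The upper part is computed as i - i_lo here, which gives the same value as masking the upper 32 bits.

```cpp
#include <cassert>

// Naive comparison: i is converted to double and may be rounded first.
bool naive_less(long long i, double y)
{
    return static_cast<double>(i) < y;
}

// Split comparison, following the idea described above.
// Assumes a 64-bit two's complement long long and IEEE 754 double.
bool split_less(long long i, double y)
{
    long long i_lo = i & 0xFFFFFFFFLL;        // Lower 32 bits, always in [0, 2^32).
    long long i_hi = i - i_lo;                // Remaining upper part, a multiple of 2^32.
    double x_lo = static_cast<double>(i_lo);  // Exact: fits in 32 bits.
    double x_hi = static_cast<double>(i_hi);  // Exact: at most 32 significant bits.
    return x_lo < (y - x_hi);
}

// Self-checks on the values from the test program.
void run_split_examples()
{
    const long long c0 = 1311693406324658687LL;
    const double d = 1311693406324658688.0;   // Exactly representable as a double.
    assert(!naive_less(c0, d));               // c0 rounds up to d, so the naive test fails.
    assert(split_less(c0, d));                // The split test sees c0 < d.
    assert(!split_less(c0 + 1, d));           // Equal values are not "less".

    const long long a0 = 999999984306749439LL;
    const double b = 999999984306749440.f;    // float value widened to double.
    assert(!naive_less(a0, b));
    assert(split_less(a0, b));
}
```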
This is how I solved it recently in the OpenSmalltalk VM for comparing bounded integers:
The last point may lead to a difficulty: the floating-point to integer conversion might lead to an integer overflow. You must thus make sure that you use a larger integer type for those edge cases, or fall back to Bathsheba's algorithm.
In the OpenSmalltalk VM, that's not a problem because SmallInteger values are at most 61 bits wide, so I did not attempt to solve it.
I have a Smallissimo blog entry giving additional pointers:
How to compare exact value of SmallInteger and Float in Smalltalk
For unbounded (arbitrarily large) integers, the comparison is performed in Integer, but there are a few tricks to accelerate the comparison. This is not done in the VM but in Smalltalk code (Squeak is a good example).
Use double, not float. Take the double value + 0.5. Truncate it by static cast to long long. Now compare the two long longs.