Does std::hash guarantee equal hashes for “equal” floating point numbers?
Is the floating point specialisation of `std::hash` (say, for `double`s or `float`s) reliable regarding almost-equality? That is, if two values (such as `(1./std::sqrt(5.)/std::sqrt(5.))` and `.2`) should compare equal but will not do so with the `==` operator, how will `std::hash` behave?

So, can I rely on a `double` as an `std::unordered_map` key to work as expected?

I have seen "Hashing floating point values", but that asks about Boost; I'm asking about the C++11 guarantees.
`std::hash` has the same guarantees for all types over which it can be instantiated: if two objects are equal, their hash codes will be equal. Otherwise, their hash codes will very probably differ. So you can rely on a `double` as a key in an `unordered_map` to work as expected: if two doubles are not equal (as defined by `==`), they will probably have different hashes (and even if they don't, they are still different keys, because `unordered_map` also checks for equality).
Obviously, if your values are the results of inexact calculations, they aren't appropriate keys for `unordered_map` (nor perhaps for any map).
Multiple problems with this question:
The reason that your two expressions don't compare as equal is NOT that there are two binary representations of 0.2, but that there is NO exact (finite) binary representation of 0.2, or of sqrt(5)! So in fact, while `(1./std::sqrt(5.)/std::sqrt(5.))` and `.2` should be the same algebraically, they may well not be the same in computer-precision arithmetic.

(They aren't even the same in pen-and-paper arithmetic with finite precision. Say you are working with 10 significant digits. Write out sqrt(5) with 10 digits and calculate your first expression. It will not be .2.)
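Carrying out that pen-and-paper experiment with 10 significant digits, rounding after every step, gives:

$$
\begin{aligned}
\sqrt{5} &\approx 2.236067977 \\
1 / 2.236067977 &= 0.44721359559\ldots \approx 0.4472135956 \\
0.4472135956 / 2.236067977 &= 0.20000000008944\ldots \approx 0.2000000001 \neq 0.2
\end{aligned}
$$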
Of course you have a sensible concept of two numbers being close. In fact you have at least two: an absolute one (|a - b| < eps) and a relative one (for example |a - b| < eps * max(|a|, |b|)). But that doesn't translate into sensible hashes. If you want all numbers within eps of each other to have the same hash, then 1, 1 + eps/2, 1 + eps, 1 + 3*eps/2, ... would all have the same hash (each is within eps of its neighbour), and therefore ALL numbers would have the same hash. That is a valid but useless hash function. But it is the only one that satisfies your requirement of mapping nearby values to the same hash!
There is no rigorous concept of "almost equality", so behavior can't be guaranteed in principle. If you want to define your own concept of "almost equal" and construct a hash function such that two "almost equal" floats have the same hash, you can. But then it will only be true for your particular notion of "almost equal" floats.
Behind the default hashing of an `unordered_map` there is a `std::hash` struct, which provides the `operator()` that computes the hash of a given value.
A set of default specializations of this template is available, including `std::hash<float>` and `std::hash<double>`.
On my machine (LLVM + clang, i.e. libc++), these are defined as follows (shown here for `float`):

```cpp
template <>
struct hash<float> : public __scalar_hash<float>
{
    size_t operator()(float __v) const _NOEXCEPT
    {
        // -0.0 and 0.0 should return same hash
        if (__v == 0)
            return 0;
        return __scalar_hash<float>::operator()(__v);
    }
};
```

where `__scalar_hash` is defined as:

```cpp
template <class _Tp>
struct __scalar_hash<_Tp, 0> : public unary_function<_Tp, size_t>
{
    size_t operator()(_Tp __v) const _NOEXCEPT
    {
        union
        {
            _Tp __t;
            size_t __a;
        } __u;
        __u.__a = 0;
        __u.__t = __v;
        return __u.__a;
    }
};
```
Basically, the hash is built by writing the source value into one member of a union and then reading back a `size_t`-sized piece of it through the other member.
Depending on the sizes involved, the value either fills only part of the `size_t` (the rest stays zero because `__u.__a` is cleared first) or gets truncated. That doesn't really matter: the raw bits of the number are used to compute the hash, so the hash is consistent with the `==` operator. Values that compare equal get the same hash (the explicit check for 0 covers -0.0 and 0.0, which compare equal but differ in their sign bit), and two floating-point numbers can only have the same hash (collisions caused by truncation aside) if they are the same value.
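Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost this content, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.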