Hash value for a std::unordered_map

According to the standard, the std::hash class provides no general support for containers (let alone unordered ones), so I wonder how to implement that myself. What I have is:

std::unordered_map<std::wstring, std::wstring> _properties;
std::wstring _class;

I thought about iterating over the entries, computing the individual hashes for keys and values (via std::hash<std::wstring>) and combining the results somehow.

What would be a good way to do that, and does it matter that the iteration order of the map is not defined?

Note: I don't want to use Boost.

A simple XOR was suggested, so it would look like this:

size_t MyClass::GetHashCode()
{
  std::hash<std::wstring> stringHash;
  size_t mapHash = 0;
  // Iterate by const reference to avoid copying each key-value pair.
  for (const auto& property : _properties)
    mapHash ^= stringHash(property.first) ^ stringHash(property.second);

  return ((_class.empty() ? 0 : stringHash(_class)) * 397) ^ mapHash;
}

?

I'm really unsure whether that simple XOR is enough.

Response

If by "enough" you mean whether your function is injective, the answer is no. The set of all hash values your function can output has cardinality at most 2^64 (assuming a 64-bit size_t), while the space of possible inputs is much larger, so by the pigeonhole principle some distinct inputs must collide. However, this is not really important, because no hash function can be injective given the nature of your inputs. A good hash function has these qualities:

  • It's not easily invertible. Given the output k, it's not computationally feasible within the lifetime of the universe to find m such that h(m) = k.
  • Its outputs are uniformly distributed over the output space.
  • It's hard to find two distinct inputs m and m' such that h(m) = h(m').

Of course, the extent to which these matter depends on whether you want something that's cryptographically secure, or you just want to map an arbitrary chunk of data to some arbitrary 64-bit integer. If you want something cryptographically secure, writing it yourself is not a good idea. In that case, you'd also need the guarantee that the function is sensitive to small changes in the input. The std::hash function object is not required to be cryptographically secure. It exists for use cases such as hash tables. cppreference says:

For two different parameters k1 and k2 that are not equal, the probability that std::hash<Key>()(k1) == std::hash<Key>()(k2) should be very small, approaching 1.0/std::numeric_limits<size_t>::max().

I'll show below how your current solution doesn't really guarantee this.

Collisions

I'll give you a few of my observations on a variant of your solution (I don't know what your _class member is).

std::size_t hash_code(const std::unordered_map<std::string, std::string>& m) {
    std::hash<std::string> h;
    std::size_t result = 0;
    for (auto&& p : m) {
        result ^= h(p.first) ^ h(p.second);
    }
    return result;
}

It's easy to generate collisions. Consider the following maps:

std::unordered_map<std::string, std::string> container0;
std::unordered_map<std::string, std::string> container1;
container0["123"] = "456";
container1["456"] = "123";
std::cout << hash_code(container0) << '\n';
std::cout << hash_code(container1) << '\n';

On my machine, compiling with g++ 4.9.1, this outputs:

1225586629984767119
1225586629984767119

The question is whether this matters. What's relevant is how often you're going to have maps in which keys and values are swapped. More generally, these collisions occur between any two maps whose keys and values, taken together, form the same collection of strings, regardless of which key is paired with which value.
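
A related consequence of plain XOR can be sketched as follows. The function below repeats the XOR scheme from hash_code under the assumed name xor_hash_code, only so that the block stands alone; demonstrate_xor_collisions is likewise a hypothetical helper. Since h(k) ^ h(k) is 0, an entry whose key equals its value contributes nothing, so such a map hashes like the empty map.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

using StringMap = std::unordered_map<std::string, std::string>;

// Same XOR scheme as hash_code above (name is an assumption for this sketch).
std::size_t xor_hash_code(const StringMap& m) {
    std::hash<std::string> h;
    std::size_t result = 0;
    for (const auto& p : m) {
        result ^= h(p.first) ^ h(p.second);
    }
    return result;
}

// Hypothetical helper: checks two collisions that hold for any std::hash.
void demonstrate_xor_collisions() {
    StringMap empty;

    StringMap self_mapped;
    self_mapped["abc"] = "abc";   // key == value, so h(k) ^ h(k) == 0

    StringMap forward;
    forward["123"] = "456";
    StringMap reversed;
    reversed["456"] = "123";      // key and value swapped

    // The self-mapped entry is invisible to the hash.
    assert(xor_hash_code(self_mapped) == xor_hash_code(empty));
    // XOR is symmetric in key and value, so swapping them changes nothing.
    assert(xor_hash_code(forward) == xor_hash_code(reversed));
}
```

Both assertions hold whatever values std::hash<std::string> happens to return, because they follow from the algebra of XOR rather than from the quality of the underlying string hash.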

Order of Iteration

Two unordered_map instances having exactly the same key-value pairs will not necessarily have the same order of iteration. cppreference says:

For two parameters k1 and k2 that are equal, std::hash<Key>()(k1) == std::hash<Key>()(k2).

This is a basic requirement for any hash function. Your solution satisfies it: because XOR is commutative and associative, the order of iteration doesn't affect the result.
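
As a minimal sketch of that property, the block below builds two equal maps with different insertion orders and different bucket counts, then checks that the XOR-based hash agrees. The names xor_hash_code and demonstrate_order_independence are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

using StringMap = std::unordered_map<std::string, std::string>;

// XOR-combined hash, as in hash_code above (name is an assumption).
std::size_t xor_hash_code(const StringMap& m) {
    std::hash<std::string> h;
    std::size_t result = 0;
    for (const auto& p : m) {
        result ^= h(p.first) ^ h(p.second);
    }
    return result;
}

// Hypothetical helper: equal maps must hash equally, whatever their layout.
void demonstrate_order_independence() {
    StringMap a;
    a["alpha"] = "1";
    a["beta"] = "2";
    a["gamma"] = "3";

    StringMap b;
    b.reserve(1024);              // different bucket count than a
    b["gamma"] = "3";             // reverse insertion order
    b["beta"] = "2";
    b["alpha"] = "1";

    assert(a == b);                                // same key-value pairs
    assert(xor_hash_code(a) == xor_hash_code(b));  // hence same hash
}
```

A combiner that is not commutative, such as multiplying the running result by a constant before adding each entry, would give no such guarantee, since the two maps may be traversed in different orders.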

A Possible Solution

If you don't need something that's cryptographically secure, you can modify your solution slightly to kill the symmetry between keys and values. This approach is okay in practice for hash tables and the like. It also doesn't depend on the iteration order of an unordered_map, which is unspecified, because it relies on the same property your solution used (commutativity of XOR).

std::size_t hash_code(const std::unordered_map<std::string, std::string>& m) {
    const std::size_t prime = 19937;
    std::hash<std::string> h;
    std::size_t result = 0;
    for (auto&& p : m) {
        result ^= prime*h(p.first) + h(p.second);
    }
    return result;
}

All you need in a hash function in this case is a way to map a key-value pair to a reasonably good hash value, and a way to combine the hashes of the key-value pairs using a commutative operation. That way, order does not matter. In the example hash_code I wrote, the hash of a key-value pair is just a linear combination of the hash of the key and the hash of the value. You can construct something a bit more intricate, but there's no need for that.
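
One way to put this to use, sketched here under assumptions, is to wrap the same scheme in a functor so that a map can itself serve as a key in an unordered container. UnorderedMapHash, StringMapSet and demonstrate_map_as_key are hypothetical names, and the template assumes std::hash is specialized for both the key and the mapped type.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical functor applying the prime-weighted XOR scheme to any
// unordered_map whose key and mapped types have std::hash specializations.
template <class K, class V>
struct UnorderedMapHash {
    std::size_t operator()(const std::unordered_map<K, V>& m) const {
        const std::size_t prime = 19937;
        std::hash<K> hash_key;
        std::hash<V> hash_value;
        std::size_t result = 0;
        for (const auto& p : m) {
            // Asymmetric per-entry hash, combined with commutative XOR.
            result ^= prime * hash_key(p.first) + hash_value(p.second);
        }
        return result;
    }
};

using StringMap = std::unordered_map<std::string, std::string>;
using StringMapSet =
    std::unordered_set<StringMap, UnorderedMapHash<std::string, std::string>>;

// Hypothetical helper: equal maps are treated as one element of the set.
void demonstrate_map_as_key() {
    StringMap a;
    a["x"] = "1";
    a["y"] = "2";

    StringMap b;
    b.reserve(256);   // different layout, same contents
    b["y"] = "2";
    b["x"] = "1";

    StringMapSet seen;
    seen.insert(a);
    seen.insert(b);   // equal to a, so not inserted again

    assert(seen.size() == 1);
    assert(seen.count(b) == 1);
}
```

The set relies on operator== for unordered_map to compare keys, so the functor only has to guarantee that equal maps hash equally, which the commutative combination provides.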
