A good hash function for a vector

Question

I have some vectors of integers that I would like to store efficiently in an unordered_map in C++11. My question is this:

How do I best store these and optimize for .find queries?

I came up with the following hasher:

class uint32_vector_hasher {
public:
  std::size_t operator()(std::vector<uint32_t> const& vec) const {
    std::size_t ret = 0;
    for(auto& i : vec) {
      ret ^= std::hash<uint32_t>()(i);
    }
    return ret;
  }
};

and then store the objects in an unordered_map. I do, however, have a couple of questions:

How often does the hash get calculated: only once, or some arbitrary number of times?
Would it make sense to create a wrapper object with == and hash functions that memoizes the hash, so that it is not calculated more than once?

When profiling, I noticed that a rather large share of my CPU time is spent doing lookups on the unordered maps, which is not exactly optimal :(
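On the first question: a lookup has to hash the key it is given in order to pick a bucket, so each .find call costs at least one hash computation. Whether the container also stores the hashes of the elements it already holds is up to the standard library implementation. The wrapper idea from the second question could look roughly like the sketch below. The names hash_u32_vector, HashedVector, HashedVectorHasher and HashedVectorMap are made up for illustration, and the hash_combine-style loop is only a placeholder; any vector hash could go there.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Placeholder vector hash (assumption: any reasonable hash could be used here).
inline std::size_t hash_u32_vector(const std::vector<std::uint32_t>& vec) {
    std::size_t seed = vec.size();
    for (std::uint32_t x : vec) {
        seed ^= x + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }
    return seed;
}

// Hypothetical wrapper: owns the vector and remembers its hash.
class HashedVector {
public:
    explicit HashedVector(std::vector<std::uint32_t> v)
        : data_(std::move(v)), hash_(hash_u32_vector(data_)) {}

    const std::vector<std::uint32_t>& data() const { return data_; }
    std::size_t hash() const { return hash_; }

    // Compare the cached hashes first; compare elements only if they match.
    friend bool operator==(const HashedVector& a, const HashedVector& b) {
        return a.hash_ == b.hash_ && a.data_ == b.data_;
    }

private:
    std::vector<std::uint32_t> data_;  // never modified after construction
    std::size_t hash_;                 // computed once, in the constructor
};

// Hasher that just returns the cached value.
struct HashedVectorHasher {
    std::size_t operator()(const HashedVector& k) const { return k.hash(); }
};

using HashedVectorMap = std::unordered_map<HashedVector, int, HashedVectorHasher>;
```

This only pays off if the same key object is reused for several lookups; building a fresh HashedVector for every .find still hashes the vector once per lookup.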

Answer 1

So, for those who don't want to use boost, Michael Blurr's comment led to the following hash function implementation:

std::size_t operator()(std::vector<uint32_t> const& vec) const {
  std::size_t seed = vec.size();
  for(auto& i : vec) {
    seed ^= i + 0x9e3779b9 + (seed << 6) + (seed >> 2);
  }
  return seed;
}

Seems to work.

Edit: see's answer is a little bit slower, but indeed yields a better hash distribution. I'd go with that one.
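For completeness, here is a minimal sketch of how the operator() above can be plugged into a container. The names U32VectorHasher, VectorMap and lookup_or are hypothetical, and the std::string value type is chosen only for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// The operator() from this answer, wrapped in a functor type.
struct U32VectorHasher {
    std::size_t operator()(const std::vector<std::uint32_t>& vec) const {
        std::size_t seed = vec.size();
        for (std::uint32_t x : vec) {
            seed ^= x + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        }
        return seed;
    }
};

// The third template argument selects the hasher.
// std::vector already provides operator==, so no custom equality is needed.
using VectorMap =
    std::unordered_map<std::vector<std::uint32_t>, std::string, U32VectorHasher>;

// Hypothetical helper: returns `fallback` when the key is absent.
inline std::string lookup_or(const VectorMap& m,
                             const std::vector<std::uint32_t>& key,
                             const std::string& fallback) {
    auto it = m.find(key);
    return it == m.end() ? fallback : it->second;
}
```

Because equality is still checked element by element, a hash collision costs time but never returns the wrong entry.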

Answer 2

Go with the experts whenever possible: http://www.boost.org/doc/libs/release/doc/html/hash/reference.html#boost.hash_combine

Answer 3

The hash function in the currently highest-voted answer, by HolKann, results in a high collision rate for numerous vectors that all contain elements from a small continuous distribution.

To combat this, the bits of each element are distributed evenly (algorithm taken from Thomas Mueller's answer).

std::size_t operator()(std::vector<uint32_t> const& vec) const {
  std::size_t seed = vec.size();
  for(auto x : vec) {
    x = ((x >> 16) ^ x) * 0x45d9f3b;
    x = ((x >> 16) ^ x) * 0x45d9f3b;
    x = (x >> 16) ^ x;
    seed ^= x + 0x9e3779b9 + (seed << 6) + (seed >> 2);
  }
  return seed;
}
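As a side note, here is the per-element mixer pulled out as a free function, with a small check that it does not merge nearby inputs. This is a sketch, not part of the answer: the names mix32 and distinct_outputs are hypothetical. As far as I can tell, each step (XOR with a 16-bit right shift, multiplication by an odd constant modulo 2^32) is invertible, so distinct 32-bit inputs should always give distinct outputs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Per-element bit mixer from the answer above, as a standalone function.
inline std::uint32_t mix32(std::uint32_t x) {
    x = ((x >> 16) ^ x) * 0x45d9f3b;
    x = ((x >> 16) ^ x) * 0x45d9f3b;
    x = (x >> 16) ^ x;
    return x;
}

// Counts how many distinct outputs mix32 produces for the inputs 0..n-1.
inline std::size_t distinct_outputs(std::uint32_t n) {
    std::unordered_set<std::uint32_t> seen;
    for (std::uint32_t i = 0; i < n; ++i) {
        seen.insert(mix32(i));
    }
    return seen.size();
}
```

Note that mix32(0) is still 0, so for all-zero vectors only the size seed and the shifts in the combining step separate the results.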

Answer 4

boost::hash_combine is good enough, but not particularly good.

HolKann's answer is good enough, but I'd recommend using a good hash for each entry and then combining them. The problem is that std::hash is not a good hash, and boost::hash_combine is not strong enough to make up for that.

template<typename T>
T xorshift(const T& n,int i){
  return n^(n>>i);
}

uint32_t hash(const uint32_t& n) {
  uint32_t p = 0x55555555ul; // pattern of alternating 0 and 1
  uint32_t c = 3423571495ul; // random odd integer constant
  return c*xorshift(p*xorshift(n,16),16);
}

// if c++20 rotl is not available:
template <typename T,typename S>
typename std::enable_if<std::is_unsigned<T>::value,T>::type
constexpr rotl(const T n, const S i){
  const T m = (std::numeric_limits<T>::digits-1);
  const T c = i&m;
  return (n<<c)|(n>>((T(0)-c)&m)); // this is usually recognized by the compiler to mean rotation, also c++20 now gives us rotl directly
}

class uint32_vector_hasher {
public:
  std::size_t operator()(std::vector<uint32_t> const& vec) const {
    std::size_t ret = 0;
    for(auto& i : vec) {
      ret = rotl(ret,11)^hash(i);
    }
    return ret;
  }
};
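To illustrate why the rotation in the combining step matters, here is a simplified sketch. It is not the code from this answer: the per-element hash is replaced by the identity so the results can be checked by hand, and the names rotl_sz, xor_only and rotate_then_xor are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Rotate left, written out so the sketch does not depend on C++20 <bit>.
inline std::size_t rotl_sz(std::size_t n, unsigned k) {
    const unsigned bits = std::numeric_limits<std::size_t>::digits;
    k %= bits;
    return k == 0 ? n : (n << k) | (n >> (bits - k));
}

// Plain XOR combination: the order of the elements cannot affect the result.
inline std::size_t xor_only(const std::vector<std::uint32_t>& v) {
    std::size_t ret = 0;
    for (std::uint32_t x : v) {
        ret ^= x;  // identity used as the per-element hash (assumption)
    }
    return ret;
}

// Rotating the accumulator before each XOR makes the position matter.
inline std::size_t rotate_then_xor(const std::vector<std::uint32_t>& v) {
    std::size_t ret = 0;
    for (std::uint32_t x : v) {
        ret = rotl_sz(ret, 11) ^ x;
    }
    return ret;
}
```

With these definitions, xor_only gives 3 for both {1, 2} and {2, 1}, while rotate_then_xor gives 2050 and 4097 respectively.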

Answer 5

I tried see's answer to solve a LeetCode problem, but for some inputs the function would overflow ints, so I reverted to your approach. However, your function causes lots of collisions if you have elements like {0}, {0, 0}, {0, 0, 0}, etc., because the hash of an int is the number itself, so all of these hash to 0.

I tweaked it slightly to include the index to reduce the collision rate:

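struct hash {
    std::size_t operator()(std::vector<int> const& vec) const {
        std::hash<uint32_t> h;
        std::size_t ret = vec.size();
        for(std::size_t idx = 0; idx < vec.size(); ++idx) {
            // OR each element's hash with its index, as the text describes
            ret ^= h(vec[idx]) | idx;
        }
        return ret;
    }
};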

I am just ORing the hash with the index so that {0}, {0, 0}, {0, 0, 0} produce different hashes. It's a very bad hash function, but it works for my purposes :P
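The collision described here can be reproduced with a small sketch. It assumes uint32_t elements as in the question, and the names xor_hash and size_seeded_xor_hash are hypothetical. With a plain XOR, equal elements cancel in pairs, so {0} and {0, 0, 0} always get the same value whatever std::hash returns; seeding with the length is enough to separate those two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Plain XOR of the element hashes, as in the question.
inline std::size_t xor_hash(const std::vector<std::uint32_t>& vec) {
    std::size_t ret = 0;
    for (std::uint32_t x : vec) {
        ret ^= std::hash<std::uint32_t>()(x);
    }
    return ret;
}

// Same loop, but the accumulator starts at the vector's length.
inline std::size_t size_seeded_xor_hash(const std::vector<std::uint32_t>& vec) {
    std::size_t ret = vec.size();
    for (std::uint32_t x : vec) {
        ret ^= std::hash<std::uint32_t>()(x);
    }
    return ret;
}
```

The length seed alone does not help with reordered vectors of the same length, which is what the index (or a rotation or shift in the combining step) addresses.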


Answer 1: score 43, 2014-11-30 18:47:59
Answer 2: score 16, accepted, 2013-12-11 05:46:02
Answer 3: score 3, 2022-05-01 04:10:49
Answer 4: score 1, 2021-08-28 08:52:51
Answer 5: score 1, 2022-12-10 03:18:36
