
C++ unordered_map self-defined hash function collision

The code below counts the number of lines in a plane for each distinct slope. It is recommended to represent the slope of a line as a pair of x-axis and y-axis offsets, because computing the division y / x directly runs into floating-point precision issues. All x and y positions are integers.

Although Method I works in my test code, a few things are still unclear to me:

1) For Method I, the pairs {5, 3} and {3, 5} have the same hash value (x ^ y), but these two lines have different slopes! Why doesn't this cause both lines to be treated as having the same slope? Or does the hash value only determine the slot a key is hashed into, while comparing the actual pair values determines whether two keys count as equal?

2) The pairs {5, 3} and {3, 5} are hashed into the same slot, and there are many similar collisions such as {a, b} and {b, a}. Why does the hash table still produce the correct final result despite these collisions?

3) XOR on negative integers is fine, right? Is there a better hash function commonly used here to avoid a high collision rate?

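#include <cstddef>
#include <unordered_map>
#include <utility>
using namespace std;

struct hashfunc
{
    // Method I: XOR the two coordinates. Symmetric, so {a, b} and {b, a} collide.
    size_t operator() (const pair<int,int>& l) const
    { return l.first ^ l.second; }

    // Method II is WRONG (kept commented out; only one operator() with this
    // signature may exist). Shifting a 32-bit int by 32 is undefined behavior,
    // and left-shifting a negative int is also undefined before C++20.
    // size_t operator() (const pair<int, int>& l) const {
    //      return l.first << 32 | l.second;
    // }
};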

unordered_map< pair< int,int >, int, hashfunc> lines;

A complete absence of collisions is not achievable with any function whose output is smaller than its combined inputs. Correctness does not depend on the absence of collisions; only performance does. You should get correct results even with a hash function that returns zero all the time (try it).
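To try the always-zero experiment, here is a minimal sketch. The names ZeroHash, count_slopes and demo_zero_hash are made up for illustration and do not come from the question. Every key lands in the same bucket, so lookups degrade to a linear scan, yet the counts should still come out right because std::unordered_map compares keys with operator== after hashing.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Deliberately terrible hash (hypothetical name): every key gets hash value 0.
struct ZeroHash {
    std::size_t operator()(const std::pair<int, int>&) const { return 0; }
};

using SlopeCounts = std::unordered_map<std::pair<int, int>, int, ZeroHash>;

// Counts how many times each (x, y) slope pair occurs in the input.
SlopeCounts count_slopes(const std::vector<std::pair<int, int>>& slopes) {
    SlopeCounts counts;
    for (const auto& s : slopes) {
        ++counts[s];  // the key is located by equality inside the single bucket
    }
    return counts;
}

// Small self-check: all keys collide, but distinct keys stay distinct.
void demo_zero_hash() {
    const SlopeCounts counts = count_slopes({{5, 3}, {3, 5}, {5, 3}, {-1, 2}});
    assert(counts.size() == 3);
    assert(counts.at(std::make_pair(5, 3)) == 2);
    assert(counts.at(std::make_pair(3, 5)) == 1);
}
```

Calling demo_zero_hash() from main should pass without tripping an assertion; the only cost of the bad hash is speed.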

"The hash function value only determines the slot to be hashed, while comparing the equivalence of actual pair value determines whether to count them as equal?"

Correct.
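As a small illustration of that split between hashing and equality, the sketch below reuses the XOR idea from Method I under a made-up name, XorHash (demo_xor_collision is likewise hypothetical). The two pairs from the question hash to the same value but are still stored as separate keys.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>

// Same idea as Method I in the question: symmetric, so {a, b} and {b, a} collide.
struct XorHash {
    std::size_t operator()(const std::pair<int, int>& p) const {
        return static_cast<std::size_t>(p.first ^ p.second);
    }
};

void demo_xor_collision() {
    const std::pair<int, int> a{5, 3};
    const std::pair<int, int> b{3, 5};

    XorHash h;
    assert(h(a) == h(b));  // identical hash values: 5 ^ 3 == 3 ^ 5
    assert(!(a == b));     // but the keys themselves compare unequal

    std::unordered_map<std::pair<int, int>, int, XorHash> lines;
    ++lines[a];
    ++lines[b];
    ++lines[a];

    // The map keeps two entries, because key identity is decided by operator==.
    assert(lines.size() == 2);
    assert(lines.at(a) == 2);
    assert(lines.at(b) == 1);
}
```

The hash only narrows the search to a bucket; the default key comparison (std::equal_to on the pair) decides whether an existing entry is reused.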

The usual method is to mash the numbers together in an unpredictable way, for example:

choose distinct primes a,b,c
hash(x,y) = (a*x + b*y) % c

See e.g. https://en.wikipedia.org/wiki/Universal_hashing#Hashing_integers
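One possible C++ rendering of the (a*x + b*y) % c idea is sketched below. The name PrimeMixHash and the three primes are arbitrary choices for illustration, not values taken from the answer or from the linked article. It does the arithmetic in unsigned 64-bit integers so that negative coordinates and overflow stay well defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>

// Hypothetical hash functor implementing hash(x, y) = (a*x + b*y) % c.
struct PrimeMixHash {
    std::size_t operator()(const std::pair<int, int>& p) const {
        // Three distinct primes, picked arbitrarily for this sketch.
        constexpr std::uint64_t a = 1000003ULL;
        constexpr std::uint64_t b = 998244353ULL;
        constexpr std::uint64_t c = 2305843009213693951ULL;  // 2^61 - 1

        // Convert to unsigned first: negative ints wrap around in a defined
        // way, and unsigned multiplication never triggers undefined behavior.
        const std::uint64_t x = static_cast<std::uint32_t>(p.first);
        const std::uint64_t y = static_cast<std::uint32_t>(p.second);
        return static_cast<std::size_t>((a * x + b * y) % c);
    }
};

void demo_prime_mix() {
    PrimeMixHash h;
    // Unlike XOR, swapping the coordinates changes the hash here (a != b).
    assert(h({5, 3}) != h({3, 5}));

    std::unordered_map<std::pair<int, int>, int, PrimeMixHash> lines;
    ++lines[{5, 3}];
    ++lines[{3, 5}];
    ++lines[{-2, 7}];  // negative coordinates are handled too
    ++lines[{5, 3}];
    assert(lines.size() == 3);
    assert(lines.at(std::make_pair(5, 3)) == 2);
}
```

Because a and b differ, {a, b} and {b, a} no longer collide systematically, although other collisions remain possible, which is fine for correctness as noted above.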
