
Why do we have an additional array of buckets in a hash algorithm?

I've been stepping through the implementation of HashSet in the .NET Framework, and I am slightly confused by it. Here is the Contains method:

    private int[] m_buckets;
    private Slot[] m_slots;

    public bool Contains(T item) {
        if (m_buckets != null) {
            int hashCode = InternalGetHashCode(item);
            // see note at "HashSet" level describing why "- 1" appears in for loop
            for (int i = m_buckets[hashCode % m_buckets.Length] - 1; i >= 0; i = m_slots[i].next) {
                if (m_slots[i].hashCode == hashCode && m_comparer.Equals(m_slots[i].value, item)) {
                    return true;
                }
            }
        }
        // either m_buckets is null or wasn't found
        return false;
    }

    internal struct Slot {
        internal int hashCode;      // Lower 31 bits of hash code, -1 if unused
        internal T value;
        internal int next;          // Index of next entry, -1 if last
    }

I understand the first part: get the hash code of the item. Next, a loop is started and a suitable index is generated from the hash code. But then it uses this index to retrieve a value from an array of integers, which it then uses to check whether the hash codes of the values, and the values themselves, are the same. Why is this? Also, I cannot get my head around the .next field: why is it necessary to store this information?

Several objects may have the same value for hashCode % m_buckets.Length even if they have distinct hashCode values. Distinct objects may also have the same hashCode value (even though it is unlikely).
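As a small illustration of the first point, here is the same modulo step applied to made-up numbers. The names BucketDemo and BucketOf, the hash codes and the bucket count are assumptions chosen for the example, not taken from the framework source.

```csharp
static class BucketDemo
{
    // Maps a non-negative hash code to a bucket index, the same way
    // Contains does with hashCode % m_buckets.Length.
    public static int BucketOf(int hashCode, int bucketCount)
    {
        return hashCode % bucketCount;
    }
}
```

With 7 buckets, BucketDemo.BucketOf(3, 7) and BucketDemo.BucketOf(10, 7) both return 3, so two items with different hash codes still end up in the same bucket.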

This is resolved by chaining together all the objects with the same value for hashCode % m_buckets.Length, and then searching that chain for the appropriate element. The elements themselves all live in m_slots; m_buckets only records, for each bucket, the index (plus one) of the first slot in that bucket's chain, so a value of 0 means the bucket is empty. That is why the loop starts at m_buckets[...] - 1. The reason it compares both the hashCode value and the objects themselves is that comparing two hashCode values is faster than comparing the objects. By doing a cheap check on the hash codes first, we can usually avoid doing an expensive check on the objects.

The next values are stored so that it is possible to enumerate the elements that hash to a single bucket: each slot's next holds the index of the following slot in the same chain, or -1 if it is the last one.
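To see how the two arrays work together, here is a minimal sketch of the same layout. It is not the framework's code: TinySet, its Hash helper, the fixed capacity and the lack of resizing or removal are all simplifications assumed for illustration.

```csharp
using System;
using System.Collections.Generic;

// Simplified sketch of the buckets/slots layout (fixed capacity, no resize, no remove).
public class TinySet<T>
{
    private struct Slot
    {
        public int HashCode;  // lower 31 bits of the hash code
        public T Value;
        public int Next;      // index of the next slot in the same chain, -1 if last
    }

    // buckets[b] = (index of the chain's first slot) + 1, so 0 means "empty bucket".
    private readonly int[] buckets;
    // Every element is stored here, in insertion order.
    private readonly Slot[] slots;
    private int count;
    private readonly IEqualityComparer<T> comparer = EqualityComparer<T>.Default;

    public TinySet(int capacity)
    {
        buckets = new int[capacity];
        slots = new Slot[capacity];
    }

    // Hypothetical stand-in for InternalGetHashCode: keep the hash non-negative.
    private int Hash(T item)
    {
        return item == null ? 0 : comparer.GetHashCode(item) & 0x7FFFFFFF;
    }

    public bool Contains(T item)
    {
        int hashCode = Hash(item);
        // Start at the head of the bucket's chain and follow Next until -1.
        for (int i = buckets[hashCode % buckets.Length] - 1; i >= 0; i = slots[i].Next)
        {
            // Cheap integer comparison first, full equality check only on a match.
            if (slots[i].HashCode == hashCode && comparer.Equals(slots[i].Value, item))
                return true;
        }
        return false;
    }

    public bool Add(T item)
    {
        if (Contains(item)) return false;
        if (count == slots.Length) throw new InvalidOperationException("TinySet is full.");

        int hashCode = Hash(item);
        int bucket = hashCode % buckets.Length;
        slots[count].HashCode = hashCode;
        slots[count].Value = item;
        slots[count].Next = buckets[bucket] - 1;  // previous head (or -1) becomes the successor
        buckets[bucket] = count + 1;              // the new slot becomes the head of the chain
        count++;
        return true;
    }
}
```

For example, after adding 3 and then 10 to a new TinySet<int>(7), both values sit in bucket 3: buckets[3] is 2, which points at slot 1 (holding 10), and that slot's Next is 0 (holding 3), whose Next is -1. Contains(17) walks both slots of that chain, rejects each on the hash code comparison alone, and returns false.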
