What hash algorithm does .NET utilise? What about Java?

Regarding the Hashtable (and subsequent derivatives of such), does anyone know what hashing algorithm .NET and Java utilise?

Are List and Dictionary both direct descendants of Hashtable?

The hash function is not built into the hash table; the hash table invokes a method on the key object to compute the hash. So, the hash function varies depending on the type of key object.
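A minimal Java sketch of that delegation, using only java.util: the key class `CountingKey` and the helper `DelegationDemo` are hypothetical names, and the counter exists only to show that HashMap calls the key's own hashCode() rather than hashing the key itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key type that counts how often its hashCode() is called,
// showing that the table asks the key for its hash.
final class CountingKey {
    static int hashCodeCalls = 0;   // illustrative counter, not thread-safe
    private final String name;

    CountingKey(String name) { this.name = name; }

    @Override
    public int hashCode() {
        hashCodeCalls++;
        return name.hashCode();     // delegate to String's own hash function
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CountingKey && ((CountingKey) o).name.equals(name);
    }
}

class DelegationDemo {
    static String lookup() {
        Map<CountingKey, String> map = new HashMap<>();
        map.put(new CountingKey("alpha"), "first");    // put() calls hashCode() on this key
        return map.get(new CountingKey("alpha"));      // get() calls it on a different but equal key
    }
}
```

Both put() and get() have to ask their key for a hash, so after one lookup the counter is at least 2.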

In Java, a List is not a hash table (that is, it doesn't extend the Map interface). One could implement a List with a hash table internally (a sparse list, where the list index is the key into the hash table), but such an implementation is not part of the standard Java library.
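For illustration only, a sparse list of that kind might look like the following sketch. `SparseList` is a hypothetical class, not part of java.util; it extends `AbstractList` and stores only the occupied indices as keys in a `HashMap`, so unset positions read back as null.

```java
import java.util.AbstractList;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sparse list: only non-null slots are stored, keyed by index.
class SparseList<E> extends AbstractList<E> {
    private final Map<Integer, E> slots = new HashMap<>();
    private final int size;

    SparseList(int size) {
        if (size < 0) throw new IllegalArgumentException("size < 0");
        this.size = size;
    }

    @Override
    public E get(int index) {
        checkIndex(index);
        return slots.get(index);            // the list index is the hash key
    }

    @Override
    public E set(int index, E element) {
        checkIndex(index);
        // Storing null frees the slot; both branches return the previous element.
        return element == null ? slots.remove(index) : slots.put(index, element);
    }

    @Override
    public int size() { return size; }

    // How many slots actually occupy memory in the backing map.
    int storedCount() { return slots.size(); }

    private void checkIndex(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException("Index: " + index);
    }
}
```

A list of a million logical elements with one value set keeps a single map entry.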

I know nothing about .NET but I'll attempt to speak for Java.

In Java, the hash code is ultimately a combination of the code returned by a given object's hashCode() function, and a secondary hash function inside the HashMap/ConcurrentHashMap class (interestingly, the two use different functions). Note that Hashtable and Dictionary (the precursors to HashMap and AbstractMap) are obsolete classes. And a list is really just "something else".
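As a sketch of what those secondary functions look like, here are the mixing steps as I recall them from OpenJDK sources (treat the exact shifts as an assumption and check the source of your JDK version): the Java 6/7 `HashMap.hash(int)`, the Java 8 `HashMap` mix, and the Java 8 `ConcurrentHashMap.spread`. The wrapper class `SupplementalHash` is hypothetical, and each method takes the value already returned by `key.hashCode()`.

```java
// Hypothetical holder for the secondary ("supplemental") hash steps, as recalled
// from OpenJDK sources. Each method receives the value of key.hashCode();
// the real HashMap maps a null key to hash 0 before any of this runs.
final class SupplementalHash {
    private SupplementalHash() {}

    // Java 6/7 HashMap.hash(int): several shift-and-XOR rounds push high-bit
    // entropy down into the low bits used for the bucket index.
    static int legacyHashMap(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Java 8 HashMap: a single XOR of the high 16 bits into the low 16 bits.
    static int java8HashMap(int h) {
        return h ^ (h >>> 16);
    }

    // Java 8 ConcurrentHashMap.spread: same XOR, then the sign bit is cleared
    // (negative hashes are reserved for special internal nodes).
    static int java8ConcurrentHashMap(int h) {
        return (h ^ (h >>> 16)) & 0x7fffffff;
    }
}
```

In Java 8 the HashMap and ConcurrentHashMap variants differ only in that final mask; the larger difference between the two classes belongs to the older releases.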

As an example, the String class constructs a hash code by repeatedly multiplying the current code by 31 and adding in the next character. See my article on how the String hash function works for more information. Numbers generally use "themselves" as the hash code. Other classes that combine several fields, e.g. Rectangle, often use the same String technique of multiplying by a small prime number and adding, but add in the various field values instead of characters. (Choosing a prime number means you're unlikely to get "accidental interactions" between certain values and the hash code width, since a prime has no divisors other than 1 and itself.)
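The recurrence described above fits in a short Java sketch. `stringStyleHash` reproduces the documented `String.hashCode()` formula (s[0]*31^(n-1) + ... + s[n-1], with int overflow wrapping), and the hypothetical `Rect` class applies the same multiply-by-31-and-add recipe to its fields; it is not the actual code of `java.awt.Rectangle`.

```java
// Hypothetical helpers illustrating the "multiply by a small prime and add" recipe.
final class HashRecipes {
    private HashRecipes() {}

    // String-style hash: h = 31*h + c for each char; int overflow simply wraps.
    static int stringStyleHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Hypothetical rectangle-like value class that mixes in field values
    // instead of characters.
    static final class Rect {
        final int x, y, width, height;

        Rect(int x, int y, int width, int height) {
            this.x = x; this.y = y; this.width = width; this.height = height;
        }

        @Override
        public int hashCode() {
            int h = 1;                  // starting value; any fixed constant works
            h = 31 * h + x;
            h = 31 * h + y;
            h = 31 * h + width;
            h = 31 * h + height;
            return h;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Rect)) return false;
            Rect r = (Rect) o;
            return x == r.x && y == r.y && width == r.width && height == r.height;
        }
    }
}
```

Starting from 1 makes `Rect.hashCode()` agree with `java.util.Objects.hash(x, y, width, height)`, which uses the same 31-based recipe.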

Since the hash table size (i.e. the number of "buckets" it has) is a power of two, a bucket number is derived from the hash code essentially by lopping off the top bits until the hash code is in range. The secondary hash function protects against hash functions where all or most of the randomness is in those top bits, by "spreading the bits around" so that some of the randomness ends up in the bottom bits and doesn't get lopped off. The String hash code would actually work fairly well without this mixing, but user-created hash codes may not work quite so well. Note that if two different hash codes resolve to the same bucket number, Java's HashMap implementations use the "chaining" technique, i.e. they create a linked list of entries in each bucket. It's thus important for hash codes to have a good degree of randomness so that items don't cluster into a particular range of buckets. (However, even with a perfect hash function, you should still expect some chaining to occur by the law of averages.)
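To make the bucket arithmetic concrete, here is a small Java sketch with hypothetical names. With a power-of-two table size n, `h & (n - 1)` keeps only the low bits, so hash codes that differ only in their high bits all collide unless a Java 8-style `h ^ (h >>> 16)` spreading step runs first.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo of power-of-two bucket selection with and without spreading.
final class BucketIndexDemo {
    private BucketIndexDemo() {}

    // Keeps the low log2(tableSize) bits, i.e. "lops off" the top bits.
    static int bucketFor(int hash, int tableSize) {
        if (Integer.bitCount(tableSize) != 1) {
            throw new IllegalArgumentException("table size must be a power of two");
        }
        return hash & (tableSize - 1);
    }

    // Java 8-style spreading: fold the high 16 bits into the low 16 bits.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Counts how many distinct buckets the given hash codes occupy.
    static int distinctBuckets(int[] hashes, int tableSize, boolean spreadFirst) {
        Set<Integer> used = new HashSet<>();
        for (int h : hashes) {
            used.add(bucketFor(spreadFirst ? spread(h) : h, tableSize));
        }
        return used.size();
    }
}
```

With the hash codes 0x10000, 0x20000, 0x30000 and 0x40000 and 16 buckets, all four land in bucket 0 without spreading, and in buckets 1 through 4 with it.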

Hash code implementations shouldn't be a mystery. You can look at the hashCode() source for any class you choose.

While looking for the same answer myself, I found this in .NET's reference source at http://referencesource.microsoft.com:

     /*
      Implementation Notes:
      The generic Dictionary was copied from Hashtable's source - any bug 
      fixes here probably need to be made to the generic Dictionary as well.
      This Hashtable uses double hashing.  There are hashsize buckets in the 
      table, and each bucket can contain 0 or 1 element.  We use a bit to mark
      whether there's been a collision when we inserted multiple elements
      (ie, an inserted item was hashed at least a second time and we probed 
      this bucket, but it was already in use).  Using the collision bit, we
      can terminate lookups & removes for elements that aren't in the hash
      table more quickly.  We steal the most significant bit from the hash code
      to store the collision bit.

      Our hash function is of the following form:

      h(key, n) = h1(key) + n*h2(key)

      where n is the number of times we've hit a collided bucket and rehashed
      (on this particular lookup).  Here are our hash functions:

      h1(key) = GetHash(key);  // default implementation calls key.GetHashCode();
      h2(key) = 1 + (((h1(key) >> 5) + 1) % (hashsize - 1));

      The h1 can return any number.  h2 must return a number between 1 and
      hashsize - 1 that is relatively prime to hashsize (not a problem if 
      hashsize is prime).  (Knuth's Art of Computer Programming, Vol. 3, p. 528-9)
      If this is true, then we are guaranteed to visit every bucket in exactly
      hashsize probes, since the least common multiple of hashsize and h2(key)
      will be hashsize * h2(key).  (This is the first number where adding h2 to
      h1 mod hashsize will be 0 and we will search the same bucket twice).

      We previously used a different h2(key, n) that was not constant.  That is a 
      horrifically bad idea, unless you can prove that series will never produce
      any identical numbers that overlap when you mod them by hashsize, for all
      subranges from i to i+hashsize, for all i.  It's not worth investigating,
      since there was no clear benefit from using that hash function, and it was
      broken.

      For efficiency reasons, we've implemented this by storing h1 and h2 in a 
      temporary, and setting a variable called seed equal to h1.  We do a probe,
      and if we collided, we simply add h2 to seed each time through the loop.

      A good test for h2() is to subclass Hashtable, provide your own implementation
      of GetHash() that returns a constant, then add many items to the hash table.
      Make sure Count equals the number of items you inserted.

      Note that when we remove an item from the hash table, we set the key
      equal to buckets, if there was a collision in this bucket.  Otherwise
      we'd either wipe out the collision bit, or we'd still have an item in
      the hash table.

       -- 
    */
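For readers who want to try the scheme, here is a Java sketch that follows the formulas exactly as written in that comment (h1 = the key's hash with the top bit cleared, h2 = 1 + (((h1 >> 5) + 1) % (hashsize - 1)), probing h1 + n*h2 mod hashsize). `DoubleHashingSet` is hypothetical: it has a fixed prime capacity and omits resizing, removal and the collision bit, and the shipped .NET code may differ in detail. The test mirrors the comment's suggestion of plugging in a constant hash function and checking the count.

```java
import java.util.Objects;
import java.util.function.ToIntFunction;

// Hypothetical fixed-capacity open-addressing set using double hashing:
// h(key, n) = h1(key) + n * h2(key), taken mod hashsize. Capacity should be prime.
final class DoubleHashingSet<K> {
    private final Object[] buckets;
    private final ToIntFunction<? super K> getHash;  // stands in for GetHash(key)
    private int count;

    DoubleHashingSet(int primeCapacity, ToIntFunction<? super K> getHash) {
        if (primeCapacity < 3) throw new IllegalArgumentException("capacity too small");
        this.buckets = new Object[primeCapacity];
        this.getHash = getHash;
    }

    // Clear the sign bit, mirroring "we steal the most significant bit".
    private int h1(K key) {
        return getHash.applyAsInt(key) & 0x7FFFFFFF;
    }

    // Always in [1, hashsize - 1], hence coprime to a prime hashsize.
    private int h2(int h1) {
        int hashsize = buckets.length;
        return 1 + (((h1 >> 5) + 1) % (hashsize - 1));
    }

    /** Returns true if added, false if already present; throws when full. */
    boolean add(K key) {
        Objects.requireNonNull(key);
        int hashsize = buckets.length;
        int first = h1(key);
        int step = h2(first);
        int bucket = first % hashsize;
        for (int n = 0; n < hashsize; n++) {       // at most hashsize probes
            Object existing = buckets[bucket];
            if (existing == null) {
                buckets[bucket] = key;
                count++;
                return true;
            }
            if (existing.equals(key)) return false;
            bucket = (bucket + step) % hashsize;   // equals h1 + (n + 1) * h2, mod hashsize
        }
        throw new IllegalStateException("table full");
    }

    boolean contains(K key) {
        Objects.requireNonNull(key);
        int hashsize = buckets.length;
        int first = h1(key);
        int step = h2(first);
        int bucket = first % hashsize;
        for (int n = 0; n < hashsize; n++) {
            Object existing = buckets[bucket];
            if (existing == null) return false;    // no removals, so an empty slot ends the search
            if (existing.equals(key)) return true;
            bucket = (bucket + step) % hashsize;
        }
        return false;
    }

    int count() { return count; }
}
```

Even with every key hashing to the same value, a prime capacity of 11 lets all 11 keys find a slot, because the probe sequence visits every bucket exactly once.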

The HASHING algorithm is the algorithm used to determine the hash code of an item within the HashTable.

The HASHTABLE algorithm (which I think is what this person is asking) is the algorithm the HashTable uses to organize its elements given their hash code.

Java happens to use a chained hash table algorithm.
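A minimal sketch of a chained hash table in Java, for illustration only: `ChainedMap` is hypothetical and much simpler than java.util.HashMap (fixed power-of-two size, no resizing, no treeification). Each bucket holds a linked list of entries, and a collision simply appends to that list.

```java
import java.util.LinkedList;
import java.util.Objects;

// Hypothetical minimal chained hash map with a fixed power-of-two bucket count.
final class ChainedMap<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedMap(int powerOfTwoSize) {
        if (Integer.bitCount(powerOfTwoSize) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        buckets = (LinkedList<Entry<K, V>>[]) new LinkedList[powerOfTwoSize];
    }

    private int indexFor(Object key) {
        int h = Objects.hashCode(key);      // the key supplies the hash (0 for null)
        h ^= (h >>> 16);                    // spread high bits downward
        return h & (buckets.length - 1);    // keep the low bits
    }

    V put(K key, V value) {
        int i = indexFor(key);
        if (buckets[i] == null) buckets[i] = new LinkedList<>();
        for (Entry<K, V> e : buckets[i]) {
            if (Objects.equals(e.key, key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        buckets[i].add(new Entry<>(key, value));  // new key: append to this bucket's chain
        return null;
    }

    V get(Object key) {
        LinkedList<Entry<K, V>> chain = buckets[indexFor(key)];
        if (chain == null) return null;
        for (Entry<K, V> e : chain) {
            if (Objects.equals(e.key, key)) return e.value;
        }
        return null;
    }

    // Number of entries sharing the key's bucket, to make chaining visible.
    int chainLength(Object key) {
        LinkedList<Entry<K, V>> chain = buckets[indexFor(key)];
        return chain == null ? 0 : chain.size();
    }
}
```

The strings "Aa" and "BB" have the same `String.hashCode()`, so they end up in one bucket whose chain holds both entries, and lookups still find each one via `equals`.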

Anything purporting to be a HashTable or something like it in .NET does not implement its own hashing algorithm: they always call the hashed object's GetHashCode() method.

There is a lot of confusion, though, as to what this method does or is supposed to do, especially when it comes to user-defined or otherwise custom classes that override the base Object implementation.
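The same confusion shows up in Java, where the rule is identical: if two objects are equal, they must return the same hash code. A minimal sketch with hypothetical class names contrasts a key that overrides only `equals` with one that overrides both.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Overrides equals() but NOT hashCode(): equal objects usually get different
// identity hash codes, so a HashMap usually searches the wrong bucket.
final class BrokenKey {
    final String id;
    BrokenKey(String id) { this.id = Objects.requireNonNull(id); }

    @Override
    public boolean equals(Object o) {
        return o instanceof BrokenKey && ((BrokenKey) o).id.equals(id);
    }
}

// Overrides both, deriving the hash from exactly the field used by equals().
final class GoodKey {
    final String id;
    GoodKey(String id) { this.id = Objects.requireNonNull(id); }

    @Override
    public boolean equals(Object o) {
        return o instanceof GoodKey && ((GoodKey) o).id.equals(id);
    }

    @Override
    public int hashCode() { return id.hashCode(); }
}

class ContractDemo {
    static boolean goodKeyFound() {
        Map<GoodKey, String> map = new HashMap<>();
        map.put(new GoodKey("42"), "answer");
        return map.containsKey(new GoodKey("42"));    // equal key, equal hash: found
    }

    static boolean brokenKeyFound() {
        Map<BrokenKey, String> map = new HashMap<>();
        map.put(new BrokenKey("42"), "answer");
        return map.containsKey(new BrokenKey("42"));  // usually false, but not guaranteed
    }
}
```

`brokenKeyFound()` returns false in practice because the two identity hash codes almost never match, yet that outcome is not guaranteed, which is exactly why such bugs are hard to spot.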

For .NET, you can use Reflector to see the various algorithms. The generic and non-generic hash tables use different ones, and of course each class defines its own hash code formula.

The .NET Dictionary<TKey, TValue> class uses an IEqualityComparer<TKey> to compute hash codes for keys and to compare keys during hash lookups. If you don't provide an IEqualityComparer<TKey> when constructing the Dictionary<TKey, TValue> instance (it's an optional constructor argument), it creates a default one for you, which uses the keys' GetHashCode and Equals methods (calling IEquatable<TKey>.Equals instead of object.Equals when the key type implements that interface).

As for how the standard GetHashCode implementation works, I'm not sure it's documented. For specific types, you can read the method's source code in Reflector, or check the Rotor source code to see if it's there.

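Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.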

 
© 2020-2024 STACKOOM.COM