What is the lookup time complexity of HashSet<T>(IEqualityComparer<T>)?

In C#/.NET, I like using HashSets because of their supposed O(1) time complexity for lookups. If I have a large set of data that is going to be queried, I often prefer a HashSet to a List for that reason.

What confuses me is the constructor for the HashSet, which takes IEqualityComparer as an argument:

http://msdn.microsoft.com/en-us/library/bb359100.aspx

In the link above, the remarks note that the "constructor is an O(1) operation," but if this is the case, I am curious whether lookup is still O(1).

In particular, it seems to me that, if I were to write a Comparer to pass to the constructor of a HashSet, then whenever I perform a lookup, the Comparer code would have to be executed on every key to see if there was a match. This would not be O(1), but O(n).

Does the implementation internally construct a lookup table as elements are added to the collection?

In general, how might I ascertain information about complexity of .NET data structures?

A HashSet works by hashing the objects you insert (via IEqualityComparer.GetHashCode) and tossing them into buckets according to the hash. The buckets themselves are stored in an array, so finding the right bucket is a single index operation; hence the O(1) part.

For example (this is not necessarily how the C# implementation works; it just gives a flavor), it takes the first character of the hash and throws everything whose hash starts with 1 into bucket 1, everything whose hash starts with 2 into bucket 2, and so on. Inside each bucket is another array of buckets that divvies things up by the second character of the hash, and so on for every character in the hash.

Now, when you look something up, it hashes it and jumps through the appropriate buckets. It has to do several array lookups (one for each character in the hash), but that number does not grow as a function of N, the number of objects you've added, hence the O(1) rating.

To your other question, here is a blog post listing the complexity of a number of collection operations: http://c-sharp-snippets.blogspot.com/2010/03/runtime-complexity-of-net-generic.html

> if I were to write a Comparer to pass in to the constructor of a HashSet, whenever I perform a lookup, the Comparer code would have to be executed on every key to check to see if there was a match. This would not be O(1), but O(n).

Let's call the value you are searching for the "query" value.

Can you explain why you believe the comparer has to be executed on every key to see if it matches the query?

This belief is false. (Unless of course the hash code supplied by the comparer is the same for every key!) The search algorithm executes the equality comparer on every key whose hash code matches the query's hash code, modulo the number of buckets in the hash table. That's how hash tables get O(1) lookup time.
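To make the bucket idea concrete, here is a minimal sketch of a chained hash set. `TinyHashSet<T>` is a hypothetical teaching type, not the real `HashSet<T>` (which also resizes and stores its entries differently); it only illustrates that the comparer's `GetHashCode` selects one bucket and `Equals` runs only inside that bucket.

```csharp
using System;
using System.Collections.Generic;

// Demo: only the bucket selected by the hash code is scanned.
var set = new TinyHashSet<string>(StringComparer.OrdinalIgnoreCase, bucketCount: 16);
set.Add("alpha");
set.Add("beta");
Console.WriteLine(set.Contains("ALPHA")); // True: the comparer ignores case
Console.WriteLine(set.Contains("gamma")); // False

// Hypothetical teaching type (an assumption for illustration, not a .NET API).
sealed class TinyHashSet<T> where T : notnull
{
    private readonly List<T>[] _buckets;
    private readonly IEqualityComparer<T> _comparer;

    public TinyHashSet(IEqualityComparer<T> comparer, int bucketCount)
    {
        _comparer = comparer;
        _buckets = new List<T>[bucketCount];
        for (int i = 0; i < bucketCount; i++) _buckets[i] = new List<T>();
    }

    // Map the comparer's hash code to a bucket index (mask off the sign bit first).
    private List<T> BucketFor(T item) =>
        _buckets[(_comparer.GetHashCode(item) & 0x7FFFFFFF) % _buckets.Length];

    public bool Add(T item)
    {
        var bucket = BucketFor(item);
        foreach (var existing in bucket)
            if (_comparer.Equals(existing, item)) return false; // already present
        bucket.Add(item);
        return true;
    }

    public bool Contains(T item)
    {
        // Equals runs only against items in this one bucket, not the whole set.
        foreach (var existing in BucketFor(item))
            if (_comparer.Equals(existing, item)) return true;
        return false;
    }
}
```

With a fixed bucket count the chains in this sketch would grow as items are added; a real hash table keeps them short by growing the bucket array.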

> Does the implementation internally construct a lookup table as elements are added to the collection?

Yes.

> In general, how might I ascertain information about complexity of .NET data structures?

Read the documentation.

Actually, the lookup time of a HashSet<T> isn't always O(1).

As others have already mentioned, a HashSet uses IEqualityComparer<T>.GetHashCode(). Now consider a struct or object which always returns the same hash code x.

If you add n items to your HashSet, there will be n items with the same hash in it (as long as the objects aren't equal). So if you check whether an element with hash code x exists in your HashSet, it will run equality checks against all objects with hash code x to test whether the HashSet contains the element. In that case the lookup is O(n).
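The degenerate case can be observed by counting calls to `Equals`. The sketch below uses a hypothetical `CountingComparer` (an illustrative name, not a .NET type) that returns either the value itself or a constant as the hash code, then looks up a value that is not in the set.

```csharp
using System;
using System.Collections.Generic;

const int n = 1000;

// Comparer with distinct hash codes per key.
var good = new CountingComparer(constantHash: false);
var goodSet = new HashSet<int>(good);
for (int i = 0; i < n; i++) goodSet.Add(i);
good.EqualsCalls = 0;                 // ignore calls made while adding
goodSet.Contains(n + 1);              // look up an absent value
long goodCalls = good.EqualsCalls;

// Comparer that gives every key the same hash code.
var bad = new CountingComparer(constantHash: true);
var badSet = new HashSet<int>(bad);
for (int i = 0; i < n; i++) badSet.Add(i);
bad.EqualsCalls = 0;
badSet.Contains(n + 1);
long badCalls = bad.EqualsCalls;

Console.WriteLine($"distinct hash codes: {goodCalls} Equals calls");
Console.WriteLine($"constant hash code:  {badCalls} Equals calls");

// Hypothetical helper: counts how often the set asks it to compare two keys.
sealed class CountingComparer : IEqualityComparer<int>
{
    private readonly bool _constantHash;
    public long EqualsCalls;

    public CountingComparer(bool constantHash) => _constantHash = constantHash;

    public bool Equals(int x, int y)
    {
        EqualsCalls++;
        return x == y;
    }

    // Either a well-spread hash (the value itself) or the same code for every key.
    public int GetHashCode(int obj) => _constantHash ? 42 : obj;
}
```

With the constant hash code, the single lookup has to compare against every stored item, while the well-distributed comparer needs far fewer comparisons.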

It depends on the quality of the hash function (GetHashCode()) that your IEqualityComparer implementation provides. An ideal hash function produces a well-distributed, random-looking set of hash codes. These hash codes are used as an index that maps a key to a value, so searching for a value by key becomes more efficient, especially when the key is a complex object or structure.

> the Comparer code would have to be executed on every key to check to see if there was a match. This would not be O(1), but O(n).

This is not how a hash table works; what you describe is a straightforward brute-force search. A hash table takes a more intelligent approach: it searches by index, using the hash code.

Lookup is still O(1) if you pass an IEqualityComparer. The hash set still uses the same logic as if you don't pass an IEqualityComparer; it just uses the IEqualityComparer's implementations of GetHashCode and Equals instead of the instance methods of System.Object (or the overrides provided by the object in question).
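As a small illustration of that substitution, the sketch below builds two sets over the same string, one with the default comparer and one with the built-in `StringComparer.OrdinalIgnoreCase`. The variable names are just for this example.

```csharp
using System;
using System.Collections.Generic;

// Default comparer: uses string's own GetHashCode and Equals (case-sensitive).
var defaultSet = new HashSet<string> { "Hello" };

// Custom comparer: its GetHashCode and Equals replace the defaults.
var ignoreCase = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "Hello" };

Console.WriteLine(defaultSet.Contains("HELLO")); // False
Console.WriteLine(ignoreCase.Contains("HELLO")); // True
```

Both lookups follow the same hash-then-compare path; only the definition of "same hash" and "equal" changes.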

