Define: What is a HashSet?

Original post: 2010-12-29 23:24:25 | tags: c#, hashset

The C# HashSet data structure was introduced in the .NET Framework 3.5. A full list of the implemented members can be found at the HashSet MSDN page.

Where is it used?
Why would you want to use it?

4 Answers

1. A HashSet holds a set of objects, but in a way that allows you to easily and quickly determine whether an object is already in the set or not. It does so by internally managing an array and storing each object at an index that is calculated from the object's hash code.
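
The idea of deriving an array index from the hash code can be sketched roughly as follows. This is a toy illustration, not how HashSet is actually implemented: the fixed bucket count and the helper names `IndexFor`, `Add` and `Contains` are assumptions made up for this example, and the real class also handles resizing and uses a more elaborate internal layout.

```csharp
using System;
using System.Collections.Generic;

// Toy illustration only: a fixed number of buckets and no resizing.
const int BucketCount = 8;
var buckets = new List<string>[BucketCount];

// Map an item's hash code to a bucket index in [0, BucketCount).
int IndexFor(string item) => (item.GetHashCode() & 0x7FFFFFFF) % BucketCount;

// Returns false when the item is already present, mirroring HashSet<T>.Add.
bool Add(string item)
{
    int i = IndexFor(item);
    buckets[i] ??= new List<string>();
    if (buckets[i].Contains(item)) return false;
    buckets[i].Add(item);
    return true;
}

// Only the single bucket that the hash code points at has to be searched.
bool Contains(string item)
{
    var bucket = buckets[IndexFor(item)];
    return bucket != null && bucket.Contains(item);
}

Console.WriteLine(Add("apple"));      // True
Console.WriteLine(Add("apple"));      // False
Console.WriteLine(Contains("apple")); // True
Console.WriteLine(Contains("pear"));  // False
```

Because a lookup inspects one bucket instead of the whole collection, its cost stays roughly constant as long as the items spread evenly across buckets.
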
HashSet is an unordered collection containing unique elements. It has the standard collection operations Add, Remove and Contains, but since it uses a hash-based implementation, these operations are O(1) on average. (As opposed to List, for example, which is O(n) for Contains and Remove.) HashSet also provides standard set operations such as union, intersection, and symmetric difference.
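
As a small sketch of the three basic operations named above (the `tags` set and its contents are made up for illustration):

```csharp
using System;
using System.Collections.Generic;

var tags = new HashSet<string>();

bool addedFirst = tags.Add("c#");       // true: the element was newly inserted
bool addedAgain = tags.Add("c#");       // false: already present, set unchanged
tags.Add("hashset");

bool hasTag = tags.Contains("hashset"); // true
bool removed = tags.Remove("c#");       // true: the element existed and was removed

Console.WriteLine($"{addedFirst} {addedAgain} {hasTag} {removed} {tags.Count}");
// prints: True False True True 1
```

Note that Add reports a duplicate through its return value instead of throwing an exception.
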
There are different implementations of sets. Some make insertion and lookup operations very fast by hashing elements. However, that means that the order in which the elements were added is lost. Other implementations preserve the insertion order at the cost of slower running times.

The HashSet class in C# goes for the first approach, thus not preserving the order of elements. For lookups it is much faster than a regular List. Some basic benchmarks showed that HashSet is decently faster when dealing with primitive types (int, double, bool, etc.). It is a lot faster when working with class objects. So the point is that HashSet is fast.

The only catch of HashSet is that there is no access by index. To access elements you can either use an enumerator or convert the HashSet into a List (for example with LINQ's ToList) and iterate through that.
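
Both ways of getting at the elements can be sketched like this. The `primes` set is an invented example, and sorting the copy is an extra step added here only so that index 0 has a well-defined meaning.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var primes = new HashSet<int> { 2, 3, 5, 7 };

// Option 1: enumerate the set directly; do not rely on the order.
int sum = 0;
foreach (int p in primes)
{
    sum += p;
}

// Option 2: copy into a List<int> to get positional access.
List<int> asList = primes.ToList();
asList.Sort(); // sort the copy so that index 0 is well defined
int smallest = asList[0];

Console.WriteLine($"{sum} {smallest}"); // prints: 17 2
```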

A HashSet has an internal structure (a hash table) in which items can be searched for and identified quickly. The downside is that iterating through a HashSet (or getting an item by position, which requires enumerating the set because there is no indexer) is rather slow compared with a List.

So why would someone want to be able to know if an entry already exists in a set?

One situation where a HashSet is useful is in getting distinct values from a list where duplicates may exist. Once an item has been added to the HashSet, it is quick to determine whether the item exists (the Contains method).
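
A minimal sketch of that use case, with an invented `input` list of color names:

```csharp
using System;
using System.Collections.Generic;

var input = new List<string> { "red", "green", "red", "blue", "green" };

// Building the set from the list silently drops the duplicates.
var distinct = new HashSet<string>(input);

Console.WriteLine(distinct.Count);              // 3
Console.WriteLine(distinct.Contains("blue"));   // True
Console.WriteLine(distinct.Contains("yellow")); // False
```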

Other advantages of the HashSet are the set operations: IntersectWith, IsSubsetOf, IsSupersetOf, Overlaps, SymmetricExceptWith, UnionWith.
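
The set operations listed above could be exercised roughly as follows. The sets `a` and `b` are made up for illustration. One point worth noticing is that UnionWith, IntersectWith and SymmetricExceptWith modify the set they are called on, which is why the sketch works on copies.

```csharp
using System;
using System.Collections.Generic;

var a = new HashSet<int> { 1, 2, 3, 4 };
var b = new HashSet<int> { 3, 4, 5 };

// Queries: these do not modify either set.
bool overlaps = a.Overlaps(b);                            // true: 3 and 4 are shared
bool subset = new HashSet<int> { 3, 4 }.IsSubsetOf(a);    // true
bool superset = a.IsSupersetOf(b);                        // false: 5 is not in a

// Mutating operations: run them on copies so that a stays intact.
var union = new HashSet<int>(a);
union.UnionWith(b);                                       // 1, 2, 3, 4, 5

var common = new HashSet<int>(a);
common.IntersectWith(b);                                  // 3, 4

var either = new HashSet<int>(a);
either.SymmetricExceptWith(b);                            // 1, 2, 5

Console.WriteLine($"{overlaps} {subset} {superset}");     // prints: True True False
Console.WriteLine($"{union.Count} {common.Count} {either.Count}"); // prints: 5 2 3
```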

If you are familiar with the Object Constraint Language, you will recognize these set operations. You will also see that it is one step closer to an implementation of executable UML.

Simply said, and without revealing the kitchen secrets: a set, in general, is a collection that contains no duplicate elements and whose elements are in no particular order. So a HashSet<T> is similar to a generic List<T>, but is optimized for fast lookups (via hash tables, as the name implies) at the cost of losing order.

From an application perspective, if you only need to avoid duplicates, then HashSet is what you are looking for, since its lookup, insert and remove complexities are O(1), that is, constant (on average). This means that no matter how many elements the HashSet has, it takes about the same amount of time to check whether a given element is present; and since inserting elements is O(1) too, it is perfect for this sort of thing.