
What is the fastest/safest method to iterate over a HashSet?

Originally posted 2012-03-08 21:33:01 · Tags: c#, performance, iteration, hashset

I'm still quite new to C#, but I've noticed from forum posts that using a HashSet instead of a List has advantages in specific cases.

My current case isn't so much that I'm storing a tremendous amount of data in a single List, but rather that I have to check for members of it often.

The catch is that I do need to iterate over it as well, but the order in which the items are stored or retrieved doesn't matter.

I've read that foreach loops are actually slower than for loops, so how else could I go about this in the fastest way possible?

The number of .Contains() checks I'm doing is definitely hurting my performance with lists, so at least a comparison with the performance of a HashSet would be handy.

Edit: I'm currently using lists, iterating through them in numerous places, and different code runs in each place. Most often, the lists contain point coordinates that I use to index into a two-dimensional array, and then I perform some operation or another based on the criteria of the list.

If there's no direct answer to my question, that's fine, but I assumed there might be other ways of iterating over a HashSet besides a foreach loop. I'm in the dark about what those other methods might be, what advantages they offer, and so on. Assuming other methods exist, I also assumed there would be a typical preferred choice that is only passed over when it doesn't suit the needs (my needs are pretty basic).

As for premature optimization, I already know that the way I'm using lists is a bottleneck. How to fix it is where I'm getting stuck. Not stuck exactly, but I didn't want to reinvent the wheel by testing repeatedly, only to find out I'm already doing it the best way I could. (This is a large project with over 3 months invested; lists are everywhere, but there are definitely some where I don't want duplicates, that hold a lot of data, and that don't need to be stored in any specific order.)

4 Answers

A foreach loop can have a small amount of additional overhead on indexed collections such as List<T>. This is mostly because the enumerator does a little more checking than a for loop (for List<T>, it also verifies on each step that the list hasn't been modified). For plain arrays, the C# compiler already turns foreach into an indexed loop.
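To make the comparison concrete, here is a minimal sketch of the two loop shapes over an array. For single-dimensional arrays the C# compiler compiles foreach into an indexed loop, so the two should perform about the same there. The array contents and the helper names SumWithFor/SumWithForeach are made up for illustration.

```csharp
using System;

int[] values = { 3, 1, 4, 1, 5, 9, 2, 6 };

Console.WriteLine(SumWithFor(values));     // 31
Console.WriteLine(SumWithForeach(values)); // 31

// Indexed loop: because the condition compares i against data.Length,
// the JIT can usually drop the per-element bounds check.
int SumWithFor(int[] data)
{
    int sum = 0;
    for (int i = 0; i < data.Length; i++)
        sum += data[i];
    return sum;
}

// foreach over an array: lowered by the compiler to an indexed loop,
// so no enumerator object is involved.
int SumWithForeach(int[] data)
{
    int sum = 0;
    foreach (int v in data)
        sum += v;
    return sum;
}
```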

HashSet does not have an indexer, so you have to use the enumerator.

In this case foreach is efficient, as it only calls MoveNext() as it moves through the collection.
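A sketch of what a foreach over a HashSet<int> amounts to, using the standard HashSet<T> from System.Collections.Generic. The second helper spells out roughly the code the compiler generates: get the enumerator, call MoveNext() until it returns false, then dispose it. The helper names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

var set = new HashSet<int> { 10, 20, 30 };

Console.WriteLine(SumForeach(set)); // 60
Console.WriteLine(SumManual(set));  // 60

// The usual way: foreach drives the set's enumerator for you.
int SumForeach(HashSet<int> s)
{
    int sum = 0;
    foreach (int item in s)
        sum += item;
    return sum;
}

// Roughly what the compiler generates for the foreach above.
// HashSet<T>.GetEnumerator() returns a struct (HashSet<T>.Enumerator),
// so the enumerator itself needs no heap allocation.
int SumManual(HashSet<int> s)
{
    int sum = 0;
    HashSet<int>.Enumerator e = s.GetEnumerator();
    try
    {
        while (e.MoveNext())
            sum += e.Current;
    }
    finally
    {
        e.Dispose();
    }
    return sum;
}
```

Iteration order of a HashSet is unspecified, which is fine here since the question says order doesn't matter.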

Also, Parallel.ForEach can dramatically improve your performance, depending on the work you are doing in the loop and the size of your HashSet.
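A minimal Parallel.ForEach sketch over a HashSet of grid coordinates, loosely modeled on the question's point-coordinate scenario; the grid, the points and the SumInParallel helper are invented for illustration. It uses the overload with per-thread subtotals (localInit/localFinally), so each thread synchronizes only once at the end. With work this cheap per item, the parallel version may well be slower than a plain foreach; it only pays off when the loop body does substantial work.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical data: a 100x100 grid of ones and the points on its diagonal.
var grid = new int[100, 100];
for (int x = 0; x < 100; x++)
    for (int y = 0; y < 100; y++)
        grid[x, y] = 1;

var points = new HashSet<(int X, int Y)>();
for (int i = 0; i < 100; i++)
    points.Add((i, i));

long total = SumInParallel(points, grid);
Console.WriteLine(total); // 100

long SumInParallel(HashSet<(int X, int Y)> pts, int[,] g)
{
    long sum = 0;
    // Parallel.ForEach accepts any IEnumerable<T>, so a HashSet works directly.
    Parallel.ForEach(
        pts,
        () => 0L,                                        // per-thread subtotal starts at 0
        (p, state, subtotal) => subtotal + g[p.X, p.Y],  // runs on worker threads
        subtotal => Interlocked.Add(ref sum, subtotal)); // merge each thread's subtotal atomically
    return sum;
}
```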

As mentioned before, profiling is your best bet.
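A crude Stopwatch sketch for comparing List<T>.Contains with HashSet<T>.Contains on your own machine. The collection size, the probe pattern and the Measure helper are arbitrary assumptions; for reliable numbers a benchmarking tool such as BenchmarkDotNet is a better fit than hand-rolled timing.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

const int N = 10_000;
List<int> list = Enumerable.Range(0, N).ToList();
var set = new HashSet<int>(list);

// Warm-up calls so JIT compilation isn't counted in the measurements.
Measure(list.Contains);
Measure(set.Contains);

var listResult = Measure(list.Contains);
var setResult = Measure(set.Contains);

Console.WriteLine($"List<T>.Contains:    {listResult.Elapsed.TotalMilliseconds:F2} ms ({listResult.Hits} hits)");
Console.WriteLine($"HashSet<T>.Contains: {setResult.Elapsed.TotalMilliseconds:F2} ms ({setResult.Hits} hits)");

(int Hits, TimeSpan Elapsed) Measure(Func<int, bool> contains)
{
    var sw = Stopwatch.StartNew();
    int hits = 0;
    for (int i = 0; i < N; i++)
    {
        // Probe 0, 2, 4, ...: the first half are present, the rest miss.
        if (contains(i * 2))
            hits++;
    }
    sw.Stop();
    return (hits, sw.Elapsed);
}
```

Run it in Release mode; timings vary by machine, so no expected values are shown.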

You shouldn't be iterating over a HashSet in the first place to determine whether an item is in it. Use the HashSet's own Contains method (not the LINQ one). A HashSet is designed so that it doesn't need to look through every item to find out whether a given value is in the set. That is what makes it so much more powerful than a List for lookups.
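A sketch of that advice applied to the question's case, assuming the points are stored as value tuples (the question doesn't say how the coordinates are represented). Value tuples have structural equality and hashing, so they work as HashSet keys without a custom comparer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical example data: coordinates into a 2-D array.
var visited = new HashSet<(int X, int Y)>();

// Add returns false when the point is already present, so duplicates
// are rejected without a separate Contains check.
bool addedFirst = visited.Add((2, 3));
bool addedAgain = visited.Add((2, 3));

// Hash-based lookups: no iteration over the set.
bool hasPoint = visited.Contains((2, 3));
bool hasOther = visited.Contains((5, 5));

Console.WriteLine($"{addedFirst} {addedAgain} {hasPoint} {hasOther}"); // True False True False
```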

Not strictly answering the question in the title, but more about your specific problem:

I would make your own collection class that uses both a HashSet and a List internally. Iterating is fast because you can use the List, and checking Contains is fast because you can use the HashSet. Just make it implement IEnumerable<T> and you can use this collection in a foreach as well.

The downside is more memory, but there are only twice as many references to each object, not twice as many objects. In the worst case it's only twice as much memory, and you seem much more concerned with performance.

Adding, checking, and iterating are all fast this way; only removal is still O(N), because of the List.
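A minimal sketch of the combined collection described above. The class name ListBackedSet<T> is hypothetical. It keeps the HashSet<T> and the List<T> in sync, exposes an indexer so a plain for loop works, and returns List<T>'s struct enumerator from GetEnumerator so foreach doesn't allocate. Remove stays O(N), as the answer says.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical name: List<T> for fast iteration, HashSet<T> for fast lookups.
public sealed class ListBackedSet<T> : IEnumerable<T>
{
    private readonly List<T> _items = new List<T>();
    private readonly HashSet<T> _lookup = new HashSet<T>();

    public int Count => _items.Count;

    // O(1) on average: the HashSet rejects duplicates before the List is touched.
    public bool Add(T item)
    {
        if (!_lookup.Add(item))
            return false;
        _items.Add(item);
        return true;
    }

    // O(1) on average, via the HashSet.
    public bool Contains(T item) => _lookup.Contains(item);

    // Indexed access for plain for loops.
    public T this[int index] => _items[index];

    // O(N): List<T>.Remove searches for the item and shifts later elements.
    public bool Remove(T item)
    {
        if (!_lookup.Remove(item))
            return false;
        _items.Remove(item);
        return true;
    }

    // Struct enumerator: foreach over this class walks the List without allocating.
    public List<T>.Enumerator GetEnumerator() => _items.GetEnumerator();

    IEnumerator<T> IEnumerable<T>.GetEnumerator() => GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Usage: create it with `var points = new ListBackedSet<(int X, int Y)>();`, call `points.Add((1, 2))`, check membership with `points.Contains((1, 2))`, and iterate with either `foreach (var p in points)` or `for (int i = 0; i < points.Count; i++)`.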

EDIT: If removal needs to be O(1) as well, use a doubly linked list instead of a regular list, and make the HashSet a Dictionary<KeyType, Cell> instead. You can use the dictionary both for Contains checks and to quickly find the cell holding the data, so removal from the data structure is fast.
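A sketch of the O(1)-removal variant, assuming .NET's built-in LinkedList<T> (which is doubly linked) as the list and Dictionary<T, LinkedListNode<T>> in the role of Dictionary<KeyType, Cell>. The class name LinkedSet<T> is hypothetical. Because the dictionary hands back the node directly, removal unlinks it without searching the list.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical name: linked list for ordered iteration, dictionary for
// lookups and for finding a node to unlink in O(1).
public sealed class LinkedSet<T> : IEnumerable<T> where T : notnull
{
    private readonly LinkedList<T> _order = new LinkedList<T>();
    private readonly Dictionary<T, LinkedListNode<T>> _nodes =
        new Dictionary<T, LinkedListNode<T>>();

    public int Count => _nodes.Count;

    public bool Add(T item)
    {
        if (_nodes.ContainsKey(item))
            return false;
        // AddLast returns the new node, which the dictionary remembers.
        _nodes[item] = _order.AddLast(item);
        return true;
    }

    public bool Contains(T item) => _nodes.ContainsKey(item);

    public bool Remove(T item)
    {
        if (!_nodes.TryGetValue(item, out var node))
            return false;
        _nodes.Remove(item);
        _order.Remove(node); // O(1): unlinks the node directly
        return true;
    }

    public LinkedList<T>.Enumerator GetEnumerator() => _order.GetEnumerator();

    IEnumerator<T> IEnumerable<T>.GetEnumerator() => GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

The trade-off is that walking a linked list is typically slower than walking a List<T>, because each node is a separate heap object rather than a slot in one contiguous array.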

I had the same issue: HashSet is very well suited to adding unique elements, but it is very slow when getting elements in a for loop. I solved it by converting the HashSet to an array and then running the for loop over that.
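A minimal sketch of that approach using HashSet<T>.CopyTo (LINQ's ToArray() would also work). If the slow for loop was calling LINQ's ElementAt(i) on the set, that would explain it: a HashSet has no indexer, so each ElementAt call enumerates from the start, making the whole loop O(N²), whereas copying once is O(N). The array is a snapshot, so later changes to the set aren't reflected in it.

```csharp
using System;
using System.Collections.Generic;

var set = new HashSet<int> { 5, 7, 11 };

// Copy once into an array (O(N)), then index it as often as needed.
int[] snapshot = new int[set.Count];
set.CopyTo(snapshot);

int sum = 0;
for (int i = 0; i < snapshot.Length; i++)
    sum += snapshot[i];

Console.WriteLine(sum); // 23
```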