Prefer unordered_set over vector

Original post: 2015-11-23 16:05:19 8 5 c++/ stl

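Is it safe to say that if I don't want duplicates in my container, and I don't care about element position as I only want to iterate through the container, then I should use an unordered_set instead of a vector?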

5 answers

Is it safe to say that if I don't want duplicates in my container, and I don't care about element position as I only want to iterate through the container, then I should use an unordered_set instead of vector?

No, it is not. It depends on many factors. For example, if you seldom add new elements but iterate over the container quite often, it would be preferable to use std::vector and maintain uniqueness manually. There could also be other factors affecting your decision. But normally, yes, you may prefer std::unordered_set, as it simplifies your program.
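To make "maintain uniqueness manually" concrete, here is a minimal sketch. The helper name push_back_unique is hypothetical (it is not part of the standard library), and the linear std::find scan is only reasonable under the assumption stated above, namely that insertions are rare compared to iteration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: append `value` only if it is not already present.
// Returns true when the value was inserted. The std::find scan is O(n),
// which is acceptable only when insertions are infrequent.
template <typename T>
bool push_back_unique(std::vector<T>& v, const T& value) {
    if (std::find(v.begin(), v.end(), value) != v.end()) {
        return false;  // duplicate: leave the vector unchanged
    }
    v.push_back(value);
    return true;
}
```

Iteration then stays a plain range-for over contiguous memory, and the cost of uniqueness is paid only at insertion time.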

Not entirely. unordered_sets are not required to be contiguous containers; if you frequently want to read numerous values contained in the set, you may prefer std::vector in a time-critical application.

std::unordered_set:

Internally, the elements are not sorted in any particular order, but organized into buckets. Which bucket an element is placed into depends entirely on the hash of its value. This allows fast access to individual elements, since once a hash is computed, it refers to the exact bucket the element is placed into.
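The bucket layout described in that quote can be observed through the standard bucket interface (bucket(), bucket_count(), and the local iterators begin(n)/end(n)). The sketch below is only an illustration; the helper name found_in_reported_bucket is hypothetical, and which bucket a given key lands in is implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical helper: returns true if `key` is stored in the bucket that
// the container itself reports for it via bucket().
bool found_in_reported_bucket(const std::unordered_set<std::string>& s,
                              const std::string& key) {
    const std::size_t b = s.bucket(key);  // bucket index derived from the hash
    // Walk only that one bucket using local iterators.
    for (auto it = s.begin(b); it != s.end(b); ++it) {
        if (*it == key) {
            return true;
        }
    }
    return false;
}
```

A lookup only has to search one bucket, which is why access to individual elements is fast, but the elements of different buckets are not laid out in one contiguous block.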

But in the general case, I'd say yes.

I generally prefer vector or map (or, in your case, std::set).

Hash tables can be faster than maps/sets (red-black trees), but red-black trees give guaranteed logarithmic performance 100% of the time. And logarithmic performance is REALLY fast! A hash table can kill performance when it starts rehashing.
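If the number of elements is known in advance, the rehashing cost can be avoided by calling reserve() first. The following sketch counts how often the bucket array changes size while inserting; the helper name count_rehashes is hypothetical, and the number of rehashes without reserve() depends on the standard library implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical helper: inserts 0..n-1 and counts how many times the bucket
// count changed along the way, i.e. how many rehashes were observed.
std::size_t count_rehashes(std::unordered_set<int>& s, int n) {
    std::size_t rehashes = 0;
    std::size_t buckets = s.bucket_count();
    for (int i = 0; i < n; ++i) {
        s.insert(i);
        if (s.bucket_count() != buckets) {  // the table was rebuilt
            ++rehashes;
            buckets = s.bucket_count();
        }
    }
    return rehashes;
}
```

Calling s.reserve(n) before inserting n elements sizes the table up front, so the loop above should observe no rehash at all for such a set.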

std::vector is the workhorse of the STL and should be your default choice. Vector is very straightforward and very cache-friendly.

This article by Matt Austern is related to this topic and is worth reading:

Why you shouldn't use set (and what you should use instead) by Matt Austern

This thread is trying to identify conditions under which unordered_set is preferable to vector. Similarly, in the above article, the author clearly identifies four conditions, all of which need to be satisfied in order to prefer set over a custom but simpler data structure called sorted_vector (last section: What is set good for?). It would be interesting to clearly state a similar set of conditions for preferring unordered_set over vector.
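For readers who have not seen the idea, a sorted vector with unique elements can be sketched in a few lines. This is not the article's sorted_vector class, just an illustration of the technique; the function names sorted_insert_unique and sorted_contains are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: insert into a vector that is kept sorted and free of
// duplicates. Finding the position is O(log n); the insert itself is O(n)
// because later elements have to be shifted.
template <typename T>
bool sorted_insert_unique(std::vector<T>& v, const T& value) {
    auto pos = std::lower_bound(v.begin(), v.end(), value);
    if (pos != v.end() && !(value < *pos)) {
        return false;  // an equivalent element is already stored
    }
    v.insert(pos, value);
    return true;
}

// Hypothetical helper: O(log n) membership test on the sorted vector.
template <typename T>
bool sorted_contains(const std::vector<T>& v, const T& value) {
    return std::binary_search(v.begin(), v.end(), value);
}
```

Lookups are logarithmic and iteration runs over contiguous memory, at the price of linear-time insertion.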

Also, the last paragraph of the article summarizes a useful rule to keep in mind:

Every component in the standard C++ library is there because it's useful for some purpose, but sometimes that purpose is narrowly defined and rare. As a general rule you should always use the simplest data structure that meets your needs. The more complicated a data structure, the more likely that it's not as widely useful as it might seem.

Of course, yes. If you do not want duplicates, you have to use a key-aware container, and since the unordered_* containers totally win over their tree-based counterparts, this is pretty much your only choice.