
std::vector or std::list for std::unordered_map buckets?

Original: 2014-03-15 21:05:47 | Tags: c++, list, vector, unordered-map

What would be the preferred data structure for holding the keys when two keys map to the same bucket in an std::unordered_map? I am not sure whether it would be better to use a std::vector<T> (which requires copying all of the elements when it reaches capacity, but is quick to iterate over) or a std::list<T> (which makes it easy to add new elements, but is slower to iterate over).

std::vector

Quick for iterating over elements
Bad if it runs out of capacity, since all elements must then be copied into new memory

std::list

Quick for adding/deleting nodes
Bad for iterating over, since elements are not in contiguous memory
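
To make the trade-off concrete, here is a minimal separate-chaining hash set in which the bucket container is a template parameter, so std::vector and std::list buckets can be swapped in and compared. This is a sketch under stated assumptions, not how any particular standard library implements std::unordered_map: the names ChainedHashSet, VectorBucketSet and ListBucketSet are hypothetical, and rehashing and erase are omitted.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <vector>

// Hypothetical minimal hash set using separate chaining. The bucket
// container is a template parameter so that std::vector<Key> and
// std::list<Key> can be swapped in. Rehashing and erase are omitted.
template <typename Key, typename Bucket, typename Hash = std::hash<Key>>
class ChainedHashSet {
public:
    // bucket_count must be greater than zero.
    explicit ChainedHashSet(std::size_t bucket_count = 16)
        : buckets_(bucket_count) {}

    // Lookup: hash to one bucket, then scan that bucket linearly.
    bool contains(const Key& key) const {
        const Bucket& b = buckets_[index(key)];
        for (const Key& k : b) {
            if (k == key) return true;
        }
        return false;
    }

    // Insert: every insert first performs a lookup to reject duplicates.
    bool insert(const Key& key) {
        if (contains(key)) return false;
        buckets_[index(key)].push_back(key);  // may reallocate a vector bucket
        ++size_;
        return true;
    }

    std::size_t size() const { return size_; }

private:
    std::size_t index(const Key& key) const {
        return Hash{}(key) % buckets_.size();
    }

    std::vector<Bucket> buckets_;
    std::size_t size_ = 0;
};

// The two variants discussed in the question.
template <typename Key>
using VectorBucketSet = ChainedHashSet<Key, std::vector<Key>>;
template <typename Key>
using ListBucketSet = ChainedHashSet<Key, std::list<Key>>;
```

Both variants expose the same contains/insert interface, so the same test or benchmark code can drive either one.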

1 Answer

Even without knowing your exact usage pattern and workload, I'd say that std::vector is better. This is true most of the time, and I'll try to explain why below.

Most of the time, you do a lot more lookups into a hash table than insertions. Lookups require iterating over a bucket, and insertions require adding elements and potentially resizing. Therefore, it makes more sense to optimize for the more common use case.
Every insert internally needs to do a lookup (to check whether the key is already present), so you will have at least as many lookups as inserts, and commonly many more.
Most of the time, the average number of keys per bucket is low. This translates into working with a small vector versus a small list, and resizing a small vector (which involves copying its elements) will be fast.
That vector "resizing" generally doesn't happen very often, so you don't need to be irrationally afraid of it. (Note, though, that when the vector is small, resizing does happen more often. This is true for all implementations I know of, but it is also trivial to rectify or circumvent.)
Iteration over a vector is a lot faster than iteration over a list. A lot.
Even resizing and copying (or moving) a vector can be as fast (or almost as fast) as adding an element to a list.
With proper "move" support in your data types, resizing a vector becomes even less of an overhead.
You can preallocate some number of elements in a vector and almost completely eliminate resizing in buckets. For example, if you know that 99% of your buckets will contain 3 keys or fewer, you can reserve 3 or 4 elements for each of the vectors and (almost) forget about resizing.
Vectors (especially when their element type is small) are much more space-efficient than linked lists. A std::list needs to keep two additional pointers per element, which can be a huge overhead: 50%-200% for elements of 8-16 bytes, depending on whether you are on a 32- or 64-bit platform (two pointers take 8 bytes on a 32-bit platform and 16 bytes on a 64-bit one).
Because of their smaller size and contiguity in memory, vectors are generally a much faster and much nicer data structure.
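
Two of the points above can be checked with a short sketch, assuming a vector-of-vectors bucket layout. The hypothetical make_reserved_buckets helper reserves a few slots per bucket; after std::vector::reserve, the standard guarantees no reallocation until the size would exceed capacity(). The equally hypothetical list_link_overhead_percent reproduces the 50%-200% arithmetic, counting only the two link pointers and ignoring allocator bookkeeping and padding, so real per-node overhead is likely somewhat higher.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: pre-reserve every bucket so that buckets holding at
// most `per_bucket` keys never reallocate while they are being filled.
template <typename Key>
std::vector<std::vector<Key>> make_reserved_buckets(std::size_t bucket_count,
                                                     std::size_t per_bucket) {
    std::vector<std::vector<Key>> buckets(bucket_count);
    for (auto& b : buckets) b.reserve(per_bucket);
    return buckets;  // outer vector is moved/elided; inner capacities are kept
}

// Overhead of a doubly linked list node, counting only its two link
// pointers (no allocator headers, no padding), as a percentage of the
// element size.
constexpr std::size_t list_link_overhead_percent(std::size_t elem_bytes,
                                                 std::size_t pointer_bytes) {
    return 100 * 2 * pointer_bytes / elem_bytes;
}

// The 50%-200% range quoted above for 8-16 byte elements:
static_assert(list_link_overhead_percent(16, 4) == 50, "16-byte elem, 32-bit");
static_assert(list_link_overhead_percent(8, 4) == 100, "8-byte elem, 32-bit");
static_assert(list_link_overhead_percent(16, 8) == 100, "16-byte elem, 64-bit");
static_assert(list_link_overhead_percent(8, 8) == 200, "8-byte elem, 64-bit");
```

The reserve count is a tuning knob: reserving too much per bucket wastes the very memory that makes vectors attractive, so it should come from the observed bucket-size distribution.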

Ultimately, you must do your own measurement and benchmarking within your own codebase, with your own workloads and usage patterns. No one can give you a definite answer without complete information about those. So, if your elements are very large, immovable objects, and you mostly do insertions/deletions and very few lookups, then go ahead and use linked lists. Otherwise, use vectors.
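
The type-related half of this rule of thumb can be partly encoded at compile time. The sketch below, a hypothetical bucket_for_t alias (assuming C++17), picks std::vector unless the element type cannot be move-constructed (a growing vector must relocate its elements, whereas std::list can hold immovable objects via emplace_back) or is larger than an assumed 64-byte threshold. The workload half of the rule (mostly insertions/deletions versus lookups) cannot be decided from the type, which is why measurement is still needed.

```cpp
#include <cstddef>
#include <list>
#include <type_traits>
#include <vector>

// Hypothetical compile-time heuristic: prefer std::vector buckets, but fall
// back to std::list when T cannot be moved or is "large". The 64-byte
// default threshold is an arbitrary assumption; tune it from measurements.
template <typename T, std::size_t LargeBytes = 64>
using bucket_for_t = std::conditional_t<
    std::is_move_constructible_v<T> && sizeof(T) <= LargeBytes,
    std::vector<T>,
    std::list<T>>;
```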

You can take a look at this benchmark comparing vector, list and deque. It might further help you decide to use vector!
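
As a starting point for your own measurements, here is a minimal timing sketch (not the benchmark mentioned above) that times one full iteration over a std::vector, std::list and std::deque holding the same values, using std::chrono::steady_clock. The names time_sum and run_iteration_benchmark are hypothetical. The function returns the common sum so the work is not optimized away and the three containers can be cross-checked; it returns -1 if they disagree.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <deque>
#include <list>
#include <numeric>
#include <vector>

// Sums every element of a container and prints the elapsed wall time.
template <typename Container>
long long time_sum(const char* name, const Container& c) {
    auto start = std::chrono::steady_clock::now();
    long long sum = std::accumulate(c.begin(), c.end(), 0LL);
    auto stop = std::chrono::steady_clock::now();
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
    std::printf("%-12s sum=%lld  %lld us\n", name, sum, static_cast<long long>(us));
    return sum;
}

// Fills the three containers with 0..n-1 and times one pass over each.
long long run_iteration_benchmark(std::size_t n) {
    std::vector<int> v(n);
    std::iota(v.begin(), v.end(), 0);
    std::list<int> l(v.begin(), v.end());
    std::deque<int> d(v.begin(), v.end());
    long long a = time_sum("std::vector", v);
    long long b = time_sum("std::list", l);
    long long c = time_sum("std::deque", d);
    return (a == b && b == c) ? a : -1;
}
```

A list built in a single pass like this may get its nodes allocated close together, which can understate the cost of iterating a list whose nodes were allocated over time on a fragmented heap. Compile with optimizations enabled and repeat runs before drawing conclusions.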