
Difference between std::set and std::priority_queue

Original post: 2012-04-13 13:33:22. Tags: c++, algorithm, sorting, priority-queue

Since both std::priority_queue and std::set (and std::multiset) are data containers that store elements and let you access them in an ordered fashion, and both have the same insertion complexity, O(log n), what are the advantages of using one over the other? In other words, what kinds of situations call for one or the other?

While I know that the underlying structures are different, I am less interested in the difference in their implementation than in a comparison of their performance and suitability for various uses.

Note: I know that a set holds no duplicates. That's why I also mentioned std::multiset: it behaves exactly like std::set but can be used when stored elements may compare equal. So please don't comment on the single/multiple keys issue.

4 answers

A priority queue only gives you access to one element in sorted order: you can get the highest-priority item, and when you remove it, you can get the next-highest, and so on. A priority queue also allows duplicate elements, so it's more like a multiset than a set. [Edit: As @Tadeusz Kopec pointed out, building a heap is also linear in the number of items in the heap, whereas building a set is O(N log N) unless it's being built from a sequence that's already ordered (in which case it is also linear).]

A set gives you full access in sorted order, so you can, for example, find two elements somewhere in the middle of the set and then traverse in order from one to the other.

std::priority_queue lets you do the following:

Insert an element: O(log n)
Get the top element: O(1) (the largest with the default std::less comparator; the smallest if you pass std::greater)
Erase the top element: O(log n)

while std::set offers more possibilities:

Insert any element: O(log n), with a larger constant factor than std::priority_queue
Find any element: O(log n)
Find the first element that is >= the one you are looking for: O(log n) (lower_bound)
Erase any element: O(log n)
Move to the previous/next element in sorted order: O(1) amortized
Get the smallest element: O(1) (begin())
Get the largest element: O(1) (rbegin())

set/multiset are generally backed by a balanced binary search tree (typically a red-black tree). http://en.wikipedia.org/wiki/Binary_tree

priority_queue is backed by a heap: the standard specifies it in terms of std::make_heap, std::push_heap and std::pop_heap applied to an underlying container (std::vector by default). http://en.wikipedia.org/wiki/Heap_(data_structure)

So the question is really: when should you use a binary search tree instead of a heap?

Both structures are laid out as trees, but the rules about the relationship between a parent and its children are different.

Call the positions P for the parent, L for the left child, and R for the right child.

In a binary search tree, L < P < R.

In a (min-)heap, P < L and P < R.

So binary search trees sort "sideways" and heaps sort "upwards".

So if we look at a parent and its two children as a triangle, then in the binary search tree L, P, R are completely sorted, whereas in the heap the relationship between L and R is unknown (only their relationship to P is known).

This has the following effects:

If you have an unsorted array and want to turn it into a binary search tree, it takes O(n log n) time. Turning it into a heap takes only O(n) time, because heap construction only has to establish the parent/child ordering at each node rather than a total order.
Heaps are more efficient if you only need the extreme element (the lowest or highest according to some comparison function): they do only the comparisons (lazily) needed to determine that element.
Binary search trees perform the comparisons needed to order the entire collection and keep it sorted at all times.
Heaps have constant-time lookup (peek) of the extreme element. In a plain binary search tree, finding the lowest element means walking down the left edge, which takes logarithmic time; std::set::begin() is nevertheless constant time, since implementations typically keep a pointer to the leftmost node.

Even though insert and erase have the same complexity, O(log n), for both containers, these operations are typically slower for std::set than for std::priority_queue. That's because std::set makes many memory allocations: every element of std::set is stored in its own node allocation, while std::priority_queue (backed by std::vector by default) keeps all elements in a single contiguous buffer. On the other hand, std::priority_queue moves elements around inside that buffer on every push and pop, whereas std::set rebalances by relinking node pointers and never moves the elements themselves. So if moving is a very slow operation for the element type, std::set may be more efficient. Moreover, the element type may not be movable at all: std::set can still hold such elements (constructed in place with emplace), while std::priority_queue cannot.

The memory overhead of std::set is also much larger, because each node stores several pointers (typically parent, left and right, plus a color flag in a red-black tree) next to the element.