
Linked list vs dynamic array for implementing a stack using vector class

I was reading up on the two different ways of implementing a stack: linked lists and dynamic arrays. The main advantage of a linked list over a dynamic array was that the linked list did not have to be resized, while a dynamic array had to be resized if too many elements were inserted, hence wasting a lot of time and memory.

That got me wondering if this is true for C++ (as there is a vector class which automatically resizes whenever new elements are inserted)?

It's difficult to compare the two, because the patterns of their memory usage are quite different.

Vector resizing

A vector resizes itself dynamically as needed. It does that by allocating a new chunk of memory, moving (or copying) data from the old chunk to the new chunk, then releasing the old one. In a typical case, the new chunk is 1.5x the size of the old (contrary to popular belief, 2x seems to be quite unusual in practice). That means for a short time while reallocating, it needs memory equal to roughly 2.5x as much as the data you're actually storing. The rest of the time, the "chunk" that's in use is a minimum of 2/3rds full, and a maximum of completely full. If all sizes are equally likely, we can expect it to average about 5/6ths full. Looking at it from the other direction, we can expect about 1/6th, or about 17%, of the space to be "wasted" at any given time.

When we do resize by a constant factor like that (rather than, for example, always adding a specific size of chunk, such as growing in 4Kb increments) we get what's called amortized constant time addition. In other words, as the array grows, resizing happens exponentially less often. The average number of times items in the array have been copied tends to a constant (usually around 3, but depends on the growth factor you use).
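To make that growth pattern visible, here is a minimal sketch (using nothing beyond standard C++) that prints the capacity each time push_back triggers a reallocation. The exact growth factor is implementation-defined, so the numbers will differ between standard libraries:

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    std::size_t last_capacity = v.capacity();
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        // Report only the insertions that actually caused a reallocation.
        if (v.capacity() != last_capacity) {
            std::cout << "size " << v.size()
                      << " -> capacity " << v.capacity() << '\n';
            last_capacity = v.capacity();
        }
    }
}
```

The reallocations become rarer and rarer as the vector grows, which is exactly the amortized-constant behaviour described above.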

Linked list allocations

Using a linked list, the situation is rather different. We never see resizing, so we don't see extra time or memory usage for some insertions. At the same time, we do see extra time and memory used essentially all the time. In particular, each node in the linked list needs to contain a pointer to the next node. Depending on the size of the data in the node compared to the size of a pointer, this can lead to significant overhead. For example, let's assume you need a stack of ints. In a typical case where an int is the same size as a pointer, that's going to mean 50% overhead -- all the time. It's increasingly common for a pointer to be larger than an int; twice the size is fairly common (64-bit pointer, 32-bit int). In such a case, you have ~67% overhead -- i.e., obviously enough, each node devoting twice as much space to the pointer as the data being stored.
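As a rough illustration of that overhead, here is a sketch with a hypothetical Node struct like the one a hand-rolled linked stack of ints might use. The exact sizes printed depend on the platform and its alignment rules:

```cpp
#include <iostream>

// A minimal singly linked node holding an int.
// On a typical 64-bit platform: sizeof(int) == 4, sizeof(Node*) == 8,
// and alignment pads the whole node to 16 bytes -- 4 bytes of payload,
// 12 bytes of pointer plus padding.
struct Node {
    int value;
    Node* next;
};

int main() {
    std::cout << "sizeof(int)   = " << sizeof(int) << '\n';
    std::cout << "sizeof(Node*) = " << sizeof(Node*) << '\n';
    std::cout << "sizeof(Node)  = " << sizeof(Node) << '\n';
}
```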

Unfortunately, that's often just the tip of the iceberg. In a typical linked list, each node is dynamically allocated individually. At least if you're storing small data items (such as int) the memory allocated for a node may be (usually will be) even larger than the amount you actually request. So -- you ask for 12 bytes of memory to hold an int and a pointer -- but the chunk of memory you get is likely to be rounded up to 16 or 32 bytes instead. Now you're looking at overhead of at least 75% and quite possibly ~88%.

As far as speed goes, the situation is rather similar: allocating and freeing memory dynamically is often quite slow. The heap manager typically has blocks of free memory, and has to spend time searching through them to find the block that's most suited to the size you're asking for. Then it (typically) has to split that block into two pieces, one to satisfy your allocation, and another of the remaining memory it can use to satisfy other allocations. Likewise, when you free memory, it typically goes back to that same list of free blocks and checks whether there's an adjoining block of memory already free, so it can join the two back together.

Allocating and managing lots of blocks of memory is expensive.

Cache usage

Finally, with recent processors we run into another important factor: cache usage. In the case of a vector, we have all the data right next to each other. Then, after the end of the part of the vector that's in use, we have some empty memory. This leads to excellent cache usage -- the data we're using gets cached; the data we're not using has little or no effect on the cache at all.

With a linked list, the pointers (and probable overhead in each node) are distributed throughout our list. I.e., each piece of data we care about has, right next to it, the overhead of the pointer, and the empty space allocated to the node that we're not using. In short, the effective size of the cache is reduced by about the same factor as the overall overhead of each node in the list -- i.e., we might easily see only 1/8th of the cache storing the data we care about, and 7/8ths devoted to storing pointers and/or pure garbage.
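A quick (and deliberately unscientific) way to see this is to time the same traversal over a std::vector and a std::list. The sketch below uses only the standard library; the absolute numbers depend heavily on the machine, compiler flags, and allocator, so treat it as a demonstration rather than a benchmark:

```cpp
#include <chrono>
#include <iostream>
#include <list>
#include <numeric>
#include <vector>

// Sum a container's elements and print how long the traversal took.
template <typename Container>
long long timed_sum(const Container& c) {
    auto start = std::chrono::steady_clock::now();
    long long sum = std::accumulate(c.begin(), c.end(), 0LL);
    auto stop = std::chrono::steady_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count()
              << " us\n";
    return sum;
}

int main() {
    const int n = 1000000;
    std::vector<int> vec(n, 1);
    std::list<int> lst(vec.begin(), vec.end());

    std::cout << "vector: ";
    long long a = timed_sum(vec);
    std::cout << "list:   ";
    long long b = timed_sum(lst);

    std::cout << "sums: " << a << ' ' << b << '\n';  // keep the results observable
}
```

On typical hardware the contiguous vector traversal comes out well ahead, for exactly the cache reasons described above.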

Summary

A linked list can work well when you have a relatively small number of nodes, each of which is individually quite large. If (as is more typical for a stack) you're dealing with a relatively large number of items, each of which is individually quite small, you're much less likely to see a savings in time or memory usage. Quite the contrary, for such cases, a linked list is much more likely to basically waste a great deal of both time and memory.

Yes, what you say is true for C++. For this reason, the default container inside std::stack, which is the standard stack class in C++, is neither a vector nor a linked list, but a double-ended queue (a deque). This has nearly all the advantages of a vector, but it resizes much better.

Basically, an std::deque is internally a linked list of arrays, of sorts. This way, when it needs to resize, it just adds another array.
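For reference, the underlying container of std::stack can also be chosen explicitly as a template parameter; a small sketch:

```cpp
#include <list>
#include <stack>
#include <vector>

int main() {
    // The default underlying container is std::deque.
    std::stack<int> s1;

    // The container can be swapped out; all of these provide the
    // back(), push_back() and pop_back() operations std::stack needs.
    std::stack<int, std::vector<int>> s2;  // dynamic array underneath
    std::stack<int, std::list<int>>   s3;  // doubly linked list underneath

    s1.push(1); s2.push(2); s3.push(3);
    s1.pop();   s2.pop();   s3.pop();
}
```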

First, the performance trade-offs between linked-lists and dynamic arrays are a lot more subtle than that.

The vector class in C++ is, by requirement, implemented as a "dynamic array", meaning that it must have an amortized-constant cost for inserting elements into it. How this is done is usually by increasing the "capacity" of the array in a geometric manner, that is, you double the capacity whenever you run out (or come close to running out). In the end, this means that a reallocation operation (allocating a new chunk of memory and copying the current content into it) is only going to happen on a few occasions. In practice, this means that the overhead for the reallocations only shows up on performance graphs as little spikes at logarithmic intervals. This is what it means to have "amortized-constant" cost, because once you neglect those little spikes, the cost of the insert operations is essentially constant (and trivial, in this case).
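One way to see that amortized-constant behaviour directly is to count how often elements actually get moved as the vector regrows. The sketch below uses a made-up Counted type (the inline static counters need C++17); with geometric growth, the total number of moves stays roughly proportional to the number of insertions, i.e. a constant amount of work per push_back on average:

```cpp
#include <iostream>
#include <vector>

// Counts copy/move constructions so we can see the reallocation work.
struct Counted {
    static inline long copies = 0;
    static inline long moves = 0;
    Counted() = default;
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) noexcept { ++moves; }
    Counted& operator=(const Counted&) { ++copies; return *this; }
    Counted& operator=(Counted&&) noexcept { ++moves; return *this; }
};

int main() {
    const int n = 100000;
    std::vector<Counted> v;
    for (int i = 0; i < n; ++i)
        v.push_back(Counted{});  // one move per insertion, plus moves on regrowth
    std::cout << n << " push_backs caused " << Counted::copies
              << " copies and " << Counted::moves << " moves\n";
}
```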

In a linked-list implementation, you don't have the overhead of reallocations; however, you do have the overhead of allocating each new element on the freestore (dynamic memory). So, the overhead is a bit more regular (not spiky, which can sometimes be what you need), but it could be more significant than using a dynamic array, especially if the elements are rather inexpensive to copy (small in size, and simple objects). In my opinion, linked-lists are only recommended for objects that are really expensive to copy (or move). But at the end of the day, this is something you need to test in any given situation.

Finally, it is important to point out that locality of reference is often the determining factor for any application that makes extensive use and traversal of the elements. When using a dynamic array, the elements are packed together in memory one after the other, and doing an in-order traversal is very efficient, as the CPU can preemptively cache the memory ahead of the reading/writing operations. In a vanilla linked-list implementation, the jump from one element to the next generally involves rather erratic jumps between wildly different memory locations, which effectively disables this "pre-fetching" behavior. So, unless the individual elements of the list are very big and operations on them are typically very long to execute, this lack of pre-fetching when using a linked-list will be the dominant performance problem.

As you can guess, I rarely use a linked-list (std::list), as the advantageous applications are few and far between. Very often, for large and expensive-to-copy objects, it is preferable to simply use a vector of pointers: you get basically the same performance advantages (and disadvantages) as a linked list, but with less memory usage (for the linking pointers), and you get random-access capabilities if you need them.
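A minimal sketch of that alternative, using a hypothetical expensive-to-copy BigObject and std::unique_ptr for ownership:

```cpp
#include <memory>
#include <string>
#include <vector>

// Stand-in for an object that is expensive to copy.
struct BigObject {
    std::string blob = std::string(1 << 20, 'x');  // ~1 MB payload
};

int main() {
    // Instead of std::list<BigObject>, keep the objects on the heap and
    // store only the (contiguous) owning pointers. Pushing and popping
    // never copies a BigObject, the pointer array stays cache-friendly,
    // and random access is still available.
    std::vector<std::unique_ptr<BigObject>> stack;
    stack.push_back(std::make_unique<BigObject>());
    stack.push_back(std::make_unique<BigObject>());
    stack.pop_back();  // destroys the owned object; only a pointer is removed
}
```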

The main case that I can think of where a linked-list wins over a dynamic array (or a segmented dynamic array like std::deque) is when you need to frequently insert elements in the middle (not at either end). However, such situations usually arise when you are keeping a sorted (or ordered, in some way) set of elements, in which case you would use a tree structure to store the elements (e.g., a binary search tree (BST)), not a linked-list. And often, such trees store their nodes (elements) using a semi-contiguous memory layout (e.g., a breadth-first layout) within a dynamic array or segmented dynamic array (e.g., a cache-oblivious dynamic array).

Yes, it's true for C++ or any other language. A dynamic array is a concept. The fact that C++ has vector doesn't change the theory. The vector in C++ actually does the resizing internally, so this task isn't the developer's responsibility. The actual cost doesn't magically disappear when using vector; it's simply offloaded to the standard library implementation.

std::vector is implemented using a dynamic array, whereas std::list is implemented as a linked list. There are trade-offs for using both data structures. Pick the one that best suits your needs.

  • As you indicated, a dynamic array can take a larger amount of time adding an item if it gets full, as it has to expand itself. However, it is faster to access since all of its members are grouped together in memory. This tight grouping also usually makes it more cache-friendly.

  • Linked lists don't need to resize ever, but traversing them takes longer as the CPU must jump around in memory.

That got me wondering if this is true for C++, as there is a vector class which automatically resizes whenever new elements are inserted.

Yes, it still holds, because a vector resize is a potentially expensive operation. Internally, if the pre-allocated size for the vector is reached and you attempt to add new elements, a new allocation takes place and the old data is moved to the new memory location.

From the C++ documentation:

vector::push_back - Add element at the end

Adds a new element at the end of the vector, after its current last element. The content of val is copied (or moved) to the new element.

This effectively increases the container size by one, which causes an automatic reallocation of the allocated storage space if - and only if - the new vector size surpasses the current vector capacity.
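As a side note, if an upper bound on the stack depth is known in advance, those reallocations can be avoided entirely by calling reserve up front; a small sketch (the 10000 figure is purely illustrative):

```cpp
#include <vector>

int main() {
    std::vector<int> stack;

    // Reserving capacity up front guarantees that none of the
    // subsequent push_backs trigger a reallocation.
    stack.reserve(10000);

    for (int i = 0; i < 10000; ++i)
        stack.push_back(i);   // size never exceeds capacity, so no reallocation

    while (!stack.empty())
        stack.pop_back();     // popping never reallocates either
}
```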

http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/Keynote-Bjarne-Stroustrup-Cpp11-Style Skip to 44:40. You should prefer std::vector whenever possible over std::list, as explained in the video by Bjarne himself. Since std::vector stores all of its elements next to each other in memory, it has the advantage of being cached. This is true for adding and removing elements from std::vector and also for searching. He states that std::list is 50-100x slower than std::vector.

If you really want a stack, you should use std::stack instead of making your own.
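A minimal usage sketch:

```cpp
#include <iostream>
#include <stack>

int main() {
    std::stack<int> s;   // backed by std::deque<int> by default
    s.push(1);
    s.push(2);
    s.push(3);
    while (!s.empty()) {
        std::cout << s.top() << '\n';  // prints 3, 2, 1
        s.pop();
    }
}
```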
