
What is the advantage of embedding a linked list into a data structure?

While reading about kernel data structures in FreeBSD, I stumbled on the MBuf. The MBuf contains a pointer to the next MBuf in a chain of MBufs, implementing a linked list. Each MBuf also contains data specific to that node in the linked list.

I'm more familiar with designs that separate the container type from the value type (consider std::list, or System.Collections.Generic.LinkedList). I'm struggling to understand the value proposition of embedding container semantics into the data type. What efficiencies are gained? Is it really all about eliminating the node instance pointer storage?

Suppose you have an iterator/pointer to a node in a list whose nodes are stored separately from the data. In order to fetch the data you have to:

  • read the pointer to data from the node
  • dereference the pointer you have just read and read the actual data

On the other hand, if the list concept is "embedded" within your data structure, you can reach your object in a single memory operation, because it is stored together with the node itself.
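As a rough illustration of the two layouts, here is a minimal sketch. The names Payload, ExternalNode and IntrusiveItem are hypothetical and not taken from FreeBSD or any library; the point is only that the first traversal needs two dependent loads per element while the second needs one.

```cpp
#include <cassert>

struct Payload {
    int id;
    double value;
};

// Separate node and data: the node only points at the payload.
struct ExternalNode {
    ExternalNode* next;
    Payload* data;   // a second pointer to follow before the data is reachable
};

// Embedded ("intrusive") layout: the link lives inside the object itself.
struct IntrusiveItem {
    IntrusiveItem* next;
    int id;
    double value;
};

// Two dependent loads per element: node -> data pointer -> payload.
inline double sum_external(const ExternalNode* head) {
    double total = 0.0;
    for (const ExternalNode* n = head; n != nullptr; n = n->next) {
        assert(n->data != nullptr);
        total += n->data->value;
    }
    return total;
}

// One load per element: the payload sits next to the link.
inline double sum_intrusive(const IntrusiveItem* head) {
    double total = 0.0;
    for (const IntrusiveItem* n = head; n != nullptr; n = n->next) {
        total += n->value;
    }
    return total;
}
```

Whether the difference is measurable depends on where the payloads end up in memory, so treat this as a picture of the access pattern rather than a benchmark.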

Another issue with separating the list node from its data is that the list node itself is small (usually just 2 or 3 pointers). As a result, the per-allocation overhead of keeping such a small structure in memory can matter. Operations such as new or malloc actually consume more memory than they hand out, because the allocator keeps its own bookkeeping structures to track which memory is free and which is not.

In such scenarios, it is beneficial to group things into a single allocation. You could keep several list nodes together in small bundles, or you could store each node together with the data it refers to.

A similar strategy can be seen with intrusive pointers (versus shared pointers), or with std::make_shared, which typically packs the object and the smart pointer's bookkeeping data into one allocation.


zett42 comments that std::list<T> keeps T together with the node data. This achieves the single memory block I described above, but it has a different problem: T cannot be polymorphic. If you have a class A and a derived class B, then node<B> is not derived from node<A>. If you try hard to insert a B into std::list<A>, your object will:

  • In the best case, cause a compile error (for example, when A is abstract or cannot be copy-constructed)
  • In the worst case, be silently sliced: only the A part of B is copied into the node.

A typical solution if you want to hold polymorphic objects in a single list is to use std::list<A*> instead of std::list<A>. But then you end up with the extra indirection I explained above.

An alternative is to make an intrusive list (e.g. boost::intrusive::list), where the node information is actually part of the A object. Then each element can be an instance of a class derived from A without a problem.
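The slicing case and the pointer workaround can be sketched as follows. A and B here are stand-in classes invented for the example; only std::list is a real library type.

```cpp
#include <cassert>
#include <list>
#include <string>

struct A {
    virtual ~A() = default;
    virtual std::string name() const { return "A"; }
};

struct B : A {
    std::string name() const override { return "B"; }
};

// Storing by value: the list node holds an A, so only the A part of b is copied.
inline std::string name_after_value_insert() {
    B b;
    std::list<A> by_value;
    by_value.push_back(b);   // compiles, but slices: A's copy constructor binds to b
    assert(!by_value.empty());
    return by_value.front().name();
}

// Storing pointers: the dynamic type survives, at the cost of one more indirection.
inline std::string name_after_pointer_insert() {
    B b;
    std::list<A*> by_pointer;
    by_pointer.push_back(&b);
    assert(!by_pointer.empty());
    return by_pointer.front()->name();
}
```

The first function returns "A" even though a B was inserted; the second returns "B".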
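To show the idea without depending on Boost, here is a hand-rolled sketch of an intrusive singly linked list. Shape, Circle, Square and ShapeList are hypothetical names for illustration; this is not how boost::intrusive::list is implemented, just the same principle in miniature.

```cpp
#include <cassert>
#include <string>

// Base class that carries its own link (the "hook").
struct Shape {
    Shape* next = nullptr;
    virtual ~Shape() = default;
    virtual std::string name() const = 0;
};

struct Circle : Shape {
    std::string name() const override { return "circle"; }
};

struct Square : Shape {
    std::string name() const override { return "square"; }
};

// The list only links objects that already exist; it never allocates or copies.
struct ShapeList {
    Shape* head = nullptr;

    void push_front(Shape& s) {
        assert(s.next == nullptr);   // assumption: the object is not already linked
        s.next = head;
        head = &s;
    }
};

// Walk the list and call the virtual function on each element.
inline std::string joined_names(const ShapeList& list) {
    std::string out;
    for (const Shape* s = list.head; s != nullptr; s = s->next) {
        if (!out.empty()) out += ",";
        out += s->name();
    }
    return out;
}
```

Because the list stores the objects themselves rather than copies, derived objects keep their dynamic type and nothing is sliced. The caller stays responsible for keeping each object alive while it is linked.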

One big advantage of an intrusive linked list is that you can create a list of preexisting objects without any new allocations. Doing this with a std::list of pointers requires a memory allocation for each node.

Boost has an intrusive list implementation, along with a justification for its use: http://www.boost.org/doc/libs/1_63_0/doc/html/intrusive.html

what efficiencies are gained? Is it really all about eliminating the node instance pointer storage?

I would say fewer cache misses, and therefore better overall performance (even though linked lists aren't usually cache-friendly data structures).
That way, for each node, you don't have to follow one more pointer to find your data somewhere else in memory and bring it close to the processor.
Moreover, if you construct your nodes in a contiguous area of memory and manage them with a couple of pointers (let's call them a free list and an in-use list, does it sound familiar?), you can get a performance boost (at least as long as the list doesn't contain many items; otherwise you risk jumping back and forth in memory). Insertions and deletions take constant time in this case (unless you have to search for a node in the list before inserting at a specific position, of course), which is another advantage.
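The free list / in-use list arrangement can be sketched like this. NodePool and its member names are made up for the example, and the pool is deliberately minimal: fixed capacity, int payload, and removal only at the head of the in-use list.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct Node {
    int value = 0;
    Node* next = nullptr;
};

template <std::size_t N>
class NodePool {
public:
    NodePool() {
        // Thread every slot of the contiguous array onto the free list.
        for (std::size_t i = 0; i < N; ++i) {
            slots_[i].next = free_;
            free_ = &slots_[i];
        }
    }
    // The lists point into slots_, so copying the pool would be wrong.
    NodePool(const NodePool&) = delete;
    NodePool& operator=(const NodePool&) = delete;

    // O(1): pop a slot from the free list and push it onto the in-use list.
    Node* insert(int value) {
        if (free_ == nullptr) return nullptr;   // pool exhausted
        Node* n = free_;
        free_ = n->next;
        n->value = value;
        n->next = used_;
        used_ = n;
        return n;
    }

    // O(1): move the head of the in-use list back to the free list.
    bool remove_front() {
        if (used_ == nullptr) return false;
        Node* n = used_;
        used_ = n->next;
        n->next = free_;
        free_ = n;
        return true;
    }

    // Most recently inserted node; the in-use list must not be empty.
    const Node& front() const {
        assert(used_ != nullptr);
        return *used_;
    }

    // O(n) count, only here to make the state observable.
    std::size_t in_use() const {
        std::size_t count = 0;
        for (const Node* n = used_; n != nullptr; n = n->next) ++count;
        return count;
    }

private:
    std::array<Node, N> slots_{};
    Node* free_ = nullptr;
    Node* used_ = nullptr;
};
```

No call to new or malloc happens after the pool itself is constructed; inserting and removing only relink pointers inside the array.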

One of the key advantages of an intrusive list is that a single node can cheaply belong to multiple lists.

You could have, for instance, a collection of items sorted in 3 different ways, with each item linked into 3 different lists. That would be quite clunky to do with std::list.

The other big advantage in my mind, as @doron mentions, is that list management requires 0 allocations once the objects themselves have been created.

Boost has some decent discussion of intrusive vs. non-intrusive data structures, with pros and cons.
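A minimal sketch of one object living on several lists at once follows. Item, its fields and sorted_insert are hypothetical names; for brevity it uses two sort orders instead of three, with one link member per list.

```cpp
#include <cassert>
#include <string>
#include <utility>

struct Item {
    std::string name;
    int price;
    int weight;
    // One link per list the item can be on; no separate node objects are needed.
    Item* next_by_price = nullptr;
    Item* next_by_weight = nullptr;

    Item(std::string n, int p, int w) : name(std::move(n)), price(p), weight(w) {}
};

// Insert `item` into the list threaded through `link`, ascending by `key`.
inline void sorted_insert(Item*& head, Item& item,
                          Item* Item::*link, int Item::*key) {
    assert(item.*link == nullptr);   // assumption: not yet on this particular list
    Item** pos = &head;
    while (*pos != nullptr && (*pos)->*key < item.*key) {
        pos = &((*pos)->*link);
    }
    item.*link = *pos;
    *pos = &item;
}

// Concatenate the names in the order given by one of the lists.
inline std::string names(const Item* head, Item* Item::*link) {
    std::string out;
    for (const Item* i = head; i != nullptr; i = i->*link) out += i->name;
    return out;
}
```

The same three Item objects can then be walked in price order through next_by_price and in weight order through next_by_weight, without copying them or allocating any nodes.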
