
Pointers or Indexes?

I have a network-like data structure, composed of nodes linked together. The nodes, whose number will change, will be stored in a std::vector<Node> in no particular order, where Node is an appropriate class.

I want to keep track of the links between nodes. The number of these links will also change, and I was thinking of using a std::vector<Link> for them as well. The Link class has to contain information about the two nodes it connects, as well as other link features.

Should Link contain

  1. two pointers to the two nodes?
  2. two integers, to be used as indexes into the std::vector<Node>?
  3. or should I adopt a different system (why?)

The first approach, although probably better, is problematic because the pointers will have to be regenerated every time I add or remove nodes from the network. On the other hand, it would free me from, e.g., having to store the nodes in a random-access container.

This is difficult to answer in general. There are various performance and ease-of-use trade-offs.

Using pointers can provide a more convenient usage for some operations. For example

link.first->value

vs.

nodes[link.first].value

Using pointers may provide better or worse performance than indices. This depends on various factors. You would need to measure to determine which is better in your case.

Using indices can save space if you can guarantee that there are only a certain number of nodes. You can then use a smaller data type for the indices, whereas with pointers you always need to use the full pointer size no matter how many nodes you have. Using a smaller data type can have a performance benefit by allowing more links to fit within a single cache line.
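To make the size argument concrete, here is a minimal sketch comparing the two layouts. The names PointerLink, CompactLink and kAssumedCacheLine are made up for illustration, the 16-bit index assumes you can guarantee at most 65,536 nodes, and the 64-byte cache line is an assumption about typical hardware, not a guarantee.

```cpp
#include <cstddef>
#include <cstdint>

struct Node;  // only the pointer type is needed for this comparison

// Link holding two pointers: always two full pointer widths.
struct PointerLink {
    Node* first;
    Node* second;
};

// Link holding two 16-bit indices (hypothetical limit: 65,536 nodes).
struct CompactLink {
    std::uint16_t first;
    std::uint16_t second;
};

// Assumed cache line size; common on current hardware but not guaranteed.
constexpr std::size_t kAssumedCacheLine = 64;

// How many links of each kind fit in one assumed cache line.
constexpr std::size_t kPointerLinksPerLine = kAssumedCacheLine / sizeof(PointerLink);
constexpr std::size_t kCompactLinksPerLine = kAssumedCacheLine / sizeof(CompactLink);

static_assert(sizeof(CompactLink) < sizeof(PointerLink),
              "the index-based link is expected to be the smaller one");
```

On a typical 64-bit platform this works out to 4 links per line with pointers versus 16 with 16-bit indices. Whether that translates into a measurable speedup still has to be measured.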

Copying the network data structure will be easier with indices, since you don't have to recreate the link pointers.
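As a rough sketch of why copying is easier, assume a hypothetical Network struct that holds both vectors and a Link that stores two indices. The compiler-generated copy is then already correct, because the copied indices refer to the copy's own nodes vector.

```cpp
#include <cstddef>
#include <vector>

struct Node { int value; };
struct Link { std::size_t first; std::size_t second; };

// Hypothetical container for the whole network.
struct Network {
    std::vector<Node> nodes;
    std::vector<Link> links;
};

// Builds a tiny three-node chain: 0 -> 1 -> 2.
Network makeExample() {
    Network n;
    n.nodes.push_back(Node{10});
    n.nodes.push_back(Node{20});
    n.nodes.push_back(Node{30});
    n.links.push_back(Link{0, 1});
    n.links.push_back(Link{1, 2});
    return n;
}

// Looks up the value at the far end of a link within the given network.
int secondValue(const Network& net, std::size_t link) {
    return net.nodes[net.links[link].second].value;
}
```

After `Network b = a;`, changing `b.nodes[1]` affects only what `secondValue(b, 0)` sees. With pointer-based links, the copied links would still point into `a.nodes` and would have to be rebuilt.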

Having pointers to elements of a std::vector can be error-prone, since the vector may move the elements to another place in memory after an insert.
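A small sketch of the hazard, without actually dereferencing a stale pointer. The helper name growthRelocates is made up; it relies on the fact that a change in capacity() means the vector reallocated, which invalidates every pointer, reference and iterator into it.

```cpp
#include <cstddef>
#include <vector>

struct Node { int value; };

// Appends `extra` nodes and reports whether the vector had to reallocate.
// A reallocation invalidates all pointers into the old storage.
bool growthRelocates(std::vector<Node>& nodes, std::size_t extra) {
    const std::size_t oldCapacity = nodes.capacity();
    for (std::size_t i = 0; i < extra; ++i) {
        nodes.push_back(Node{0});
    }
    return nodes.capacity() != oldCapacity;
}
```

If you saved `&nodes[0]` before calling this and it returns true, that pointer is dangling. An index saved before the call (here 0) still selects the same logical node afterwards.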

Using indices will allow you to do bounds checking, which may make it easier to find some bugs.
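For instance, a sketch along these lines (isValid and firstNode are illustrative names) either checks a link up front or lets std::vector::at throw std::out_of_range on a bad index. A bad pointer gives you no such hook.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

struct Node { int value; };
struct Link { std::size_t first; std::size_t second; };

// True when both endpoints refer to existing nodes.
bool isValid(const Link& link, const std::vector<Node>& nodes) {
    return link.first < nodes.size() && link.second < nodes.size();
}

// Checked access: at() throws std::out_of_range for an out-of-range index.
const Node& firstNode(const Link& link, const std::vector<Node>& nodes) {
    return nodes.at(link.first);
}
```

You could use the checked accessor in debug builds and switch to operator[] where profiling shows the check matters.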

Using indices makes serialization more straightforward.
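As an illustration, with index-based links the link list can be written out and read back as plain numbers. The text format below (a count followed by one index pair per line) and the function names are only an assumption for this sketch, not a recommended file format.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

struct Link { std::size_t first; std::size_t second; };

// Writes the number of links, then one "first second" pair per line.
void writeLinks(std::ostream& out, const std::vector<Link>& links) {
    out << links.size() << '\n';
    for (const Link& l : links) {
        out << l.first << ' ' << l.second << '\n';
    }
}

// Reads links back in the same format; stops early on malformed input.
std::vector<Link> readLinks(std::istream& in) {
    std::size_t count = 0;
    in >> count;
    std::vector<Link> links;
    for (std::size_t i = 0; i < count; ++i) {
        Link l{};
        if (!(in >> l.first >> l.second)) break;
        links.push_back(l);
    }
    return links;
}
```

With pointers you would first have to translate each pointer into some stable identifier, which usually ends up being an index anyway.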

All that being said, I often find indices to be the best choice overall. Many of the syntactical inconveniences of indices can be overcome by using convenience methods, and you can switch the indices to pointers during certain operations where pointers have better performance.
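Here is roughly what I mean by convenience methods. The Network class and its member names are hypothetical; the point is only that callers never have to write nodes[link.first] themselves.

```cpp
#include <cstddef>
#include <vector>

struct Node { int value; };
struct Link { std::size_t first; std::size_t second; };

// Hypothetical wrapper that hides the index lookups.
class Network {
public:
    // Adds a node and returns its index.
    std::size_t addNode(int value) {
        nodes_.push_back(Node{value});
        return nodes_.size() - 1;
    }

    // Adds a link between two node indices and returns the link's index.
    std::size_t addLink(std::size_t a, std::size_t b) {
        links_.push_back(Link{a, b});
        return links_.size() - 1;
    }

    // Accessors that resolve a link's endpoints to node references.
    Node& first(std::size_t link)  { return nodes_[links_[link].first]; }
    Node& second(std::size_t link) { return nodes_[links_[link].second]; }

private:
    std::vector<Node> nodes_;
    std::vector<Link> links_;
};
```

Usage then reads almost like the pointer version: `net.first(l).value`. The returned references are only valid until the next addNode call, so don't hold on to them.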

Specify the interface for the class you want to use or create. Write unit tests. Do the simplest thing that fulfills the unit tests.

So it depends on the interface of the class. For example, if a Link doesn't expose information about the nodes, then it doesn't really matter which approach you choose. On the other hand, if you go for pointers, consider std::shared_ptr.

I would add one or more link pointers to your Node class and then maintain the links by hand. This will save you from having to use an additional container.

If you are looking for something a bit more structured, you can try Boost.Intrusive. It effectively does the same thing in a more generalized fashion.

You can avoid the Link class altogether if you use:

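#include <vector>

// Each node records its neighbours directly, so no separate Link class is needed.
struct Node
{
  std::vector<Node*> parents;   // nodes that link to this node
  std::vector<Node*> children;  // nodes this node links to
};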

With this approach,

  1. You avoid creating another class.
  2. Your memory requirements are reduced.
  3. You have to make fewer pointer traversals to traverse the network of Nodes.

The downside is extra bookkeeping:

  1. When creating or removing a link, you have to update two objects.
  2. When you delete a Node, you have to remove the pointers to it from its parents and children.
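The bookkeeping could be wrapped in a few helpers so both sides are always updated together. This is only a sketch; addLink, removeLink and detachNode are names I made up, and it assumes the nodes themselves live somewhere with stable addresses.

```cpp
#include <algorithm>
#include <vector>

struct Node {
    std::vector<Node*> parents;
    std::vector<Node*> children;
};

// Creating a link touches both endpoints.
void addLink(Node& parent, Node& child) {
    parent.children.push_back(&child);
    child.parents.push_back(&parent);
}

// Removing a link touches both endpoints as well (erase-remove idiom).
void removeLink(Node& parent, Node& child) {
    auto& c = parent.children;
    c.erase(std::remove(c.begin(), c.end(), &child), c.end());
    auto& p = child.parents;
    p.erase(std::remove(p.begin(), p.end(), &parent), p.end());
}

// Call before deleting a node so no neighbour keeps a pointer to it.
void detachNode(Node& node) {
    // Iterate over copies because removeLink modifies the originals.
    const std::vector<Node*> parents = node.parents;
    for (Node* p : parents) removeLink(*p, node);
    const std::vector<Node*> children = node.children;
    for (Node* ch : children) removeLink(node, *ch);
}
```

Each removal is a linear search through the neighbour lists, which is fine for nodes with few links but worth measuring for high-degree nodes.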

You could make it a std::vector<Node*> instead of a std::vector<Node> and allocate the nodes with new.

Then:

  • You can store the pointers to the nodes in the Link class without fear of them becoming invalidated.

  • You can still randomly access them in the nodes vector.

The downside is that you will need to remember to delete them when they are removed from the node list.
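A variation on this idea, sketched below, stores std::unique_ptr<Node> instead of raw owning pointers, so the delete happens automatically when an entry is erased from the vector. Note that this swaps in unique_ptr for the raw new/delete described above; the Network struct and addNode are illustrative names.

```cpp
#include <memory>
#include <vector>

struct Node { int value; };

// Links hold non-owning pointers; the vector of unique_ptr owns the nodes.
struct Link { Node* first; Node* second; };

struct Network {
    std::vector<std::unique_ptr<Node>> nodes;
    std::vector<Link> links;

    // Allocates a node on the heap and returns a pointer that stays valid
    // even when the vector itself reallocates.
    Node* addNode(int value) {
        nodes.push_back(std::unique_ptr<Node>(new Node{value}));
        return nodes.back().get();
    }
};
```

Growing the vector only moves the unique_ptr objects, not the nodes they point to, so the pointers stored in links remain valid. Erasing a node still requires removing any links that refer to it first.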

My personal experience with vectors in graph-like structures has led me to the following rules.

Don't store data in vectors, where other classes hold a pointer/reference

You have a graph-like data structure. If the code is not performance critical (which is not the same as performance sensitive!), you should not consider cache-compacting your data structures.

  • If you don't know how large your graph will be and you keep your Node data in a vector, all iterators and pointers into it are invalidated whenever the vector reallocates its storage (for example, when a push_back exceeds the current capacity). You then have to regenerate your whole data structure somehow, perhaps by creating a copy of all of it and using DFS or a similar traversal to adjust the pointers. Something similar happens if you remove data from the middle of one of your vectors: the elements after the erased position shift, so pointers to them no longer refer to the same nodes.

  • If you know how large your data will be, you are locked into that size, and you'll have huge headaches once you reconsider.
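For comparison, if the nodes do stay in a vector and the links store indices rather than pointers, removal from the middle can be handled with a swap-and-pop plus a fix-up pass. This is just one possible sketch, with a made-up removeNode helper; it changes the order of the nodes, which is acceptable here only because the question says the order does not matter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Node { int value; };
struct Link { std::size_t first; std::size_t second; };

// Removes nodes[victim] in O(links) without shifting the other nodes.
void removeNode(std::vector<Node>& nodes, std::vector<Link>& links,
                std::size_t victim) {
    assert(victim < nodes.size());
    const std::size_t last = nodes.size() - 1;

    // 1. Drop every link that touches the node being removed.
    links.erase(std::remove_if(links.begin(), links.end(),
                    [victim](const Link& l) {
                        return l.first == victim || l.second == victim;
                    }),
                links.end());

    // 2. Move the last node into the hole and retarget links that used it.
    if (victim != last) {
        nodes[victim] = nodes[last];
        for (Link& l : links) {
            if (l.first == last)  l.first = victim;
            if (l.second == last) l.second = victim;
        }
    }

    // 3. Shrink the vector by one.
    nodes.pop_back();
}
```

Only links that referred to the removed node or to the former last node change, so nothing has to be regenerated from scratch.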

Don't use shared pointers to keep track of what needs to be freed

If you have a graph-like data structure and you delete nodes on performance-critical paths, it's unwise to call delete whenever your algorithm decides it doesn't need the data anymore. One possibility is to keep the data on the heap (if it is performance critical, consider a pool allocator) and either mark the objects you no longer need during your performance-critical sections (if you really, really need to save space, you can consider pointer tagging) or run a simple mark-and-sweep algorithm afterwards to find the items that are no longer needed. (Yes, graph algorithms are one of those cases where Herb Sutter says garbage collection is faster than smart pointers.)

Be aware that deferred destruction of objects means that you lose all RAII-like features in your Node classes.
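A very reduced sketch of the "mark now, free later" idea follows. It is not a full mark-and-sweep collector (nothing traces reachability); nodes are simply flagged as dead on the hot path and released in one batch afterwards. The Pool struct, the dead flag and the function names are all made up for this example, and a plain vector of unique_ptr stands in for a real pool allocator.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <vector>

struct Node {
    int value;
    bool dead;  // set on the hot path instead of calling delete
};

struct Pool {
    std::vector<std::unique_ptr<Node>> nodes;

    // Allocates a live node owned by the pool.
    Node* create(int value) {
        nodes.push_back(std::unique_ptr<Node>(new Node{value, false}));
        return nodes.back().get();
    }

    // Cheap: just flips a flag, no deallocation.
    static void markDead(Node* n) { n->dead = true; }

    // Run outside the performance-critical section; frees all flagged
    // nodes in one pass and returns how many were released.
    std::size_t sweep() {
        const std::size_t before = nodes.size();
        nodes.erase(std::remove_if(nodes.begin(), nodes.end(),
                        [](const std::unique_ptr<Node>& n) { return n->dead; }),
                    nodes.end());
        return before - nodes.size();
    }
};
```

Pointers to a dead node stay dereferenceable until sweep() runs, so any links to it must be dropped before then. Destructors also run at sweep time rather than at the point of marking.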
