
Most performant way to find all the leaf nodes in a tree data structure

I have a tree data structure where each node can have any number of children, and the tree can be of any height. What is the optimal way to get all the leaf nodes in the tree? Is it possible to do better than just traversing every path in the tree until I hit the leaf nodes?

In practice the tree will usually have a max depth of 5 or so, and each node in the tree will have around 10 children.

I'm open to other types of data structures or special trees that would make getting the leaf nodes especially optimal.

I'm using JavaScript, but I'm really just looking for general recommendations in any language.

Thanks!

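Finding the leaves of a tree is O(n), which is optimal: to retrieve all n leaves you have to look at n places, plus the branch nodes along the way. The branch nodes are the overhead, and as long as every branch node has at least two children there are fewer branch nodes than leaves, so the overhead is only a constant factor.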

If we increase the branching factor, e.g. by letting each branch have 32 children instead of 2, we significantly decrease the number of overhead nodes, which might make the traversal faster.
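To put rough numbers on that, here is a small sketch. It assumes a full tree in which every branch node has exactly k children; the function name `internalNodeCount` is made up for this illustration. Each branch node turns one leaf slot into k leaves, so it adds k - 1 leaves, which gives (leaves - 1) / (k - 1) branch nodes.

```javascript
// Number of branch (internal) nodes in a full tree where every branch
// node has exactly `k` children and the tree has `leaves` leaves.
// Assumption: the tree is full, so the division comes out exact.
function internalNodeCount(leaves, k) {
  return (leaves - 1) / (k - 1);
}

console.log(internalNodeCount(1024, 2));    // 1023 branch nodes
console.log(internalNodeCount(1024, 32));   // 33 branch nodes
console.log(internalNodeCount(100000, 10)); // 11111 branch nodes
```

The last line matches the tree from the question (depth 5, about 10 children per node): roughly 11% of the nodes visited are overhead.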

If we skip a branch, we're not including the values in that branch, so we have to look at all branches.
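Since every branch has to be visited anyway, a plain traversal is already the right algorithm. A minimal sketch in JavaScript, assuming each node looks like `{ value, children: [] }` (that shape and the name `collectLeaves` are assumptions, not something the question specifies). It uses an explicit stack rather than recursion, which avoids call overhead and stack limits on very deep trees.

```javascript
// Assumed node shape: { value: any, children: Node[] }.
// Returns all leaf nodes in left-to-right order.
function collectLeaves(root) {
  const leaves = [];
  if (!root) return leaves;
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    const children = node.children || [];
    if (children.length === 0) {
      leaves.push(node); // no children: this is a leaf
      continue;
    }
    // Push in reverse so the leftmost child is popped first.
    for (let i = children.length - 1; i >= 0; i--) {
      stack.push(children[i]);
    }
  }
  return leaves;
}

const sampleTree = {
  value: 'root',
  children: [
    { value: 'a', children: [{ value: 'a1', children: [] }, { value: 'a2', children: [] }] },
    { value: 'b', children: [] },
  ],
};
console.log(collectLeaves(sampleTree).map((n) => n.value)); // [ 'a1', 'a2', 'b' ]
```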

Memory layout is essential to fast retrieval: the child lists should be contiguous arrays rather than linked lists, and the nodes should be placed one after another in retrieval order.

The more static your tree is, the better a layout you can achieve.

All in one layout

  • All nodes in one array, totally ordered

  • Pro

    • memory can be streamed for maximal throughput (hardware pre-fetch)
    • no unneeded page lookups
    • normal lookups can still be made
    • no extra memory needed for linked-list pointers
    • internal nodes use an offset relative to themselves to find their children
  • Con

    • inserting / deleting can be cumbersome
    • insert / delete is O(N)
    • an insert might force a resize of the array, leading to a costly copy
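One way the all-in-one layout could look in JavaScript, as a sketch only: the nodes are written in pre-order into parallel arrays, and each node stores the size of its subtree as its relative offset. The names `flatten`, `flatLeaves`, `childIndices` and `subtreeSize`, and the input shape `{ value, children: [] }`, are assumptions for illustration.

```javascript
// Flatten a nested tree into parallel arrays in pre-order.
// subtreeSize[i] counts node i plus all of its descendants, so node i
// is a leaf exactly when subtreeSize[i] === 1.
function flatten(root) {
  const values = [];
  const sizes = [];
  function visit(node) {
    const index = values.length;
    values.push(node.value);
    sizes.push(1); // placeholder, fixed up once the children are written
    for (const child of node.children || []) visit(child);
    sizes[index] = values.length - index;
  }
  visit(root);
  return { values, subtreeSize: Uint32Array.from(sizes) };
}

// One forward pass over contiguous memory; no pointers are followed.
function flatLeaves(flat) {
  const out = [];
  for (let i = 0; i < flat.subtreeSize.length; i++) {
    if (flat.subtreeSize[i] === 1) out.push(flat.values[i]);
  }
  return out;
}

// Offsets relative to node i: its first child is at i + 1, and each
// following child starts where the previous child's subtree ends.
function childIndices(flat, i) {
  const end = i + flat.subtreeSize[i];
  const result = [];
  for (let c = i + 1; c < end; c += flat.subtreeSize[c]) result.push(c);
  return result;
}

const flat = flatten({
  value: 'root',
  children: [
    { value: 'a', children: [{ value: 'a1', children: [] }, { value: 'a2', children: [] }] },
    { value: 'b', children: [] },
  ],
});
console.log(flatLeaves(flat));      // [ 'a1', 'a2', 'b' ]
console.log(childIndices(flat, 0)); // [ 1, 4 ]
```

Inserting a node in the middle means shifting everything after it and updating the subtree sizes of its ancestors, which is where the O(N) cost in the list above comes from.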

Two array layout

  • One array for internal nodes
  • One array for leaves
  • Internal nodes point to the leaves

  • Pro

    • leaf nodes can be streamed at maximum throughput (maybe the best layout if you're mostly interested in the leaves)
    • no unneeded page lookups
    • indirect lookups can be made
  • Con

    • if all leaves are ordered, insert / delete can be cumbersome
    • if leaves are unordered, insertion is easy: just append at the end
    • deleting unordered leaves is also a problem if no tombstones are allowed, as the last leaf would have to be moved into the gap and the internal node pointing to it would need a fix-up (a further level of indirection can solve this, see slot map)
    • resizing either array might lead to a large copy, though a smaller one than in the all-in-one layout, since the two arrays can be resized independently
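A rough sketch of the two-array layout, again assuming input nodes shaped like `{ value, children: [] }`. The names `buildTwoArrays`, `internalChildren` and `leafChildren` are invented for this example. Internal nodes refer to leaves by index, so getting all leaves is just reading the `leaves` array.

```javascript
// Split a nested tree into an array of internal nodes and an array of
// leaf values. Internal nodes hold indices into both arrays.
function buildTwoArrays(root) {
  const internals = [];
  const leaves = [];
  function visit(node) {
    const children = node.children || [];
    if (children.length === 0) {
      leaves.push(node.value);
      return { isLeaf: true, index: leaves.length - 1 };
    }
    const entry = { value: node.value, internalChildren: [], leafChildren: [] };
    const index = internals.length;
    internals.push(entry);
    for (const child of children) {
      const ref = visit(child);
      (ref.isLeaf ? entry.leafChildren : entry.internalChildren).push(ref.index);
    }
    return { isLeaf: false, index };
  }
  visit(root);
  return { internals, leaves };
}

const store = buildTwoArrays({
  value: 'root',
  children: [
    { value: 'a', children: [{ value: 'a1', children: [] }, { value: 'a2', children: [] }] },
    { value: 'b', children: [] },
  ],
});
console.log(store.leaves);                    // [ 'a1', 'a2', 'b' ]
console.log(store.internals[1].leafChildren); // [ 0, 1 ]
```

This sketch only covers building and reading. Deleting a leaf by moving the last one into its slot would also require updating whichever internal node points at the moved leaf, as noted in the list above.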

Array of arrays (dynamically sized, e.g. a C++ vector of vectors)

  • using contiguous arrays for referencing the children of each node
  • Pro
    • running through each child list is fast
    • each child array may be resized independently
  • Con
    • while this removes much of the extra work of linked-list children, the individual lists are scattered among all other data, so lookups take extra time
    • an insert might cause a resize and copy of an array
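In JavaScript the array-of-arrays layout could be sketched as one child-id list per node, with nodes identified by their index. The names `childLists` and `leafIds`, and the convention that node 0 is the root, are assumptions made for the example.

```javascript
// childLists[i] is the contiguous list of child ids of node i.
// Assumption: node 0 is the root.
const childLists = [
  [1, 2], // node 0 (root)
  [3, 4], // node 1
  [],     // node 2 (leaf)
  [],     // node 3 (leaf)
  [],     // node 4 (leaf)
];

// Walk from the root and collect the ids of nodes without children.
function leafIds(children, rootId = 0) {
  const out = [];
  const stack = [rootId];
  while (stack.length > 0) {
    const id = stack.pop();
    const kids = children[id];
    if (kids.length === 0) {
      out.push(id);
      continue;
    }
    // Push in reverse so the leftmost child is popped first.
    for (let i = kids.length - 1; i >= 0; i--) stack.push(kids[i]);
  }
  return out;
}

console.log(leafIds(childLists)); // [ 3, 4, 2 ]
```

Each inner array is contiguous on its own, but the inner arrays are separate allocations, so moving from one child list to the next can still jump around in memory.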


 