Progressively store the path from root node to node of multiway tree during insertion so that the storage operation does not have a complexity of O(n)

I would like to ask if someone knows a performant way to store the path from the root node to a new node of a multiway tree during the insertion of the new node. E.g., if I have the following tree:

[Image: multiway tree]

For each node, I currently store an array of the path to the node from the root node during insertion in the following way, by assigning a unique int ID to each child on the same depth:

Root node -> [1]

Depth 1, child 1 of root -> [1, 1]
Depth 1, child 2 of root -> [1, 2]

Depth 2, child 1 of parent 1 -> [1, 1, 1]
Depth 2, child 2 of parent 1 -> [1, 1, 2]
Depth 2, child 3 of parent 1 -> [1, 1, 3]
Depth 2, child 1 of parent 2 -> [1, 2, 4]
Depth 2, child 2 of parent 2 -> [1, 2, 5]

Depth 3, child 1 of parent 3 -> [1, 1, 3, 1]

...

If I now insert a new node from the leaf node 1 on depth 3, I would have to create a new path array for it storing all the nodes of the parent 1 (i.e. [1, 1, 3, 1]) plus the new child ID, which is 1 for the first child:

Depth 4, child 1 of parent 1 -> [1, 1, 3, 1, 1]

As my tree grows a lot in height (the number of children per depth is relatively low, but the depth can be high), the slow part of this algorithm would be this array recreation process. Just imagine a tree of depth 1.000.000: if I insert a new node from a node of depth 1.000.000, I would have to create a new array for this new node storing all the 1.000.001 IDs of the parent plus appending the new node's ID:

Depth 1.000.001, child 1 of parent x -> [...1 million and one IDs... , 1]

Is there a more efficient way to store the path on each node during the node's insertion?

I basically need this to determine if any given node is a child of a possible parent node in the tree. As I have the path stored in each node, I can easily do that by checking the path array of the child, like this:

// Ex. 1
Is node 4 of depth 2 the parent/ancestor of node 1 of depth 3?

node 1 of depth 3 has the following path array: pathArray = [1, 1, 3, 1]
Its ancestor ID on depth 2 is: pathArray[2] -> 3

3 != 4 and therefore I know that node 4 of depth 2
is not a parent of node 1 of depth 3.

// Ex. 2
Is node 1 of depth 1 the parent/ancestor of node 1 of depth 3?

node 1 of depth 3 has the following path array: pathArray = [1, 1, 3, 1]
Its ancestor ID on depth 1 is: pathArray[1] -> 1

1 == 1 and therefore I know that node 1 of depth 1
is a parent of node 1 of depth 3.
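
The design described above can be sketched in a few lines of Python. This is only an illustration of the question's approach, not a recommended structure; the names PathNode and is_ancestor are hypothetical, and it assumes, as in the question, that IDs are unique among nodes of the same depth.

```python
class PathNode:
    """Node that stores its full root-to-node path, as in the question."""

    def __init__(self, node_id, parent=None):
        # Copying the parent's path is the O(h) step the question worries about.
        self.path = (parent.path if parent else []) + [node_id]

    @property
    def depth(self):
        return len(self.path) - 1


def is_ancestor(candidate, node):
    """True if `candidate` lies strictly above `node` on its root path.

    Assumes IDs are unique per depth, so comparing one array slot suffices.
    """
    d = candidate.depth
    return d < node.depth and node.path[d] == candidate.path[d]


# Rebuild the part of the example tree used in Ex. 1 and Ex. 2.
root = PathNode(1)
d1_node1 = PathNode(1, root)
d1_node2 = PathNode(2, root)
d2_node3 = PathNode(3, d1_node1)
d2_node4 = PathNode(4, d1_node2)
d3_node1 = PathNode(1, d2_node3)
```

With these nodes, is_ancestor(d2_node4, d3_node1) is the check from Ex. 1 and is_ancestor(d1_node1, d3_node1) is the check from Ex. 2. The lookup touches a single array slot, while every constructor call copies the whole parent path.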

This lookup operation would be fast; the problem is the creation of the path array as the tree gets deeper.

Any suggestions would be appreciated.

Thank you for your attention.

Right now, your solution has O(1) lookup time, O(h) insert time and O(n^2) space complexity, where n is the number of nodes and h is the height, which is at most n.

You can achieve a tradeoff with O(log n) lookup, O((log n)^2) insert and O(n log n) space in the following way:

Let every node store one jump pointer to each of its ancestors with distance 1 (its parent), 2 (grandparent), 4 (grandparent's grandparent), 8, 16 and so on, until the root is reached or passed. The maximum distance from any node to the root is n, so for every node you store O(log n) jump pointers. Since you do this for every node, the total space complexity is O(n log n).

Answering the query of whether y is an ancestor of x is trivial if y is not at a smaller depth than x: in that case it cannot be an ancestor. Name the depths of the nodes dy and dx. You know that if y is an ancestor of x, then it is the (dx - dy)'th ancestor of x. That is, if dy = 5 and dx = 17, you know that if y is x's ancestor, then it is 17 - 5 = 12 levels above x.

Therefore, you can perform lookups by repeatedly jumping the largest possible distance upwards in the tree from x without overshooting the target ancestor. For instance, if you're starting at depth 16 and want to find the ancestor at depth 6, you're interested in the ancestor 10 levels above. You cannot jump 16 levels up, as this would overshoot the target ancestor, so you jump 8 instead. Now you're at depth 16 - 8 = 8, and the remaining distance to the target ancestor, which is at depth 6, is 2. Since there is a pointer which goes exactly two steps up, you follow that and you've arrived at the target ancestor.
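
As a small illustration of the jumping rule, the following Python sketch lists the jump lengths taken for a given distance. The function name jump_sequence is made up for this example; it only does the arithmetic and does not touch a tree.

```python
def jump_sequence(distance):
    """Greedy power-of-two jumps that sum to `distance` without overshooting."""
    jumps = []
    while distance > 0:
        # Largest power of two that is <= the remaining distance.
        step = 1 << (distance.bit_length() - 1)
        jumps.append(step)
        distance -= step
    return jumps
```

For the distance of 10 levels in the example, jump_sequence(10) gives [8, 2]: one jump of 8 followed by one jump of 2.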

Every time you follow a pointer upwards in the tree, you get at least halfway to your target, so the maximum number of pointers you can follow is O(log n).

When inserting a node e as a child of another node x, e's first jump pointer is x itself, and you can construct the rest of e's jump pointers by finding x's ancestors with distance 1, 3, 7, 15, etc. (since e is one level further away from all of these than x is). There are O(log n) such searches. As we argued above, each of the lookups takes O(log n) time. Thus the total is O((log n)^2).
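
A minimal Python sketch of this scheme, assuming each node keeps its depth and a list of jump pointers, might look as follows. The names JumpNode, kth_ancestor and is_ancestor are hypothetical, and the sketch stores only the pointers that stay inside the tree rather than ones that pass the root.

```python
class JumpNode:
    def __init__(self, parent=None):
        self.depth = 0 if parent is None else parent.depth + 1
        self.jumps = []  # jumps[k] is the ancestor 2**k levels up
        k = 0
        while parent is not None and (1 << k) <= self.depth:
            # The ancestor 2**k above the new node is 2**k - 1 above its parent.
            self.jumps.append(kth_ancestor(parent, (1 << k) - 1))
            k += 1


def kth_ancestor(node, distance):
    """Follow the largest non-overshooting jump until `distance` is used up.

    Assumes 0 <= distance <= node.depth.
    """
    while distance > 0:
        k = distance.bit_length() - 1  # largest k with 2**k <= distance
        node = node.jumps[k]
        distance -= 1 << k
    return node


def is_ancestor(y, x):
    """True if y is a proper ancestor of x."""
    if y.depth >= x.depth:
        return False
    return kth_ancestor(x, x.depth - y.depth) is y


# A chain of depth 20 plus one side branch hanging off depth 4.
chain = [JumpNode()]
for _ in range(20):
    chain.append(JumpNode(chain[-1]))
side = JumpNode(chain[4])
```

Here kth_ancestor(chain[16], 10) walks the 8-then-2 route from the example and ends at chain[6], while is_ancestor(side, chain[17]) is false because the depth-5 ancestor of chain[17] is chain[5], not side.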

This operation might be made even faster by storing some additional information, but I can't see exactly how just now.
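
One possible speed-up, which is not part of the answer above, needs no extra information: the ancestor 2^(k+1) levels up is the ancestor 2^k levels above the ancestor 2^k levels up. Under that observation each new pointer is found with two array reads, so an insertion would cost O(log n) in total. The class name LiftNode below is hypothetical and the code is only a sketch of this idea.

```python
class LiftNode:
    def __init__(self, parent=None):
        self.depth = 0 if parent is None else parent.depth + 1
        self.jumps = []  # jumps[k] is the ancestor 2**k levels up
        if parent is not None:
            self.jumps.append(parent)
            k = 0
            # jumps[k + 1] exists only if the 2**k ancestor has its own 2**k jump.
            while k < len(self.jumps[k].jumps):
                self.jumps.append(self.jumps[k].jumps[k])
                k += 1


# Build a chain so that lift_chain[i] sits at depth i.
lift_chain = [LiftNode()]
for _ in range(40):
    lift_chain.append(LiftNode(lift_chain[-1]))
```

In this chain every lift_chain[i].jumps[k] should be lift_chain[i - 2**k], which is the same pointer layout the lookup procedure relies on.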

NOTE: This idea is actually a part of the classical solution for the Level Ancestor Problem. The classical solution allows for lookups as you have described them in O(1) time, while keeping the space of the entire data structure to O(n). However, the data structure is static, so the solution does not specify how to do insertions. There might be a way to adapt the level ancestor solution to a dynamic scenario and get even better running times than I've described here, but I'm not sure how.

Arrays have all their values stored contiguously in memory. If you want to retain this property, you must use them. Or, if you are OK with hopping through several memory locations, you can store in every node only its immediate parent and trace up to the required level to do the check required.

Map your nodes in a HashMap<node-id, node>.


Now, when you have to

determine if any given node is a child of a possible parent node,

you can find the exact location of that node in the tree from the HashMap and then travel back up the tree using the parent pointers to see if the possible parent lies on the path to the root.

In a fairly balanced tree, this will be O(Log n) run-time (to traverse up the tree) and O(n) space (for the HashMap).
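
A rough Python sketch of this idea is shown below, with a dict standing in for the HashMap. The names Node, nodes, insert and is_ancestor are made up for the example, and it assumes node IDs are unique across the whole tree, which differs from the per-depth IDs used in the question.

```python
class Node:
    def __init__(self, node_id, parent=None):
        self.node_id = node_id
        self.parent = parent


nodes = {}  # plays the role of HashMap<node-id, node>


def insert(node_id, parent_id=None):
    """Add a node under `parent_id`; nothing is copied, so this is O(1)."""
    parent = nodes[parent_id] if parent_id is not None else None
    nodes[node_id] = Node(node_id, parent)


def is_ancestor(ancestor_id, node_id):
    """Walk the parent pointers from the node towards the root."""
    target = nodes[ancestor_id]
    current = nodes[node_id].parent
    while current is not None:
        if current is target:
            return True
        current = current.parent
    return False


# Small tree: "root" has children "a" and "b"; "c" is under "a"; "d" is under "c".
insert("root")
insert("a", "root")
insert("b", "root")
insert("c", "a")
insert("d", "c")
```

The walk in is_ancestor takes time proportional to the depth of the node, which is O(Log n) only when the tree is fairly balanced; for the very deep trees described in the question it would be O(h).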


If you go with your current design of storing the path from each node to the root, then you would have O(Log n) run-time (assuming a balanced tree) and O(n * Log n) space to store the Log n length path for each of the n nodes.


 