O(1) algorithm to determine if node is descendant of another node in a multiway tree?

Imagine the following tree:

    A
   / \
  B   C
 / \   \
D   E   F

I'm looking for a way to query whether, for example, F is a descendant of A (note: F doesn't need to be a direct descendant of A), which in this particular case would be true. Only a limited number of potential parent nodes need to be tested against a larger pool of potential descendant nodes.

When testing whether a node is a descendant of a node in the potential parent pool, it needs to be tested against ALL potential parent nodes.

This is what I came up with:

  • Convert the multiway tree to a trie, i.e. assign the following prefixes to every node in the above tree:

      A = 1
      B = 11
      C = 12
      D = 111
      E = 112
      F = 121

  • Then, reserve a bit array for every possible prefix length and add the parent nodes to be tested against, i.e. if C is added to the potential parent node pool, do:

        1    2    3  <- Prefix length
      *[1]  [1]  ...
       [2] *[2]  ...
       [3]  [3]  ...
       [4]  [4]  ...
       ...  ...

  • When testing whether a node is a descendant of a potential parent node, take its trie prefix, look up the first character in the first "prefix array" (see above) and, if it is present, look up the second prefix character in the second "prefix array", and so on. For example, testing F = 121 leads to:

        1    2    1  <- F's prefix
      *[1]  [1]  ...
       [2] *[2]  ...
       [3]  [3]  ...
       [4]  [4]  ...
       ...  ...

    so yes, F is a descendant of C.

This test seems to be worst case O(n), where n = maximum prefix length = maximum tree depth, so its worst case is exactly equal to the obvious approach of just walking up the tree and comparing nodes. However, it performs much better if the tested node is near the bottom of the tree and the potential parent node is somewhere near the top. Combining both algorithms would mitigate both worst-case scenarios. However, memory overhead is a concern.
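For comparison, here is a minimal C++ sketch of that obvious approach (walking up the tree and comparing nodes). It assumes a hypothetical representation where nodes are numbered 0..n-1 and `parent[v]` holds each node's parent (-1 for the root). The pool of potential parents is kept in a `std::unordered_set`, so a single walk tests the node against all pool members at once; `IsDescendantOfAny` is an illustrative name.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical representation: parent[v] is the parent of node v, -1 for the root.
using ParentArray = std::vector<int>;

// Walks from `node` towards the root and reports whether any proper ancestor
// of `node` is in `pool`. Cost: O(depth of node) expected-O(1) hash lookups,
// independent of the pool size.
bool IsDescendantOfAny(const ParentArray& parent, int node,
                       const std::unordered_set<int>& pool) {
    for (int v = parent[node]; v != -1; v = parent[v]) {
        if (pool.count(v) != 0) {
            return true;
        }
    }
    return false;
}
```

This is the O(depth) baseline the question compares against; the answers below trade preprocessing or memory for faster queries.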

Is there another way of doing this? Any pointers greatly appreciated!

Are your input trees always static? If so, you can use a Lowest Common Ancestor (LCA) algorithm to answer the is-descendant question in O(1) time with O(n) time/space preprocessing. An LCA query is given two nodes and asks which is the lowest node in the tree whose subtree contains both nodes. You can then answer the IsDescendant query with a single LCA query: if LCA(A, B) == A or LCA(A, B) == B, then one is a descendant of the other.
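A minimal C++17 sketch of the LCA route, assuming a static tree given as hypothetical `children` lists over nodes 0..n-1. It uses the Euler tour plus sparse-table range-minimum technique, which gives O(1) queries after O(n log n) preprocessing; the O(n)-preprocessing constructions mentioned above are more involved and are not shown. `LcaIndex` and its methods are illustrative names.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// LCA index for a static rooted tree: Euler tour + sparse table.
class LcaIndex {
public:
    LcaIndex(const std::vector<std::vector<int>>& children, int root)
        : first_(children.size()), depth_(children.size()) {
        // Iterative DFS recording the Euler tour: a node is written on entry
        // and again after returning from each of its children.
        std::vector<std::pair<int, std::size_t>> stack{{root, 0}};
        depth_[root] = 0;
        while (!stack.empty()) {
            auto& [v, next] = stack.back();
            if (next == 0) first_[v] = euler_.size();
            euler_.push_back(v);
            if (next < children[v].size()) {
                const int c = children[v][next++];
                depth_[c] = depth_[v] + 1;
                stack.push_back({c, 0});  // invalidates v/next; not used after
            } else {
                stack.pop_back();
            }
        }
        // table_[k][i] = shallowest node among euler_[i .. i + 2^k - 1].
        const std::size_t m = euler_.size();
        log_.assign(m + 1, 0);
        for (std::size_t i = 2; i <= m; ++i) log_[i] = log_[i / 2] + 1;
        table_.push_back(euler_);
        for (std::size_t k = 1; (std::size_t{1} << k) <= m; ++k) {
            const std::size_t half = std::size_t{1} << (k - 1);
            std::vector<int> row(m - (std::size_t{1} << k) + 1);
            for (std::size_t i = 0; i < row.size(); ++i)
                row[i] = Shallower(table_[k - 1][i], table_[k - 1][i + half]);
            table_.push_back(std::move(row));
        }
    }

    // O(1): the shallowest node between the first occurrences of u and v.
    int Lca(int u, int v) const {
        std::size_t l = first_[u], r = first_[v];
        if (l > r) std::swap(l, r);
        const std::size_t k = log_[r - l + 1];
        return Shallower(table_[k][l], table_[k][r - (std::size_t{1} << k) + 1]);
    }

    // True if u is a descendant of v (a node counts as its own descendant).
    bool IsDescendant(int u, int v) const { return Lca(u, v) == v; }

private:
    int Shallower(int a, int b) const { return depth_[a] <= depth_[b] ? a : b; }

    std::vector<std::size_t> first_;  // first Euler-tour index of each node
    std::vector<int> depth_;
    std::vector<int> euler_;
    std::vector<std::size_t> log_;    // floor(log2(i))
    std::vector<std::vector<int>> table_;
};
```

With the example tree numbered A=0 ... F=5, `IsDescendant(5, 0)` (F under A) holds while `IsDescendant(5, 1)` (F under B) does not.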

This Topcoder algorithm tutorial gives a thorough discussion of the problem and a few solutions at various levels of code complexity/efficiency.

I don't know if this would fit your problem, but one way to store hierarchies in databases, with fast "give me everything from this node downwards" queries, is to store a "path".

For instance, for a tree that looks like this:

    +-- b
    |
a --+       +-- d
    |       |
    +-- c --+
            |
            +-- e

you would store the rows as follows, assuming the letter in the above tree is the "id" of each row:

id    path
a     a
b     a*b
c     a*c
d     a*c*d
e     a*c*e

To find all descendants of a particular node, you would do a "STARTSWITH" query on the path column, i.e. for c, all nodes with a path that starts with a*c*.

To find out whether a particular node is a descendant of another node, you would check whether the longer path starts with the shorter path.

So for instance:

  • e is a descendant of a since a*c*e starts with a
  • d is a descendant of c since a*c*d starts with a*c
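A small C++ sketch of the path test, assuming the hypothetical `IsDescendantPath` helper receives paths built with `*` as the separator, as in the table above. Appending the separator before the prefix check keeps an id like `c` from matching a sibling such as `cx`; in SQL the same idea is a `LIKE 'a*c*%'` condition.

```cpp
#include <string>

// Returns true if the node whose path is `child` is a proper descendant of
// the node whose path is `ancestor` (paths are ids joined with '*').
bool IsDescendantPath(const std::string& child, const std::string& ancestor) {
    // Require the separator after the ancestor's path so "a*c" does not
    // match "a*cx*...".
    const std::string prefix = ancestor + '*';
    return child.size() >= prefix.size() &&
           child.compare(0, prefix.size(), prefix) == 0;
}
```

For example, `IsDescendantPath("a*c*e", "a")` is true, while `IsDescendantPath("a*c", "a*c")` is false because a node is not treated as its own descendant here.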

Would that be useful in your case?

Traversing any tree requires "depth-of-tree" steps. Therefore, if you maintain a balanced tree structure, it is provable that your lookup needs only O(log n) operations. From what I understand, your tree is special and you cannot keep it balanced, right? Then O(n) is possible. But that is bad during creation of the tree anyway, so you will probably die before you ever use the lookup...

Depending on how often you need the lookup operation compared to insert, you could decide to pay during insert to maintain an extra data structure. I would suggest hashing if you really need amortized O(1). On every insert operation you put all ancestors of a node into a hash table. By your description this could be O(n) items on a given insert. If you do n inserts this sounds bad (towards O(n^2)), but actually your tree cannot degrade that badly, so you probably get an amortized overall hash table size of O(n log n). (Actually, the log n part depends on how degenerate your tree is. If you expect it to be maximally degenerate, don't do it.)

So you would pay about O(log n) on every insert and get hash-table efficiency, O(1), for a lookup.
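One way to realize the hashing idea, sketched in C++ under the assumption that nodes are only ever appended (no moves or deletes): each node gets its own `std::unordered_set` of ancestors, copied from its parent's set at insert time. `AncestorIndex` is a hypothetical name, and a per-node set is just one possible layout for "a hashtable of all parents".

```cpp
#include <unordered_set>
#include <utility>
#include <vector>

// Each node stores the set of all its proper ancestors.
class AncestorIndex {
public:
    // Adds a node under `parent` (or as a root when parent == -1) and returns
    // the new node's id. Cost is proportional to the new node's depth.
    int Insert(int parent) {
        std::unordered_set<int> ancestors;
        if (parent != -1) {
            ancestors = ancestors_[parent];  // copy the parent's ancestors
            ancestors.insert(parent);
        }
        ancestors_.push_back(std::move(ancestors));
        return static_cast<int>(ancestors_.size()) - 1;
    }

    // Expected O(1): true if `node` is a proper descendant of `ancestor`.
    bool IsDescendant(int node, int ancestor) const {
        return ancestors_[node].count(ancestor) != 0;
    }

private:
    std::vector<std::unordered_set<int>> ancestors_;
};
```

The memory cost is the sum of all node depths, which is exactly the trade-off described above: fine for shallow trees, quadratic for a degenerate chain.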

For an M-way tree, instead of your bit array, why not just store the binary "trie id" (using M bits per level) with each node? For your example (assuming M == 2): A=0b01, B=0b0101, C=0b1001, ...

Then you can do the test in O(1):

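// True if `parent` is an ancestor of `child` (or the same node):
// every bit set in parent->id must also be set in child->id.
bool IsParent(const node* child, const node* parent)
{
   return (child->id & parent->id) == parent->id;
}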
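A self-contained C++14 sketch of the M-bits-per-level scheme, including how ids could be assigned, which the answer leaves implicit. `Node`, `MakeRoot`, `MakeChild` and the constant `M` are hypothetical; the root is treated as child 0 of an imaginary super-root so that it gets id 0b01 as in the example, and ids fit in a `uint64_t` only while (depth + 1) * M <= 64.

```cpp
#include <cstdint>

// Maximum number of children per node (assumption for this sketch).
constexpr int M = 2;

// Level k owns bits [k*M, (k+1)*M) of `id`; a node sets exactly one bit in
// the field of every level on its root path (bit k*M + i for the i-th child).
struct Node {
    std::uint64_t id;
    int depth;  // the root is at depth 0
};

// The root acts as child 0 of an imaginary super-root, giving id 0b01.
Node MakeRoot() { return {std::uint64_t{1}, 0}; }

// Id of the `childIndex`-th child (0 <= childIndex < M) of `parent`.
Node MakeChild(const Node& parent, int childIndex) {
    const int depth = parent.depth + 1;
    return {parent.id | (std::uint64_t{1} << (depth * M + childIndex)), depth};
}

// True if `parent` is an ancestor of `child` (or the same node).
bool IsParent(const Node& child, const Node& parent) {
    return (child.id & parent.id) == parent.id;
}
```

Building the example tree this way yields A=0b01, B=0b0101, C=0b1001 and F=0b011001, so `IsParent(F, C)` holds while `IsParent(F, B)` does not.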

You could compress the storage to ceil(lg2(M+1)) bits per level (storing each child's code as 1..M, so that no level field is all zeros) if you have a fast FindMSB() function that returns the position of the most significant set bit:

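// B = bits per level; round the mask up to the end of the parent's deepest level.
int levels = FindMSB(parent->id) / B + 1;
mask = (1 << (levels * B)) - 1;
return (child->id & mask) == parent->id;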
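A C++ sketch of the compressed variant under stated assumptions: `B` bits per level, child codes 1..2^B - 1 (code 0 reserved so a node's most significant set bit always falls in its deepest level), and a portable loop standing in for `FindMSB()` (C++20's `std::bit_width` from `<bit>` could replace it, since the MSB position is `bit_width(x) - 1`). The mask is rounded up to a whole level field; a mask that stops at the most significant bit would wrongly accept a sibling whose code shares the parent code's low bits (for example codes 3 and 1 with B = 2). `ChildId` and `B` are hypothetical names.

```cpp
#include <cstdint>

// Bits per level (assumption: supports up to 2^B - 1 = 3 children per node).
constexpr int B = 2;

// 0-based position of the most significant set bit; x must be nonzero.
int FindMSB(std::uint64_t x) {
    int pos = -1;
    while (x != 0) {
        x >>= 1;
        ++pos;
    }
    return pos;
}

// Id of a child, given its parent's id, the parent's depth (root = 0) and
// the child's code in 1..(2^B - 1).
std::uint64_t ChildId(std::uint64_t parentId, int parentDepth, std::uint64_t code) {
    return parentId | (code << ((parentDepth + 1) * B));
}

// True if the node `parentId` is an ancestor of (or equal to) `childId`.
bool IsParent(std::uint64_t childId, std::uint64_t parentId) {
    // Number of levels in the parent's id, then a mask covering all of them.
    const int levels = FindMSB(parentId) / B + 1;
    const std::uint64_t mask = levels * B >= 64
        ? ~std::uint64_t{0}
        : (std::uint64_t{1} << (levels * B)) - 1;
    return (childId & mask) == parentId;
}
```

With a root id of 1 (code 1), a third child G of the root gets 0b1101; the rounded mask correctly rejects G as a descendant of B (0b0101), whereas a mask of only the low three bits would accept it.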

In a pre-order traversal, every set of descendants is contiguous. For your example:

A B D E C F
+---------+ A
  +---+ B
    + D
      + E
        +-+ C
          + F

If you can preprocess, then all you need to do is number each node in pre-order and compute its descendant interval.
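A minimal C++ sketch of that preprocessing for a static tree given as hypothetical `children` lists over nodes 0..n-1: a recursive pre-order walk assigns numbers and accumulates subtree sizes, so the descendants of v are exactly the nodes with pre-order numbers in [pre[v], pre[v] + size[v]). `PreorderIndex` is an illustrative name, and the recursion could overflow the call stack on very deep trees.

```cpp
#include <vector>

struct PreorderIndex {
    std::vector<int> pre;   // pre-order number of each node
    std::vector<int> size;  // number of nodes in each node's subtree

    PreorderIndex(const std::vector<std::vector<int>>& children, int root)
        : pre(children.size()), size(children.size(), 1) {
        int counter = 0;
        Number(children, root, counter);
    }

    // True if u lies in the subtree of v (v counts as its own descendant).
    bool IsDescendant(int u, int v) const {
        return pre[v] <= pre[u] && pre[u] < pre[v] + size[v];
    }

private:
    // Assigns pre-order numbers and sums child subtree sizes into size[v].
    void Number(const std::vector<std::vector<int>>& children, int v, int& counter) {
        pre[v] = counter++;
        for (int c : children[v]) {
            Number(children, c, counter);
            size[v] += size[c];
        }
    }
};
```

With the example tree numbered A=0 ... F=5, the pre-order numbers come out in the order A B D E C F, matching the diagram above.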

If you can't preprocess, then a link/cut tree offers O(log n) performance for both updates and queries.

You can answer queries of the form "Is node A a descendant of node B?" in constant time, using just two auxiliary arrays.

Preprocess the tree by visiting it in depth-first order, and for each node A store the starting and ending times of its visit in two arrays, Start[] and End[].

So let End[u] and Start[u] be, respectively, the ending and starting times of the visit of node u.

Then node u is a descendant of node v if and only if:

Start[v] <= Start[u] and End[u] <= End[v].

and you are done: checking this condition requires just two lookups in each of the arrays Start and End.
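The Start[]/End[] check can be sketched in C++17 as follows, again assuming hypothetical `children` lists over nodes 0..n-1. A single clock ticks when a node is entered and again when it is left, and an explicit stack replaces recursion; `VisitTimes` is an illustrative name.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct VisitTimes {
    std::vector<int> start, end;

    VisitTimes(const std::vector<std::vector<int>>& children, int root)
        : start(children.size()), end(children.size()) {
        int clock = 0;
        // Explicit DFS stack of (node, index of the next child to visit).
        std::vector<std::pair<int, std::size_t>> stack{{root, 0}};
        start[root] = clock++;
        while (!stack.empty()) {
            auto& [v, next] = stack.back();
            if (next < children[v].size()) {
                const int c = children[v][next++];
                start[c] = clock++;        // entering c
                stack.push_back({c, 0});   // invalidates v/next; not used after
            } else {
                end[v] = clock++;          // leaving v
                stack.pop_back();
            }
        }
    }

    // u is a descendant of v iff Start[v] <= Start[u] and End[u] <= End[v].
    bool IsDescendant(int u, int v) const {
        return start[v] <= start[u] && end[u] <= end[v];
    }
};
```

For the example tree (A..F numbered 0..5), F gets Start/End 8/9 and C gets 7/10, so the check for "F under C" succeeds, while B's interval 1/6 does not contain F's.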

Take a look at the nested set model: it is very efficient for selects, but too slow for updates.