Identifying root parents and all their children in a tree

Question

I have a pandas DataFrame like this:

parent   child   parent_level   child_level
A        B       0              1
B        C       1              2
B        D       1              2
X        Y       0              2
X        D       0              2 
Y        Z       2              3

This represents a tree that looks like this:

       A  X
      /  / \
     B  /   \
    /\ /     \
   C  D       Y
              |
              Z

I want to produce something that looks like this:

root    children
A       [B,C,D]
X       [D,Y,Z]

or

root   child
A      B
A      C
A      D
X      D
X      Y
X      Z

What is the fastest way to do this without looping? I have a really large DataFrame.

Answer 1

I suggest you use networkx, as this is a graph problem. In particular, the descendants function:

import networkx as nx
import pandas as pd

data = [['A', 'B', 0, 1],
        ['B', 'C', 1, 2],
        ['B', 'D', 1, 2],
        ['X', 'Y', 0, 2],
        ['X', 'D', 0, 2],
        ['Y', 'Z', 2, 3]]

df = pd.DataFrame(data=data, columns=['parent', 'child', 'parent_level', 'child_level'])

roots = df.parent[df.parent_level.eq(0)].unique()
dg = nx.from_pandas_edgelist(df, source='parent', target='child', create_using=nx.DiGraph)

result = pd.DataFrame(data=[[root, nx.descendants(dg, root)] for root in roots], columns=['root', 'children'])
print(result)

Output

  root   children
0    A  {D, B, C}
1    X  {Z, Y, D}
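
The question also asks for a long format with one row per (root, child) pair. A minimal sketch of that conversion, assuming pandas 0.25 or later for `DataFrame.explode`. The `result` frame here is rebuilt by hand from the output above so the snippet runs without networkx, and the name `long_form` is only illustrative. Each set is sorted into a list first so the row order is deterministic:

```python
import pandas as pd

# Hand-built stand-in for the `result` frame printed above.
result = pd.DataFrame({
    'root': ['A', 'X'],
    'children': [{'B', 'C', 'D'}, {'D', 'Y', 'Z'}],
})

# Sets are unordered, so sort each one into a list before exploding.
long_form = (
    result.assign(children=result['children'].map(sorted))
          .explode('children')                      # one row per list element
          .rename(columns={'children': 'child'})
          .reset_index(drop=True)
)
print(long_form)
```

Note that D appears under both A and X, since it is reachable from both roots.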

Answer 2

With recursion

def find_root(tree, child):
    if child in tree:
        return {p for x in tree[child] for p in find_root(tree, x)}
    else:
        return {child}

tree = {}
for parent, child in zip(df.parent, df.child):
    tree.setdefault(child, set()).add(parent)

descendants = {}
for child in tree:
    for parent in find_root(tree, child):
        descendants.setdefault(parent, set()).add(child)

pd.DataFrame(descendants.items(), columns=['root', 'children'])

  root   children
0    A  {B, D, C}
1    X  {Z, D, Y}

You could alternatively set up find_root as a generator:

def find_root(tree, child):
    if child in tree:
        for x in tree[child]:
            yield from find_root(tree, x)
    else:
        yield child

Further, if you want to avoid recursion depth issues, you can use the "stack of iterators" pattern to define find_root:

def find_root(tree, child):
    stack = [iter([child])]
    while stack:
        for node in stack[-1]:
            if node in tree:
                stack.append(iter(tree[node]))
            else:
                yield node
            break
        else:  # yes!  that is an `else` clause on a for loop
            stack.pop()
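
To illustrate why the iterative version can matter, here is a small sketch that runs it on an artificially deep chain. The chain length of 5000 and the node names `n0` to `n5000` are made up for illustration; the depth is chosen to exceed CPython's default recursion limit of 1000, where the recursive variants would typically raise RecursionError.

```python
def find_root(tree, child):
    # Each stack entry is an iterator over the parents of one node.
    stack = [iter([child])]
    while stack:
        for node in stack[-1]:
            if node in tree:
                # node has parents: descend into them next.
                stack.append(iter(tree[node]))
            else:
                # node has no parents, so it is a root.
                yield node
            break
        else:
            # The top iterator is exhausted: go back one level.
            stack.pop()

# Synthetic chain n0 -> n1 -> ... -> n5000, stored as child -> {parents}.
depth = 5000
tree = {f'n{i}': {f'n{i - 1}'} for i in range(1, depth + 1)}

roots = set(find_root(tree, f'n{depth}'))
print(roots)
```

The explicit stack grows on the heap rather than on the interpreter's call stack, so the depth of the tree is limited by memory instead of by the recursion limit.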

Answer 3

My approach is this: start from the bottom-most parent_level and collect the children in a dictionary. As you move up, when you find that a parent in the dict is the child of another parent, add its children to the new parent, then delete the old parent.

In a quick %time test, this method was much faster (4.77 µs compared to 5.58 ms using networkx). I'm not sure whether that still holds when you scale up, so give it a try.

import pandas as pd

data = [['A', 'B', 0, 1],
        ['B', 'C', 1, 2],
        ['B', 'D', 1, 2],
        ['X', 'Y', 0, 2],
        ['X', 'D', 0, 2],
        ['Y', 'Z', 2, 3]]

df = pd.DataFrame(data=data, columns=['parent', 'child', 'parent_level', 'child_level'])

current_roots = {}

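for parent_level in range(df.parent_level.max(), -1, -1):
    # Intermediate parents absorbed by a higher parent during this pass.
    # A set avoids a KeyError when two parents share the same child.
    children_to_remove_from_root = set()

    for (root, rows) in df[df.parent_level == parent_level].groupby('parent'):
        children = rows['child'].values.tolist()
        current_roots[root] = children

        # `children` is the same list object as current_roots[root],
        # so the += below extends it and the loop also visits the additions.
        for child in children:
            if child in current_roots:
                current_roots[root] += current_roots[child]
                children_to_remove_from_root.add(child)

    for child in children_to_remove_from_root:
        del current_roots[child]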

print(current_roots)
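
Since the timing above was taken on six rows, it may be worth repeating it on something larger. The following is a rough sketch, not a rigorous benchmark: `make_forest` and `collect_by_level` are hypothetical helper names, the synthetic data is a set of complete binary trees, and the level-by-level loop is wrapped in a function so that `timeit` can call it.

```python
import timeit
import pandas as pd

def make_forest(n_roots=50, depth=6):
    """Build an edge list of complete binary trees (synthetic test data)."""
    rows = []
    for r in range(n_roots):
        level_nodes = [f'r{r}']
        for level in range(depth):
            next_nodes = []
            for node in level_nodes:
                for side in ('a', 'b'):
                    child = node + side
                    rows.append((node, child, level, level + 1))
                    next_nodes.append(child)
            level_nodes = next_nodes
    return pd.DataFrame(rows, columns=['parent', 'child', 'parent_level', 'child_level'])

def collect_by_level(df):
    """Level-by-level merge, deepest parents first."""
    current_roots = {}
    for parent_level in range(df.parent_level.max(), -1, -1):
        absorbed = set()
        for parent, rows in df[df.parent_level == parent_level].groupby('parent'):
            children = rows['child'].tolist()
            # Iterate over a copy because `children` is extended in the loop.
            for child in list(children):
                if child in current_roots:
                    children += current_roots[child]
                    absorbed.add(child)
            current_roots[parent] = children
        for child in absorbed:
            del current_roots[child]
    return current_roots

big = make_forest()
out = collect_by_level(big)
seconds = timeit.timeit(lambda: collect_by_level(big), number=3) / 3
print(f'{len(big)} edges, {len(out)} roots, {seconds:.4f} s per run')
```

Raising `n_roots` or `depth` gives a feel for how the running time grows; the other answers can be timed the same way by wrapping them in a function.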

Answer 1: score 8, accepted, 2019-10-16 14:18:03

Answer 2: score 4, 2019-10-16 14:46:35

Answer 3: score 0, 2019-10-16 14:34:48
