
Python - Iterate over an edge list, for nodes with a specific attribute, find all connected nodes with a different specific attribute?

I have an edge list containing 24,000 different edges between products. An edge is created between A and B if product B is a subcomponent of A.

The edge list has the following format:

 Parent | Child | Root | Child Meta
  AA1      BB1    AA1      ...  
  AA1      BB2    AA1      ...
  BB2      CC1    AA1      ...  
  AA2      BB3    AA2
  AA2      BB4    AA2
  BB4      CC1    AA2      ... 
  BB4      DD1    AA2      ...
  DD1      EE1    AA2
  DD1      EE2    AA2
  BB4      FF1    AA2
  FF1      GG1    AA2      ...
  GG1      EE3    AA2

So, grouping by Root, I want to find, for all parents of the form DD* and FF*, the children of the form EE* they are connected to (note that in the example FF1 reaches EE3 through GG1). In the example above I want the output DataFrame to look like:

 Parent | Child | Root | Child Meta
   DD1     EE1    AA2      ... 
   DD1     EE2    AA2      ...
   FF1     EE3    AA2      ...

The only way I know how to do this is by iterating over a pandas DataFrame and using recursive functions that iterate over the children until I hit an EE* child. This takes forever. Is there a smart way to use networkx here, maybe? Or is there any other way I can do this using pandas that would be faster?

If I understand the issue correctly, it might be faster to start at the bottom and find nodes going upwards.

Since you know the subset of children (E*) you want to find, if you start with the target children, all parents are by definition part of the result, and you don't have to filter.

In a plain iterative Python approach, something like this would find all parent nodes for "E*" children:

(Please note that I have added an extra line with "BB3 DD1 AA2" to have another duplicate child: DD1 now has two parents.)

data = """AA1      BB1    AA1
  AA1      BB2    AA1 
  BB2      CC1    AA1 
  AA2      BB3    AA2
  AA2      BB4    AA2
  BB4      CC1    AA2 
  BB3      DD1    AA2
  BB4      DD1    AA2
  DD1      EE1    AA2
  DD1      EE2    AA2
  BB4      FF1    AA2
  FF1      GG1    AA2
  GG1      EE3    AA2"""

# Each edge is a tuple (parent, child, root).
tuples = {tuple(line.split()) for line in data.split("\n")}

# Map every child to the set of edges that point at it.
parentsByChild = {}
for node in tuples:
    parentsByChild.setdefault(node[1], set()).add(node)
# alternatively:
# from itertools import groupby
# parentsByChild = {c: frozenset(nodes) for c, nodes in groupby(sorted(tuples, key=lambda n: n[1]), lambda n: n[1])}

def expand(nodes):
    """Return the given edges plus every edge on a path above them."""
    todo, found = set(nodes), set()
    while todo:
        node = todo.pop()
        if node not in found:
            found.add(node)
            # Queue the edges whose child is this edge's parent.
            todo.update(p for p in parentsByChild.get(node[0], set()) if p not in found)
    return found

# Start from the edges that end in an "E*" child and walk upwards.
leaves = {n for n in tuples if n[1].startswith("E")}
for t in expand(leaves):
    print(t)

This should be linear in the number of edges: we iterate over them once to collect the tuples and a second time to group the parents. The expand call iterates over all "interesting" children and their parents, expanding parents only for new nodes, so we never do work twice for the same node.
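The code above prints every edge that lies above an "E*" child, not yet the three-row table from the question. A minimal sketch of that last step follows. It assumes that "connected" means "reachable by walking upwards", and that parents should be indexed per Root so that a child such as CC1, which appears under both AA1 and AA2, does not mix the two trees. The function name `find_pairs` and its parameters are made up for illustration. Because it walks upwards once per leaf, it is no longer strictly linear in the number of edges.

```python
# Edge list from the question, as (parent, child, root) tuples.
edges = [
    ("AA1", "BB1", "AA1"), ("AA1", "BB2", "AA1"), ("BB2", "CC1", "AA1"),
    ("AA2", "BB3", "AA2"), ("AA2", "BB4", "AA2"), ("BB4", "CC1", "AA2"),
    ("BB4", "DD1", "AA2"), ("DD1", "EE1", "AA2"), ("DD1", "EE2", "AA2"),
    ("BB4", "FF1", "AA2"), ("FF1", "GG1", "AA2"), ("GG1", "EE3", "AA2"),
]


def find_pairs(edges, ancestor_prefixes=("DD", "FF"), leaf_prefix="EE"):
    """Return sorted (ancestor, leaf, root) rows (hypothetical helper)."""
    # Index parents by (root, child) so trees with different roots never mix.
    parents = {}
    for parent, child, root in edges:
        parents.setdefault((root, child), set()).add(parent)

    rows = set()
    for _, leaf, root in edges:
        if not leaf.startswith(leaf_prefix):
            continue
        # Walk upwards from this leaf, visiting each ancestor only once.
        todo, seen = [leaf], {leaf}
        while todo:
            node = todo.pop()
            for parent in parents.get((root, node), ()):
                if parent in seen:
                    continue
                seen.add(parent)
                # Assumption: every matching ancestor is reported,
                # not only the nearest one.
                if parent.startswith(ancestor_prefixes):
                    rows.add((parent, leaf, root))
                todo.append(parent)
    return sorted(rows)


for row in find_pairs(edges):
    print(row)
```

If the edges live in a pandas DataFrame, the resulting rows can be turned back into one by passing them to the DataFrame constructor with the column names Parent, Child and Root.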

