How to Process Strings of Nodes to get their Connections?

I'm doing some work on node trees and I'm stuck with this issue. This list contains all the information of a tree:

connections = ['Module/Expr/ListComp/BinOp/Name/id/i/',
               'Module/Expr/ListComp/BinOp/Sub/',
               'Module/Expr/ListComp/BinOp/Num/0.5/',
               'Module/Expr/ListComp/comprehension/Name/id/i/',
               'Module/Expr/ListComp/comprehension/Name/id/inp/']

I need to convert this into:

{'Module':'Expr', 'Expr':'ListComp', 'ListComp':'BinOp comprehension', 
'BinOp':'Name Sub Num', 'Name':'id', 'id':'i', 'Num':'0.5', 
'comprehension':'Name', 'Name':'id', 'id':'i inp'}

The goal is to parse the connections into a dictionary of the form {'parent':'child(s)'}. This is what I have tried so far:

rules = {}
connections_list = [[word for word in path.split("/") if word] for path in connections]

for path in connections_list:
    for i, word in enumerate(path):
        same_level = [y[i+1] for y in connections_list if len(connections_list) > i+1]
        if same_level:
            unique_on_level = list(set(same_level))
            rules.update({word:" ".join(unique_on_level)})
        else:
            pass
    break
print(rules)

This produces the output:

{'Module': 'Expr',
 'Expr': 'ListComp',
 'ListComp': 'BinOp comprehension',
 'BinOp': 'Num Sub Name'}

I can't figure out how to get the full result. The problem seems to be with the nodes near the ends of the paths, but I don't know how to solve it. Any ideas on how to fix this?

First create a mapping from each parent to its child nodes, and then remove the duplicates.

rules = {}
for connection in connections:
    parts = connection.rstrip("/").split("/")
    for parent, child in zip(parts, parts[1:]):
        if parent not in rules:
            rules[parent] = []
        rules[parent].append(child)

# dict.fromkeys drops duplicate children while keeping their first-seen order
rules = {k: " ".join(dict.fromkeys(v)) for k, v in rules.items()}
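For reference, here is a self-contained sketch of the same approach run against the sample data from the question. The function name `build_rules` is an assumption made for illustration, and `setdefault` stands in for the explicit membership test:

```python
def build_rules(connections):
    """Map each parent name to a space-separated string of its unique children."""
    children = {}
    for connection in connections:
        parts = connection.rstrip("/").split("/")
        # Pair every node with the node that follows it on the path.
        for parent, child in zip(parts, parts[1:]):
            children.setdefault(parent, []).append(child)
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return {parent: " ".join(dict.fromkeys(kids)) for parent, kids in children.items()}


connections = ['Module/Expr/ListComp/BinOp/Name/id/i/',
               'Module/Expr/ListComp/BinOp/Sub/',
               'Module/Expr/ListComp/BinOp/Num/0.5/',
               'Module/Expr/ListComp/comprehension/Name/id/i/',
               'Module/Expr/ListComp/comprehension/Name/id/inp/']

rules = build_rules(connections)
print(rules)
```

With this input, rules['BinOp'] is 'Name Sub Num' and rules['id'] is 'i inp'. Because the keys are bare node names, the Name under BinOp and the Name under comprehension end up sharing a single entry.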

Based on @wim's answer and the comments, I think this should work:

from collections import defaultdict

rule_data = defaultdict(set)
for connection in connections:
    parts = connection.rstrip("/").split("/")
    for level, (parent, child) in enumerate(zip(parts, parts[1:])):
        rule_data[level, parent].add(child)

rules = [
    (parent, " ".join(sorted(children)))
    for (_level, parent), children in rule_data.items()
]

Notes:

  • Using a set discards the order of the children; if the order is important, we can instead use a dict (or, for compatibility with older versions of Python, OrderedDict) by changing these three lines:

    • rule_data = defaultdict(dict)
    • rule_data[level, parent][child] = None
    • (parent, " ".join(children))
  • I sort the children for stability of the output, so that unit tests can work easily and so any downstream processing doesn't see spurious changes.

  • As @Prune noted, this doesn't seem like a natural representation for the data:

    • What are you ultimately trying to achieve?
    • The rule_data intermediate variable here may be more useful in further processing than the final form...
    • If any of the connections contain spaces, the output will be ambiguous.
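
Putting the three replacement lines from the first note together gives the order-preserving variant. This is only a sketch, assuming Python 3.7+ (where a plain dict keeps insertion order) and the sample connections from the question:

```python
from collections import defaultdict

connections = ['Module/Expr/ListComp/BinOp/Name/id/i/',
               'Module/Expr/ListComp/BinOp/Sub/',
               'Module/Expr/ListComp/BinOp/Num/0.5/',
               'Module/Expr/ListComp/comprehension/Name/id/i/',
               'Module/Expr/ListComp/comprehension/Name/id/inp/']

# Each inner dict acts as an ordered set: keys keep first-seen order
# and adding the same child twice has no effect.
rule_data = defaultdict(dict)
for connection in connections:
    parts = connection.rstrip("/").split("/")
    for level, (parent, child) in enumerate(zip(parts, parts[1:])):
        rule_data[level, parent][child] = None

# No sorting here: children appear in the order they were first seen.
rules = [
    (parent, " ".join(children))
    for (_level, parent), children in rule_data.items()
]
print(rules)
```

For this input the BinOp entry comes out as 'Name Sub Num' rather than the sorted 'Name Num Sub'. The two Name nodes sit at the same level, so they still share one (level, parent) key.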

connections = ['Module/Expr/ListComp/BinOp/Name/id/i/',
               'Module/Expr/ListComp/BinOp/Sub/',
               'Module/Expr/ListComp/BinOp/Num/0.5/',
               'Module/Expr/ListComp/comprehension/Name/id/i/',
               'Module/Expr/ListComp/comprehension/Name/id/inp/']

splitted_conns = [conn.strip('/').split('/') for conn in connections]
res = {}
for conn in splitted_conns:
    for root, child in zip(conn[:-1], conn[1:]):
        res[root] = res.get(root, set()) | {child}
print(res)

output:

{'Module': {'Expr'}, 'Expr': {'ListComp'}, 'ListComp': {'BinOp', 'comprehension'}, 'BinOp': {'Sub', 'Num', 'Name'}, 'Name': {'id'}, 'id': {'inp', 'i'}, 'Num': {'0.5'}, 'comprehension': {'Name'}}

Something like this? Your expected output contains the duplicate key "id". I suppose that's a mistake, isn't it?

As for the solution, I decided to use a set for each group of children, which avoids problems with nodes whose names contain a space. Sets also make it easy to avoid duplicates. If you want to convert each set into a space-separated string, you can try this:

{k: ' '.join(v) for k, v in res.items()}
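
If the duplicate keys in the expected output were intentional (that is, the Name under BinOp and the Name under comprehension are meant to stay separate), a dict keyed by bare node names cannot represent that. One possible sketch, not taken from the code above, keys each entry by the node's full path instead. The name `children_by_path` is hypothetical, and the inner dicts assume Python 3.7+ ordering:

```python
from collections import defaultdict


def children_by_path(connections):
    """Map each node's full path (a tuple of names) to a list of its unique children."""
    tree = defaultdict(dict)  # inner dict keys keep insertion order and drop duplicates
    for connection in connections:
        parts = connection.strip("/").split("/")
        for depth in range(1, len(parts)):
            parent_path = tuple(parts[:depth])  # every name from the root down to the parent
            tree[parent_path][parts[depth]] = None
    return {path: list(children) for path, children in tree.items()}


connections = ['Module/Expr/ListComp/BinOp/Name/id/i/',
               'Module/Expr/ListComp/BinOp/Sub/',
               'Module/Expr/ListComp/BinOp/Num/0.5/',
               'Module/Expr/ListComp/comprehension/Name/id/i/',
               'Module/Expr/ListComp/comprehension/Name/id/inp/']

tree = children_by_path(connections)
for path, children in tree.items():
    print("/".join(path), "->", children)
```

Here the id under BinOp maps to ['i'] while the id under comprehension maps to ['i', 'inp'], and keeping the children in a list avoids the ambiguity of joining names that contain spaces.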