How to Process Strings of Nodes to get their Connections?

Question

I'm doing some work on node trees and I'm stuck with this issue. This list contains all the information of a tree:

connections = ['Module/Expr/ListComp/BinOp/Name/id/i/',
           'Module/Expr/ListComp/BinOp/Sub/',
           'Module/Expr/ListComp/BinOp/Num/0.5/',
           'Module/Expr/ListComp/comprehension/Name/id/i/',
           'Module/Expr/ListComp/comprehension/Name/id/inp/']

I need to convert this into:

{'Module':'Expr', 'Expr':'ListComp', 'ListComp':'BinOp comprehension', 
'BinOp':'Name Sub Num', 'Name':'id', 'id':'i', 'Num':'0.5', 
'comprehension':'Name', 'Name':'id', 'id':'i inp'}

The goal is to parse the connections into a dictionary of the form {'parent': 'child(ren)'}. To do this, I have already tried the following:

rules = {}
connections_list = [[word for word in path.split("/") if word] for path in connections]

for path in connections_list:
    for i, word in enumerate(path):
        same_level = [y[i+1] for y in connections_list if len(connections_list) > i+1]
        if same_level:
            unique_on_level = list(set(same_level))
            rules.update({word:" ".join(unique_on_level)})
        else:
            pass
    break
print(rules)

With an output:

{'Module': 'Expr',
 'Expr': 'ListComp',
 'ListComp': 'BinOp comprehension',
 'BinOp': 'Num Sub Name'}

I can't figure out a way of doing this. The problem seems to occur around the last nodes, but I don't know how to solve it. Any idea how to fix this?

Answer 1

First build a mapping from each parent to its child nodes, and then remove the duplicates.

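rules = {}
for connection in connections:
    # Drop the trailing slash so split() yields no empty last element.
    parts = connection.rstrip("/").split("/")
    # Pair each node with the node that follows it on the path.
    for parent, child in zip(parts, parts[1:]):
        rules.setdefault(parent, []).append(child)

# dict.fromkeys() removes duplicates while keeping first-seen order.
rules = {k: " ".join(dict.fromkeys(v)) for k, v in rules.items()}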
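
To check the approach above against the question's data, here is a minimal self-contained sketch. The function name build_rules is my own choice for illustration, not part of the answer; the logic is the same pairwise walk followed by order-preserving de-duplication.

```python
def build_rules(connections):
    """Map each node name to a space-separated string of its distinct children."""
    children = {}
    for connection in connections:
        # Drop the trailing slash, then split the path into node names.
        parts = connection.rstrip("/").split("/")
        # Each adjacent pair on the path is a (parent, child) edge.
        for parent, child in zip(parts, parts[1:]):
            children.setdefault(parent, []).append(child)
    # dict.fromkeys() de-duplicates while keeping first-seen order.
    return {
        parent: " ".join(dict.fromkeys(kids))
        for parent, kids in children.items()
    }


connections = ['Module/Expr/ListComp/BinOp/Name/id/i/',
               'Module/Expr/ListComp/BinOp/Sub/',
               'Module/Expr/ListComp/BinOp/Num/0.5/',
               'Module/Expr/ListComp/comprehension/Name/id/i/',
               'Module/Expr/ListComp/comprehension/Name/id/inp/']

rules = build_rules(connections)
print(rules)
```

Because the key is only the node name, the two Name nodes (one under BinOp, one under comprehension) end up in a single entry, and id collects both i and inp.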

Answer 2

Based on @wim's answer and the comments, I think this should work:

from collections import defaultdict

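rule_data = defaultdict(set)
for connection in connections:
    parts = connection.rstrip("/").split("/")
    # Key on (depth, name) so equally named nodes at different depths stay separate.
    for level, (parent, child) in enumerate(zip(parts, parts[1:])):
        rule_data[level, parent].add(child)

# A list of pairs rather than a dict, because parent names may repeat.
rules = [
    (parent, " ".join(sorted(children)))
    for (_level, parent), children in rule_data.items()
]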

Notes:

- Using set discards the order of the children; if it's important, we can instead use a dict (or, for compatibility with older versions of Python, OrderedDict):
  - rule_data = defaultdict(dict)
  - rule_data[level, parent][child] = None
  - (parent, " ".join(children))
- I sort the children for stability of the output, so that unit tests can work easily and so any downstream processing doesn't see spurious changes.
- As @Prune noted, this doesn't seem like a natural representation for the data:
  - What are you ultimately trying to achieve?
  - The rule_data intermediate variable here may be more useful in further processing than the final form...
  - If any of the connections contain spaces, the output will be ambiguous.
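
The order-preserving variant described in the first note can be sketched as follows. This is only an illustration: the function name build_ordered_rules is hypothetical, and it assumes Python 3.7 or later, where a plain dict keeps insertion order.

```python
from collections import defaultdict


def build_ordered_rules(connections):
    """Return (parent, children) pairs, children in first-seen order."""
    # The inner dict is used as an ordered set: keys are children, values unused.
    rule_data = defaultdict(dict)
    for connection in connections:
        parts = connection.rstrip("/").split("/")
        for level, (parent, child) in enumerate(zip(parts, parts[1:])):
            rule_data[level, parent][child] = None
    # No sorted() here: joining the dict keys keeps the order of appearance.
    return [
        (parent, " ".join(children))
        for (_level, parent), children in rule_data.items()
    ]


connections = ['Module/Expr/ListComp/BinOp/Name/id/i/',
               'Module/Expr/ListComp/BinOp/Sub/',
               'Module/Expr/ListComp/BinOp/Num/0.5/',
               'Module/Expr/ListComp/comprehension/Name/id/i/',
               'Module/Expr/ListComp/comprehension/Name/id/inp/']

ordered_rules = build_ordered_rules(connections)
for parent, children in ordered_rules:
    print(parent, "->", children)
```

With the question's data, BinOp maps to "Name Sub Num" (order of appearance) rather than the sorted "Name Num Sub". Note that both Name nodes sit at the same depth here, so the (level, parent) key still merges them into one entry.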

Answer 3

connections = ['Module/Expr/ListComp/BinOp/Name/id/i/',
               'Module/Expr/ListComp/BinOp/Sub/',
               'Module/Expr/ListComp/BinOp/Num/0.5/',
               'Module/Expr/ListComp/comprehension/Name/id/i/',
               'Module/Expr/ListComp/comprehension/Name/id/inp/']

splitted_conns = [conn.strip('/').split('/') for conn in connections]
res = {}
for conn in splitted_conns:
    for root, child in zip(conn[:-1], conn[1:]):
        res[root] = res.get(root, set()) | {child}
print(res)

output:

{'Module': {'Expr'}, 'Expr': {'ListComp'}, 'ListComp': {'BinOp', 'comprehension'}, 'BinOp': {'Sub', 'Num', 'Name'}, 'Name': {'id'}, 'id': {'inp', 'i'}, 'Num': {'0.5'}, 'comprehension': {'Name'}}

Something like this? Your expected output contains the node "id" twice. I suppose that's a mistake, isn't it?

About the solution: I decided to use a set for each group of children, so we avoid problems with nodes whose names contain a space. Sets also make it easy to avoid duplicates. If you want to convert each set into a space-separated string, you can try this:

{k: ' '.join(v) for k, v in res.items()}
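
One caveat with joining a set directly: the iteration order of a set of strings is not guaranteed and can differ between runs, so the joined string may change too. A minimal sketch, assuming a reproducible result is wanted, is to sort each set before joining (the variable name rules is my own):

```python
connections = ['Module/Expr/ListComp/BinOp/Name/id/i/',
               'Module/Expr/ListComp/BinOp/Sub/',
               'Module/Expr/ListComp/BinOp/Num/0.5/',
               'Module/Expr/ListComp/comprehension/Name/id/i/',
               'Module/Expr/ListComp/comprehension/Name/id/inp/']

res = {}
for conn in connections:
    nodes = conn.strip('/').split('/')
    # Collect the children of every node in a set to drop duplicates.
    for root, child in zip(nodes, nodes[1:]):
        res.setdefault(root, set()).add(child)

# sorted() fixes the order of the children before they are joined.
rules = {k: ' '.join(sorted(v)) for k, v in res.items()}
print(rules)
```

For the sample data, BinOp then always maps to 'Name Num Sub' and id to 'i inp'.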

3 answers

solution1
2 2021-02-09 22:21:01

solution2
1 2021-02-09 22:43:42

solution3
1 2021-02-09 23:56:55
