Iterate over a nested dictionary with a one-to-many relationship

I have a nested dictionary whose values are lists, in the format below. It is large enough that a recursive approach fails.

aDict = {"R": [
            {"A": [
                {"B": [
                    "C", "D"
                ]}
            ]},
            {"E": [
                {"F": [
                    {"G": ["H"]}, "I"
                ]}
            ]}
    ]}

I need to iterate through the dictionary to add and update values; however, I am having trouble iterating through the tree and end up in an infinite loop. Other than collections, I cannot use packages outside the standard library. :(

My current code assumes that the parent argument is already in the nested dictionary but the child argument is not.

def build_tree(aDict, parent, child, default=None):
    """"""
    stack = [iter(aDict.items())]
    while stack:
        for k, v in stack[-1]:  # loop through keys and values
            if isinstance(v, dict):
                stack.append(iter(v.items()))  # if v is type dict, append it to stack 
                break
            elif isinstance(v, list):
                for elem in v:  # if v is list, loop through elements of list
                    if isinstance(v, dict):
                        stack.append(iter(v.items()))
                    elif parent == elem:
                        a_dict = {parent: [child]}  # replace elem with a_dict
                        aDict[k].remove(parent)
                        aDict[k].append(a_dict)
                        return default
                    else:
                        pass
                break
            elif parent in k:
                v.append(child)  # add child to values list for parent
                return default
            elif parent in v:  # assumes v is list type
                a_dict = {parent: [child]}  # replace v with a_dict
                aDict[k].remove(parent)
                aDict[k].append(a_dict)
                return default
    else:
        stack.pop()
    return default

The function does not enter an infinite loop if the below code is commented out, but fails due to the presence of lists in the nested dictionary.

elif isinstance(v, list):
    for elem in v:  # if v is list, loop through elements of list
        if isinstance(v, dict):
            stack.append(iter(v.items()))
        elif parent == elem:
            a_dict = {parent: [child]}  # replace elem with a_dict
            aDict[k].remove(parent)
            aDict[k].append(a_dict)
            return default
        else:
            pass
    break

Thanks in advance!

You can write a simple recursive traversing function:

import sys

# In Python 3, str defines __iter__ too, so it has to be excluded explicitly; the flag keeps this cross-version
isPY3 = sys.version_info.major > 2

def traverse(data, level=0):
    if hasattr(data, "__iter__") and not (isPY3 and isinstance(data, str)):
        if isinstance(data, dict):  # maybe check for MutableMapping, too?
            for k in data:
                print("L{}: {}".format(level, k))  # dictionary key
                traverse(data[k], level + 1)
        else:
            for element in data:
                traverse(element, level + 1)
    elif data:
        print("L{}: {}".format(level, data))  # any other value

This recursively walks every nested iterable and keeps track of the level it is currently at (you can pass other things along as well, such as the parent iterable). With your changed data, it prints:

L0: R
L2: A
L4: B
L6: C
L6: D
L2: E
L4: F
L6: G
L8: H
L6: I

You can do whatever you want within the function, and you can simplify it further by removing the Python 3 check. For very deep trees, however, you will hit Python's recursion limit. If your trees are that deep, you should probably rethink your strategy or data structure, since there is almost certainly a better way to represent the same data (unless you're trying to map fractals) than infinitely deep trees.
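If the structure has to stay as it is, the recursion limit can be sidestepped by keeping an explicit stack. The following is only a sketch under some assumptions: `iter_nodes` and `add_child` are hypothetical names, containers are always plain `dict` or `list` objects, every dictionary value is a list, and labels are unique in the tree. Unlike the recursive version, it does not skip falsy leaves.

```python
def iter_nodes(data):
    """Yield (level, value) for each dict key and each leaf, depth-first."""
    stack = [(data, 0)]  # explicit stack instead of the interpreter's call stack
    while stack:
        node, level = stack.pop()
        if isinstance(node, dict):
            # Push in reverse so that keys are visited in their original order.
            for key in reversed(list(node)):
                stack.append((node[key], level + 1))  # handled after the key
                stack.append((key, level))            # popped first, yielded as a leaf
        elif isinstance(node, list):
            for element in reversed(node):
                stack.append((element, level + 1))
        else:
            yield level, node


def add_child(tree, parent, child):
    """Attach child under parent; return True if parent was found."""
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            if parent in node:
                node[parent].append(child)  # assumes the value is a list
                return True
            stack.extend(node.values())
        elif isinstance(node, list):
            for i, elem in enumerate(node):
                if isinstance(elem, (dict, list)):
                    stack.append(elem)
                elif elem == parent:
                    node[i] = {parent: [child]}  # promote the leaf in place
                    return True
    return False


tree = {"R": [{"A": [{"B": ["C", "D"]}]}, {"E": [{"F": [{"G": ["H"]}, "I"]}]}]}
add_child(tree, "B", "X")  # "B" is a key, so "X" is appended to its list
add_child(tree, "I", "J")  # "I" is a leaf, so it becomes {"I": ["J"]}
for level, value in iter_nodes(tree):
    print("L{}: {}".format(level, value))
```

Every container is pushed onto the stack once and popped once, so the loop always terminates, and the nesting depth is limited by available memory rather than by the recursion limit. Note that `add_child` does not search in document order, which is why the sketch assumes unique labels.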

This function non-recursively follows a path in the dict/list structure:

def by_path(data, path):
    """
    data is a dict-of-lists structure: {key: [value, ...]}, where each value
    is either another such dict or a scalar.
    path is a sequence of dictionary keys to follow, outermost first.
    """
    level = [data]  # We always pretend to work with a list of dicts.
    traversed = []  # Purely for error reporting.
    for key in path:
        traversed.append(key)
        next_dicts = [d for d in level if isinstance(d, dict) and key in d]
        if not next_dicts:
            raise ValueError('Failed to make next step; traversed so far %r' % traversed)
        if len(next_dicts) > 1:
            raise ValueError('Duplicate keys at %r' % traversed)
        target_dict = next_dicts[0]
        level = target_dict[key]  # Guaranteed to work.
    # path exhausted.
    return level  # A list / scalar at the end of the path

It works like so:

>>> by_path(aDict, ['R', 'A', 'B'])
['C', 'D']
>>> by_path(aDict, ['R', 'A', 'wrong', 'path'])
(traceback elided)
ValueError: Failed to make next step; traversed so far ['R', 'A', 'wrong']
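Because `by_path` returns the list object stored in the structure rather than a copy, the result can be modified in place to add or update values. The sketch below repeats a condensed `by_path` only so that it runs on its own; `add_at_path` and `promote_leaf` are hypothetical helper names, and the sketch assumes the caller already knows the path to the parent.

```python
def by_path(data, path):
    """Condensed copy of the function above, repeated to keep this runnable."""
    level = [data]
    traversed = []
    for key in path:
        traversed.append(key)
        next_dicts = [d for d in level if isinstance(d, dict) and key in d]
        if not next_dicts:
            raise ValueError('Failed to make next step; traversed so far %r' % traversed)
        if len(next_dicts) > 1:
            raise ValueError('Duplicate keys at %r' % traversed)
        level = next_dicts[0][key]
    return level


def add_at_path(data, path, child):
    """Append child to the list found at the end of path."""
    target = by_path(data, path)
    if not isinstance(target, list):
        raise ValueError('Path %r does not end at a list' % (list(path),))
    target.append(child)  # mutates the list inside data


def promote_leaf(data, path, leaf, child):
    """Replace the scalar leaf in the list at path with {leaf: [child]}."""
    siblings = by_path(data, path)
    i = siblings.index(leaf)  # raises ValueError if leaf is not in the list
    siblings[i] = {leaf: [child]}


aDict = {"R": [{"A": [{"B": ["C", "D"]}]}, {"E": [{"F": [{"G": ["H"]}, "I"]}]}]}
add_at_path(aDict, ['R', 'A', 'B'], 'X')
promote_leaf(aDict, ['R', 'E', 'F'], 'I', 'J')
print(by_path(aDict, ['R', 'A', 'B']))
print(by_path(aDict, ['R', 'E', 'F', 'I']))
```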

I hope this helps.

Of course, if you often traverse the same long subpath, it might be worth caching the result. You would have to invalidate the cache whenever you update the structure, which is tricky; don't do it unless you actually see high CPU load and the profiler confirms that the traversal is the cause.
