How to cut a very “deep” json or dictionary in Python?

I have a very deep JSON object. In other words, I have a dictionary containing dictionaries containing dictionaries, and so on, many levels down. One can picture it as a huge tree in which some nodes are very far from the root node.

Now I would like to cut this tree so that it contains only the nodes that are at most N steps from the root. Is there a simple way to do it?

For example, if I have:

{'a':{'d':{'e':'f', 'l':'m'}}, 'b':'c', 'w':{'x':{'z':'y'}}}

and I want to keep only the nodes that are at most 2 steps from the root, I should get:

{'a':{'d':'o1'}, 'b':'c', 'w':{'x':'o2'}}

So I just replace the dictionaries that lie too deep with single values.

Given that your data is very deep, you may very well run into stack limits with recursion. Here's an iterative approach that you might be able to clean up and polish a bit:

import collections

def cut(dict_, maxdepth, replaced_with=None):
    """Cuts the dictionary at the specified depth.

    If maxdepth is n, then only n levels of keys are kept.
    """
    queue = collections.deque([(dict_, 0)])

    # invariant: every entry in the queue is a dictionary
    while queue:
        parent, depth = queue.popleft()
        for key, child in parent.items():
            if isinstance(child, dict):
                if depth == maxdepth - 1:
                    parent[key] = replaced_with
                else:
                    queue.append((child, depth+1))
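
Note that cut() modifies its argument in place and returns None. If the original data has to stay intact, a rough sketch of a copying variant could look like the following. The name cut_copy is made up for illustration, and the sketch assumes maxdepth is at least 1 and that non-dict values can be shared between the original and the copy rather than duplicated:

```python
import collections


def cut_copy(dict_, maxdepth, replaced_with=None):
    """Return a pruned copy of dict_ with at most maxdepth levels of keys.

    Hypothetical helper, not part of the answer above. Assumes maxdepth >= 1.
    """
    result = {}
    # Each queue entry pairs a source dict with the copy being filled in,
    # plus the level of keys that source dict holds (the root is level 1).
    queue = collections.deque([(dict_, result, 1)])
    while queue:
        source, target, depth = queue.popleft()
        for key, child in source.items():
            if not isinstance(child, dict):
                target[key] = child          # leaf values are shared, not copied
            elif depth >= maxdepth:
                target[key] = replaced_with  # too deep: cut the subtree here
            else:
                target[key] = {}
                queue.append((child, target[key], depth + 1))
    return result


original = {'a': {'d': {'e': 'f', 'l': 'm'}}, 'b': 'c', 'w': {'x': {'z': 'y'}}}
pruned = cut_copy(original, 2)
print(pruned)  # {'a': {'d': None}, 'b': 'c', 'w': {'x': None}}
```

Because it never writes to the source dictionaries, the original tree is left untouched, at the cost of allocating a new dict for every level that is kept.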
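
def prune(tree, max_depth, current=0):
    # max_depth counts from 0: with max_depth=0 only the top-level keys survive.
    for key, value in tree.items():
        if isinstance(value, dict):
            if current == max_depth:
                # Too deep: replace the whole subtree with a placeholder.
                tree[key] = None
            else:
                prune(value, max_depth, current + 1)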

This is mostly an example to get you started. It prunes the dictionary in place. For example:

>>> dic = {'a':{'d':{'e':'f', 'l':'m'}}, 'b':'c', 'w':{'x':{'z':'y'}}}
>>> prune(dic, 1)
>>> dic
{'b': 'c', 'w': {'x': None}, 'a': {'d': None}}
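
The question asked for distinct placeholders ('o1', 'o2', ...) rather than None. One possible way to get that is to thread a counter through the recursion. This is only a sketch: prune_labelled and its prefix parameter are invented names, and the depth convention is assumed to match prune() above (0 keeps only the top-level keys):

```python
import itertools


def prune_labelled(tree, max_depth, prefix='o'):
    """Prune tree in place, replacing each cut subtree with a numbered label.

    Hypothetical variant of prune(); not part of the original answer.
    """
    counter = itertools.count(1)  # yields 1, 2, 3, ... across the whole tree

    def _walk(node, current):
        for key, value in node.items():
            if isinstance(value, dict):
                if current == max_depth:
                    # Reassigning an existing key is safe while iterating.
                    node[key] = prefix + str(next(counter))
                else:
                    _walk(value, current + 1)

    _walk(tree, 0)


dic = {'a': {'d': {'e': 'f', 'l': 'm'}}, 'b': 'c', 'w': {'x': {'z': 'y'}}}
prune_labelled(dic, 1)
print(dic)  # {'a': {'d': 'o1'}, 'b': 'c', 'w': {'x': 'o2'}}
```

The labels are handed out in the order the keys are visited, so they depend on the insertion order of the dictionaries.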

You could do something like:

initial_dict = {'a':{'d':{'e':'f', 'l':'m'}}, 'b':'c', 'w':{'x':{'z':'y'}}}
current_index = 0
for item in initial_dict.items():
    if isinstance(item[1], dict):
        current_index += 1
        initial_dict[item[0]] = {key:'o'+str(current_index) for key in item[1].keys()}

I believe one problem with this code is that when a second-level dict has several keys (example follows), every one of those keys gets the same value, but you can adapt the code to work around that.

E.g.:

# suppose you have this dict initially
initial_dict = {'a':{'d':{'e':'f', 'l':'m'}}, 'b':'c', 'w':{'x':{'z':'y'}, 'b':{'p':'r'}}}
# you would get
initial_dict = {'a':{'d':'o1'}, 'b':'c', 'w':{'x':'o2', 'b':'o2'}}
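
One way to adapt it, as a rough sketch, is to advance the counter once per second-level key instead of once per top-level key. Like the loop above, this still only handles a cut at depth 2, and it labels every second-level key whether or not its value is a dict:

```python
initial_dict = {'a': {'d': {'e': 'f', 'l': 'm'}}, 'b': 'c',
                'w': {'x': {'z': 'y'}, 'b': {'p': 'r'}}}
current_index = 0
for key, value in initial_dict.items():
    if isinstance(value, dict):
        replaced = {}
        for inner_key in value:
            current_index += 1  # advance once per second-level key
            replaced[inner_key] = 'o' + str(current_index)
        # Reassigning an existing key does not resize the dict,
        # so this is safe while iterating over it.
        initial_dict[key] = replaced

print(initial_dict)
# {'a': {'d': 'o1'}, 'b': 'c', 'w': {'x': 'o2', 'b': 'o3'}}
```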


 