简体   繁体   中英

How to access parent of nested json object

I have an arbitrary nested JSON object (parsed using json.load), consisting of dicts, lists, primitives etc. I traverse it depth first using recursion and tracking the path to the node in a linux fs format (for lists, I append /element to the path, so there can be multiple objects at a specific path), as following:

def traverse(node, path =''):
   if isinstance(node, dict):
      for key in node:
         traverse(node[key], path+'/'+key)
   elif isinstance(node, list):
      for elem in node:
         traverse(elem,path+'/element')

Each node may include a string which needs to be filled in using values of objects that can exist wherever in the tree, referenced in a relative path from the current node, eg: "{../../key} {./child1/child2/key}".
Problem : I can access the values of nodes that are children of the current node but I can't directly access the parent of the current node.
Solutions I thought: One solution I thought is to have a list of tuples (child,parent) and store the current node along with the child I'm about to recurse in, and then search that list in reverse when I need to go up. This is a bit dangerous because if the child is a primitive value then is will be equal to any other children of the same value and type and hence I may retrieve a wrong parent, but I think going through the list in reverse should take care of that right?
A different solution I thought is to have a dictionary with key being the child's path and value the parent node. I think this should work better because the only time paths clash is with list elements but they all have the same parent so it should be ok I think.

Are there any other suggestions? Or any comments on these two solutions? Thank you

The only way to get back to the parent of a dict (or any recursive structure that doesn't have back-pointers) is to remember it as you traverse.

But notice that you're already doing this: your path string is a path from the top to the current node.

Let's write a function that uses the path:

def follow(head, path):
    if not path:
        return head
    first, _, rest = path.partition('/')
    return follow(head[first], rest)

Of course it would probably be nicer to build up path as a tuple of keys instead of a string, so we don't have to split them off (and so we don't have to worry about escaping or quoting if any of the keys might contain / , etc.); you can always join them at the end.

And it might be even better to build up path as a tuple of nodes (or key-node pairs) instead of just the keys, so we can access the parent in constant time as just path[-1] instead of in logarithmic time as follow(head, path) . But it really depends on what you're actually trying to do; your real code presumably doesn't just traverse the tree, building up paths to each node, and then doing nothing with them.


However, a really nice way to solve this would be to turn the traversal inside-out: make traverse an iterator:

def traverse(node, path =''):
   if isinstance(node, dict):
      for key in node:
         yield from traverse(node[key], path+'/'+key)
   elif isinstance(node, list):
      for elem in node:
         yield from traverse(elem,path+'/element')
   yield (node, path)

Now we can just loop over traverse to do anything we want as a post-order depth-first traversal:

for node, path in traverse(root):
    # do something

And now you can change it easily to yield node, parent, path (whether parent is the parent node, or the parent key, or whatever you wanted).

In Python the vanilla objects (well at least those used for JSON blobs like list , dict , etc.) have no relations to collections and objects that contain them.

That makes sense since basically a list can contain the same object multiple times. Furthermore an object can be stored both in a dictionary and a set at the same time. Unless you perform some sort of garbage collection algorithm (which is performance-wise very inefficient, and will " scan " over all objects), there is thus no trivial way to reconstruct a list of objects that refer to a given object. Even if we would scan the objects, it is still far from trivial to decide what we will see as the " parent ", since there can be multiple parents.

In Python, a dictionary can even contain itself. Like:

# example of constructing a datastructure containing itself
some_dict = {}
some_dict['a'] = some_dict

Now we endlessly recurse on some_dict , for example some_dict['a']['a']['a'] is some_dict .

The point is that since your enumerate in a recursive way, you can maintain a stack that holds the ancestors. For example:

def traverse(node, path ='', stack=None):
    
    if isinstance(node, dict):
        for key in node:
            traverse(node[key], path+'/'+key, stack)
    elif isinstance(node, list):
        for elem in node:
            traverse(elem,path+'/element', stack)
    

So each node pushes itself on the stack before we inspect the node, and at the end, it pops itself from the stack. We pass the stack recursively, so each recursive call can inspect the stack (not only the parent, but the entire trail up to the root).

If you want to search for a node by attribute value, and know what the node's parent nodes were, you can use the following:

def find_node_callback(node, stack, callback, path=''):
    stack.append(node)
    if callback(node,path,stack):
        return True

    if isinstance(node, dict):
        for key in node:
            if find_node_callback(node[key],stack,callback,path+'/'+key):
                return True
    elif isinstance(node, list):
        for elem in node:
            if find_node_callback(elem,stack, callback,path+'/element'):
                return True

    stack.pop()
    return False

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM