
Print unique JSON keys in dot notation using Python

I am trying to write a script that will print the unique keys of a JSON file in dot notation so as to quickly profile the structure.

For example, let's say I have 'myfile.json' with the following content:

{
    "a": "one",
    "b": "two",
    "c": {
        "d": "four",
        "e": "five",
        "f": [
            {
                "x": "six",
                "y": "seven"
            },
            {
                "x": "eight",
                "y": "nine"
            }
        ]
    }
}

Running the following will produce a unique set of keys, but it is missing the lineage.

import json
json_data = open("myfile.json")
jdata = json.load(json_data)

def get_keys(dl, keys_list):
    if isinstance(dl, dict):
        keys_list += dl.keys()
        map(lambda x: get_keys(x, keys_list), dl.values())
    elif isinstance(dl, list):
        map(lambda x: get_keys(x, keys_list), dl)

keys = []
get_keys(jdata, keys)

all_keys = list(set(keys))

print '\n'.join([str(x) for x in sorted(all_keys)])

The resulting output doesn't indicate that 'x' and 'y' are nested within the 'f' array, or that 'd', 'e' and 'f' sit under 'c'.

a
b
c
d
e
f
x
y

I can't figure out how to walk the nested structure so that each key is prefixed with its parent keys.

The ideal output would be:

a
b
c.d
c.e
c.f.x
c.f.y

I'd recommend using a recursive generator function, using the yield statement rather than building a list internally. In Python 2.6+, the following works:

import json
json_data = json.load(open("myfile.json"))

def walk_keys(obj, path=""):
    if isinstance(obj, dict):
        for k, v in obj.iteritems():
            for r in walk_keys(v, path + "." + k if path else k):
                yield r
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            s = "[" + str(i) + "]"
            for r in walk_keys(v, path + s if path else s):
                yield r
    else:
        yield path


for s in sorted(walk_keys(json_data)):
    print s

In Python 3.3+, you can use yield from as syntactic sugar for recursive generation, as follows:

import json
json_data = json.load(open("myfile.json"))

def walk_keys(obj, path=""):
    if isinstance(obj, dict):
        for k, v in obj.items():
            yield from walk_keys(v, path + "." + k if path else k)
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            s = "[" + str(i) + "]"
            yield from walk_keys(v, path + s if path else s)
    else:
        yield path


for s in sorted(walk_keys(json_data)):
    print(s)
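
Note that for the sample file both versions above emit index-qualified paths such as c.f[0].x rather than c.f.x. If the index-free form is wanted, one option is to strip the indices afterwards and de-duplicate. The following is only a sketch, assuming Python 3.3+; SAMPLE and strip_indices are hypothetical names introduced here for illustration, and the regex would also alter a key that literally contains text like "[0]".

```python
import json
import re

# Inline copy of the sample document so the sketch runs without myfile.json.
SAMPLE = """
{"a": "one", "b": "two",
 "c": {"d": "four", "e": "five",
       "f": [{"x": "six", "y": "seven"}, {"x": "eight", "y": "nine"}]}}
"""

def walk_keys(obj, path=""):
    # Same generator as above: yields one path per leaf value.
    if isinstance(obj, dict):
        for k, v in obj.items():
            yield from walk_keys(v, path + "." + k if path else k)
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            s = "[" + str(i) + "]"
            yield from walk_keys(v, path + s if path else s)
    else:
        yield path

def strip_indices(path):
    # Hypothetical helper: drop every "[n]" segment from a path.
    # lstrip(".") covers the case where the top-level value is a list.
    return re.sub(r"\[\d+\]", "", path).lstrip(".")

indexed = sorted(walk_keys(json.loads(SAMPLE)))
# A set comprehension collapses the paths that differ only by index.
unique_keys = sorted({strip_indices(p) for p in indexed})
print("\n".join(unique_keys))
```

Keeping the indices in walk_keys and removing them in a separate step means the same generator can serve both purposes: full paths when the position matters, and a de-duplicated profile when only the shape does.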

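Drawing on MTADD's guidance, I put together the following. It leaves list indices out of the path, so repeated array elements collapse into a single entry once the results are passed through set():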

import json

json_file_path = "myfile.json"
json_data = json.load(open(json_file_path))

def walk_keys(obj, path = ""):
    if isinstance(obj, dict):
        for k, v in obj.iteritems():
            for r in walk_keys(v, path + "." + k if path else k):
                yield r
    elif isinstance(obj, list):
        for v in obj:
            # List elements reuse the parent path, so no index is added
            for r in walk_keys(v, path):
                yield r
    else:
        yield path

all_keys = list(set(walk_keys(json_data)))

print '\n'.join([str(x) for x in sorted(all_keys)])

The results match the expected output:

a
b
c.d
c.e
c.f.x
c.f.y
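
For readers on Python 3, the script above needs iteritems() and the print statement replaced. A rough port might look like the sketch below, assuming Python 3.3+; unique_key_paths is a hypothetical wrapper name, and the inline JSON string is a shortened stand-in for myfile.json so the example runs on its own.

```python
import json

def walk_keys(obj, path=""):
    """Yield the dot-notation path of every leaf value, ignoring list indices."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            yield from walk_keys(v, path + "." + k if path else k)
    elif isinstance(obj, list):
        for v in obj:
            # List elements inherit the parent's path unchanged.
            yield from walk_keys(v, path)
    else:
        yield path

def unique_key_paths(obj):
    # Hypothetical wrapper: the set removes duplicates that arise when
    # several list elements share the same keys.
    return sorted(set(walk_keys(obj)))

# Shortened inline sample standing in for myfile.json.
data = json.loads(
    '{"a": "one", "c": {"d": "four", "f": [{"x": "six"}, {"x": "eight"}]}}'
)
paths = unique_key_paths(data)
print("\n".join(paths))
```

To read from a real file instead, loading it inside a `with open("myfile.json") as fh:` block and calling `json.load(fh)` also makes sure the file handle is closed. One limitation carried over from the original: empty dicts and empty lists yield nothing, so a key whose value is {} or [] does not appear in the output.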
