
Get unique keys and their unique values in a list of nested dictionaries

I am trying to get all the unique keys in a JSON document (at any nesting level), each with a list of its unique values.

An example input looks like this:

[{
    "key1": {"subkey1": "subvalue1", "subkey2": "subvalue2"},
    "key2": "value2",
    "key3": {"subkey3": "subvalue2"}
}, {
    "key4": "value4",
    "subkey1": "other_value",
    "key2": "value2"
}]

The expected output in this case would be:

{
    "key1": [{"subkey1": "subvalue1", "subkey2": "subvalue2"}],
    "subkey1": ["subvalue1", "other_value"],
    "subkey2": ["subvalue2"],
    "key2": ["value2"],
    "key3": [{"subkey3": "subvalue2"}],
    "subkey3": ["subvalue2"],
    "key4": ["value4"]
}

I have tried setting up a recursive method to do this, but I am not sure how to approach parsing the inner JSON while also keeping it as a distinct value for that key. I also had trouble getting the unique inner keys and then getting the values for those inner keys.

Here is how I started, but I do not know exactly what to do from here:

class DataParser:
    @staticmethod
    def get_distinct_keys_and_distinct_values_from_dict(list_of_dicts: list[dict]) -> dict:
        keys = DataParser.get_all_unique_keys(list_of_dicts)
        print(keys)
        # Not sure how to get to the inner JSON and use the correct key path in the dictionaries
        ...

    @staticmethod
    def get_all_unique_keys(list_of_dicts: list[dict]) -> set:
        keys = set()
        for dictionary in list_of_dicts:
            for key, value in dictionary.items():
                keys.add(key)
                # Recurse into nested dictionaries to collect their keys as well.
                if isinstance(value, dict):
                    keys |= DataParser.get_all_unique_keys([value])
        return keys
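For the missing piece described above (reaching the inner JSON), one option is to separate the traversal from the bookkeeping. A generator can yield every (key, value) pair at any depth, including inside lists, so no key path needs to be tracked at all. This is a minimal sketch. The names iter_items and distinct_values are hypothetical and are not part of the DataParser class.

```python
from collections.abc import Iterator
from typing import Any


def iter_items(obj: Any) -> Iterator[tuple[str, Any]]:
    """Yield every (key, value) pair in obj, descending into nested dicts and lists."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield key, value
            yield from iter_items(value)  # then walk inside the value itself
    elif isinstance(obj, list):
        for element in obj:
            yield from iter_items(element)


def distinct_values(data: Any) -> dict[str, list[Any]]:
    """Group the pairs from iter_items by key, keeping each value once (first seen)."""
    result: dict[str, list[Any]] = {}
    for key, value in iter_items(data):
        values = result.setdefault(key, [])
        if value not in values:  # == comparison, so nested dicts are deduplicated too
            values.append(value)
    return result
```

A pair is yielded before its nested contents. For the example input, the keys therefore come out in the order key1, subkey1, subkey2, key2, key3, subkey3, key4.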

You could use:

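def get_unique_keys(d, d_out=None):
    """Map every key in d (at any depth) to the list of its distinct values."""
    if d_out is None:
        d_out = {}  # avoid a shared mutable default argument
    if isinstance(d, list):
        for i in d:
            get_unique_keys(i, d_out)
    elif isinstance(d, dict):
        for k, v in d.items():
            if isinstance(v, dict):
                get_unique_keys(v, d_out)  # collect the nested keys first
            values = d_out.setdefault(k, [])
            if v not in values:  # keep each value only once per key
                values.append(v)
    return d_out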

Usage, where lst is the example input list from the question:

out = {}
get_unique_keys(lst, out)

>>> print(out)
{'subkey1': ['subvalue1', 'other_value'],
 'subkey2': ['subvalue2'],
 'key1': [{'subkey1': 'subvalue1', 'subkey2': 'subvalue2'}],
 'key2': ['value2'],
 'subkey3': ['subvalue2'],
 'key3': [{'subkey3': 'subvalue2'}],
 'key4': ['value4']}
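Consider what happens when the signature is written as get_unique_keys(d, d_out={}). The {} is evaluated only once, when the def statement runs, so every call that omits d_out shares that one dictionary. The usage above always passes out explicitly and is unaffected. Calling the function twice without it, however, would pile both results into the same hidden dict. Below is a small demonstration with a hypothetical tally function, followed by the usual None-default fix.

```python
def tally(key, out={}):  # hypothetical helper: `{}` is created once, at definition time
    out[key] = out.get(key, 0) + 1
    return out


first = tally("a")
second = tally("b")  # no `out` passed, so the same default dict is reused
print(first is second, first)  # True {'a': 1, 'b': 1}


def tally_fixed(key, out=None):
    if out is None:
        out = {}  # a fresh dict for every call that omits `out`
    out[key] = out.get(key, 0) + 1
    return out


print(tally_fixed("a"), tally_fixed("b"))  # {'a': 1} {'b': 1}
```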

Another solution:

def solution(data: list[dict]) -> dict:
    """Map every key, at any depth, to the list of its distinct values."""
    result = {}
    for d in data:
        collect_keys_and_values(d, result)
    return result


def collect_keys_and_values(data: dict, result: dict) -> None:
    """Record each key/value pair of data in result, recursing into nested dicts."""
    for key, value in data.items():
        coll = result.setdefault(key, [])
        if value not in coll:  # list membership uses ==, so dict values work too
            coll.append(value)
        if isinstance(value, dict):
            collect_keys_and_values(value, result)


def main():
    print(solution([{
        "key1": {"subkey1": "subvalue1", "subkey2": "subvalue2"},
        "key2": "value2",
        "key3": {"subkey3": "subvalue2"}
    }, {
        "key4": "value4",
        "subkey1": "other_value",
        "key2": "value2"
    }]))


if __name__ == '__main__':
    main()
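The recursion in collect_keys_and_values can also be replaced by an explicit stack. This avoids Python's recursion limit (1000 frames by default) for very deeply nested input. The code below is a sketch under that assumption, and collect_keys_and_values_iteratively is a hypothetical name.

```python
def collect_keys_and_values_iteratively(data: list[dict]) -> dict:
    """Same result as solution(), using an explicit stack instead of recursion.

    The stack holds iterators over the dictionaries still being walked, so deep
    nesting grows a Python list rather than the interpreter's call stack.
    """
    result: dict = {}
    # Reverse so the first dictionary in `data` ends up on top of the stack.
    stack = [iter(d.items()) for d in reversed(data)]
    while stack:
        pair = next(stack[-1], None)
        if pair is None:  # current dictionary exhausted; resume its parent
            stack.pop()
            continue
        key, value = pair
        values = result.setdefault(key, [])
        if value not in values:
            values.append(value)
        if isinstance(value, dict):
            stack.append(iter(value.items()))  # descend before the parent's next key
    return result
```

For the example input, this returns the same dictionary, in the same key order, as solution().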
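
Another approach, letting the exception raised by calling .keys() on a non-dict value end the recursion:

def foo(d_el, ans):
    try:
        for k in d_el.keys():
            # Skip values already recorded for this key.
            if ans.get(k) and d_el[k] in ans[k]:
                continue
            ans.setdefault(k, []).append(d_el[k])
            foo(d_el[k], ans)  # recurse; non-dict values stop at the except below
    except AttributeError:
        # d_el is not a dict (e.g. a string), so it has no .keys(): nothing to walk.
        pass

y = {}
for d_el in d:  # d is the example input list from the question
    foo(d_el, y)
print(y)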
# {'key1': [{'subkey1': 'subvalue1', 'subkey2': 'subvalue2'}],
#  'subkey1': ['subvalue1', 'other_value'],
#  'subkey2': ['subvalue2'],
#  'key2': ['value2'],
#  'key3': [{'subkey3': 'subvalue2'}],
#  'subkey3': ['subvalue2'],
#  'key4': ['value4']}

You can do what you want by using the json module together with the collections.defaultdict class, both of which are in the standard library (assuming your data is JSON serializable). This works because the JSON decoder accepts an object_hook argument. That argument is a function the decoder calls every time it decodes a JSON object into a dictionary, innermost objects first. defaultdict is used to create a dictionary of sets, which makes tracking unique values easy.
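The small sketch below shows what object_hook actually receives and in what order. The record function and the sample document are made up for illustration. The hook gets each decoded JSON object as a plain dict, including objects nested inside arrays, and whatever it returns takes that object's place in the result.

```python
import json

calls = []  # sorted keys of each object, in the order the hook receives them


def record(obj):
    calls.append(sorted(obj))
    return obj  # returning obj unchanged leaves the decoded result as usual


decoded = json.loads('{"outer": {"inner": 1}, "x": [{"y": 2}]}', object_hook=record)
print(calls)    # [['inner'], ['y'], ['outer', 'x']]
print(decoded)  # {'outer': {'inner': 1}, 'x': [{'y': 2}]}
```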

The basic idea is to pass a "watcher" function via this argument that keeps track of all the keys and values encountered while the data is decoded. The code first converts the data to JSON text and then decodes it again with this watcher function in place.

Here's what I mean:

from collections import defaultdict
import json
from pprint import pprint


inp = [{
    "key1": {"subkey1": "subvalue1", "subkey2": "subvalue2"},
    "key2": "value2",
    "key3": {"subkey3": "subvalue2"}
}, {
    "key4": "value4",
    "subkey1": "other_value",
    "key2": "value2"
}]


def get_distinct_keys_and_distinct_values(data):
    results = defaultdict(set)

    def decode_dict(a_dict):  # "watcher" function, called once per decoded object
        # Set members must be hashable: list values, or dicts nested more than
        # one level deep, would raise TypeError here.
        for key, item in a_dict.items():
            if isinstance(item, dict):
                # Record the nested dict's values (not the dict itself) under key.
                for value in item.values():
                    results[key].add(value)
            else:
                results[key].add(item)
        return a_dict

    json_repr = json.dumps(data)  # Convert to JSON format.
    json.loads(json_repr, object_hook=decode_dict)  # Return value ignored.
    # Return results converted into a dictionary of lists.
    return {key: list(value) for key, value in results.items()}


pprint(get_distinct_keys_and_distinct_values(inp), sort_dicts=False, compact=False)

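Results (the order of values within each list can vary between runs, because they are collected in sets). Note that key1 and key3 map to the values of their nested dictionaries rather than to the dictionaries themselves, which differs from the expected output in the question: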

{'subkey1': ['subvalue1', 'other_value'],
 'subkey2': ['subvalue2'],
 'subkey3': ['subvalue2'],
 'key1': ['subvalue1', 'subvalue2'],
 'key2': ['value2'],
 'key3': ['subvalue2'],
 'key4': ['value4']}
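There is a variation on the same object_hook idea for cases where you want deterministic ordering and the nested dictionaries themselves as values (as in the question's expected output). It replaces each set with a dict keyed by a canonical JSON string of the value. This is a sketch, not the answerer's code, and distinct_values_via_object_hook is a hypothetical name.

```python
import json
from collections import defaultdict


def distinct_values_via_object_hook(data):
    """Map every key (any depth) to its distinct values, in first-seen order.

    Nested dicts are kept as values, as in the question's expected output.
    Duplicates are detected via a canonical JSON string of each value, so
    unhashable values such as lists and dicts are fine.
    """
    # key -> {canonical JSON of a value: the value itself}; dicts keep insertion order
    seen = defaultdict(dict)

    def watcher(a_dict):
        for key, value in a_dict.items():
            canonical = json.dumps(value, sort_keys=True)
            seen[key].setdefault(canonical, value)  # first occurrence wins
        return a_dict

    json.loads(json.dumps(data), object_hook=watcher)
    return {key: list(values.values()) for key, values in seen.items()}
```

With the example input, the keys come out in the order subkey1, subkey2, subkey3, key1, key2, key3, key4, because the hook sees inner objects before the objects that contain them.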

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM