简体   繁体   中英

Convert deeply nested JSON to Pandas Dataframe with multi-index hierarchical index based on the JSON

I am seeking to convert a deeply nested json string into a Pandas Dataframe with hierarchical multiindex based on the hierarchy within the json.

I don't know the structure of the json and this will be used on a list of jsons with will likely have different hierarchies so this needs to be dynamic and cannot be hardcoded.

For the purpose of this question I'm using the following the "Example 2" data from the following link (though the actual data is much larger and more deeply nested): https://support.oneskyapp.com/hc/en-us/articles/208047697-JSON-sample-files

I am using the following code to interatively flatten the json:

import itertools as it
import pandas as pd

def flatten_json (dictionary):

    def unpack(parent_key,parent_value):
        if isinstance (parent_value,dict):
            for key, value in parent_value.items():
                temp = parent_key + "-" + key
                print("parent_key: ", parent_key," key: ", key)
                yield temp, value
        elif isinstance(parent_value, list):
            i = 0
            for value in parent_value:
                temp2 = parent_key + "_" + str(i)
                i += 1
                yield temp2, value
        else:
            yield parent_key, parent_value
    while True:
        dictionary = dict(it.chain.from_iterable(it.starmap(unpack,dictionary.items())))

        if not any(isinstance(value,dict) for value in dictionary.values()) and not any(isinstance(value,list) for value in dictionary.values()):
            break

    return dictionary

data = ***json data from link above as a string - omitting due to length***
output_table = pd.Series(flatten_json(data)).to_frame()

The produces the following output:

在此处输入图片说明

I want to produce the following output:

在此处输入图片说明

Found a possible solution with some playing around from the dictionary output from the original flatten_json() function. Not sure how efficient this is but it seems to work:

dictionary = flatten_json(data2)
all_values = list(dictionary.values())

index_list = []
for key in dictionary:
    x = tuple(key.split("-"))
    index_list.append(x)

index = pd.MultiIndex.from_tuples(index_list)

df = pd.Series(all_values, index=index).to_frame()

Any suggestions or comments welcome...

You can also flatten out the dictionary into a list of lists, make a DataFrame out of it, and then set the index to get the required output:

def flatten_json(dictionary):
    flattened = []

    def flatten(data, name=''):
        if type(data) is dict:
            for d in data:
                flatten(data[d], name + d + ',')
        elif type(data) is list:
            i = 0
            for l in data:
                flatten(l, name[:-1] + '_' + str(i) + ',')
                i += 1
        else:
            flattened.append((name[:-1] + ',' + data).split(','))

    flatten(dictionary)
    return flattened

list_obj=flatten_json(dict_obj)
pd.DataFrame(list_obj).set_index(list(range(len(list_obj[0])-1)))

Works fine for the example given and is less complicated even.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM