Convert deeply nested JSON to Pandas Dataframe with multi-index hierarchical index based on the JSON

Question

I am seeking to convert a deeply nested json string into a Pandas Dataframe with hierarchical multiindex based on the hierarchy within the json.

I don't know the structure of the json and this will be used on a list of jsons with will likely have different hierarchies so this needs to be dynamic and cannot be hardcoded.

For the purpose of this question I'm using the following the "Example 2" data from the following link (though the actual data is much larger and more deeply nested): https://support.oneskyapp.com/hc/en-us/articles/208047697-JSON-sample-files

I am using the following code to interatively flatten the json:

import itertools as it
import pandas as pd

def flatten_json (dictionary):

    def unpack(parent_key,parent_value):
        if isinstance (parent_value,dict):
            for key, value in parent_value.items():
                temp = parent_key + "-" + key
                print("parent_key: ", parent_key," key: ", key)
                yield temp, value
        elif isinstance(parent_value, list):
            i = 0
            for value in parent_value:
                temp2 = parent_key + "_" + str(i)
                i += 1
                yield temp2, value
        else:
            yield parent_key, parent_value
    while True:
        dictionary = dict(it.chain.from_iterable(it.starmap(unpack,dictionary.items())))

        if not any(isinstance(value,dict) for value in dictionary.values()) and not any(isinstance(value,list) for value in dictionary.values()):
            break

    return dictionary

data = ***json data from link above as a string - omitting due to length***
output_table = pd.Series(flatten_json(data)).to_frame()

The produces the following output:

I want to produce the following output:

Answer 1

Found a possible solution with some playing around from the dictionary output from the original flatten_json() function. Not sure how efficient this is but it seems to work:

dictionary = flatten_json(data2)
all_values = list(dictionary.values())

index_list = []
for key in dictionary:
    x = tuple(key.split("-"))
    index_list.append(x)

index = pd.MultiIndex.from_tuples(index_list)

df = pd.Series(all_values, index=index).to_frame()

Any suggestions or comments welcome...

Answer 2

You can also flatten out the dictionary into a list of lists, make a DataFrame out of it, and then set the index to get the required output:

def flatten_json(dictionary):
    flattened = []

    def flatten(data, name=''):
        if type(data) is dict:
            for d in data:
                flatten(data[d], name + d + ',')
        elif type(data) is list:
            i = 0
            for l in data:
                flatten(l, name[:-1] + '_' + str(i) + ',')
                i += 1
        else:
            flattened.append((name[:-1] + ',' + data).split(','))

    flatten(dictionary)
    return flattened

list_obj=flatten_json(dict_obj)
pd.DataFrame(list_obj).set_index(list(range(len(list_obj[0])-1)))

Works fine for the example given and is less complicated even.

Convert deeply nested JSON to Pandas Dataframe with multi-index hierarchical index based on the JSON

Question

2 answers

solution1
0 ACCPTED 2020-03-01 16:20:21

solution2
0 2020-03-01 17:19:17

Convert deeply nested JSON to Pandas Dataframe with multi-index hierarchical index based on the JSON

Question

2 answers

solution1 0 ACCPTED 2020-03-01 16:20:21

solution2 0 2020-03-01 17:19:17

solution1
0 ACCPTED 2020-03-01 16:20:21

solution2
0 2020-03-01 17:19:17