From normalized Pandas DataFrame to list of nested dicts

Question

Suppose I have obtained a normalized DataFrame from a list of nested dicts:

import pandas as pd

sample_list_of_dicts = [
    { 'group1': { 'item1': 'value1', 'item2': 'value2' } },
    { 'group1': { 'item1': 'value3', 'item2': 'value4' } }
]

df = pd.json_normalize(sample_list_of_dicts)

Is there a way to convert the DataFrame df back into the list of nested dicts?

Answer 1

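One possible approach is to index the rows by the unique group names, rename the columns, and collapse the duplicated column names.
It is a somewhat lengthy solution, and I'd be glad to see a shorter pandas way to reach the same result.
Note that the sample below uses a different group in each row (group1, group2): setting the index this way requires the number of unique group names to equal the number of rows.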

import pandas as pd

sample_list_of_dicts = [
    {'group1': {'item1': 'value1', 'item2': 'value2'}},
    {'group2': {'item1': 'value3', 'item2': 'value4'}}
]
df = pd.json_normalize(sample_list_of_dicts)

# set index with unique 'group' prefixes
df.set_index(df.columns.str.replace(r'\..*', '', regex=True).unique(), inplace=True)
# rename column names to those going after 'group<digit>.'
df.columns = df.columns.str.replace(r'group\d+\.', '', regex=True)
# collapse identical column names horizontally and transpose the df
# (note: groupby(..., axis=1) is deprecated since pandas 2.1)
df_dict = df.groupby(df.columns, axis=1).sum().T.to_dict()
# recompose final dict into a list of dicts
lst = list(map(dict, zip(df_dict.items())))

print(lst)

The output:

[{'group1': {'item1': 'value1', 'item2': 'value2'}},
 {'group2': {'item1': 'value3', 'item2': 'value4'}}]

This can also be chained into a single pipeline:

df_dict = df.set_index(df.columns.str.replace(r'\..*', '', regex=True).unique())\
    .set_axis(df.columns.str.replace(r'group\d+\.', '', regex=True), axis=1)\
    .pipe(lambda df_: df_.groupby(df_.columns, axis=1).sum()).T.to_dict()
lst = list(map(dict, zip(df_dict.items())))
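
On pandas 2.1 and later, groupby(..., axis=1) emits a deprecation warning. The following is a minimal sketch of one way around it, not a drop-in from the answer: it transposes first and groups the row labels, and it swaps sum() for first() (the first non-null value per group). The names groups, items, relabelled and collapsed are made up for illustration, and the sketch still assumes exactly one distinct group per row.

```python
import pandas as pd

sample_list_of_dicts = [
    {'group1': {'item1': 'value1', 'item2': 'value2'}},
    {'group2': {'item1': 'value3', 'item2': 'value4'}},
]
df = pd.json_normalize(sample_list_of_dicts)

# Same relabelling as above: one group name per row, item names as columns.
groups = df.columns.str.replace(r'\..*', '', regex=True).unique()
items = df.columns.str.replace(r'group\d+\.', '', regex=True)
relabelled = df.set_axis(groups, axis=0).set_axis(items, axis=1)

# Transpose so the duplicated item names become row labels, then keep the
# first non-null value for each item name within every group column.
collapsed = relabelled.T.groupby(level=0).first()

# One single-key dict per group, in the original row order.
lst = [{group: collapsed[group].to_dict()} for group in collapsed.columns]
print(lst)
```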

Answer 2

Here is a two-line way to do this:

# Expand columns to a MultiIndex
df.columns = df.columns.str.split('.', expand=True)

# Iterate over the highest level and convert the records to dicts
output = [{k: j} for k in df.columns.levels[0] for j in df[k].to_dict('records')]
output

[{'group1': {'item1': 'value1', 'item2': 'value2'}},
 {'group1': {'item1': 'value3', 'item2': 'value4'}}]

Explanation

Converting the normalized JSON column names (of the form a.b) to a MultiIndex makes it easy to select the columns that belong to each group.
Once converted, you can iterate over the top-level group names and fetch each group's rows as a list of record dicts.
The list comprehension then wraps each record in a dict keyed by its group name.
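
To make the intermediate steps visible, here is a small sketch that runs the same two lines on the sample from the question and keeps the intermediate values around. The variable names top_level and records are only for illustration.

```python
import pandas as pd

sample_list_of_dicts = [
    {'group1': {'item1': 'value1', 'item2': 'value2'}},
    {'group1': {'item1': 'value3', 'item2': 'value4'}},
]
df = pd.json_normalize(sample_list_of_dicts)

# 'group1.item1' becomes ('group1', 'item1'): a two-level MultiIndex.
df.columns = df.columns.str.split('.', expand=True)
top_level = list(df.columns.levels[0])  # distinct group names

# Selecting a top-level key leaves a plain frame whose columns are the item names,
# so each row converts to one flat dict.
records = df['group1'].to_dict(orient='records')

# Wrap every record in a dict keyed by its group name.
output = [{k: rec} for k in top_level for rec in df[k].to_dict(orient='records')]
print(output)
```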

Answer 3

Another short solution, using an "expand (column names) -> stack (from columns to index) -> transpose" chain:

df_dict = df.set_axis(df.columns.str.split('.', expand=True), axis=1)\
    .stack(0).droplevel(0).T.to_dict()
lst = list(map(dict, zip(df_dict.items())))

The lst contents:

[{'group1': {'item1': 'value1', 'item2': 'value2'}},
 {'group2': {'item1': 'value3', 'item2': 'value4'}}]
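
The chain above appears to rely on each row holding a single, distinct group, since the group names end up as dict keys. For comparison, here is a plain-Python sketch that rebuilds the nesting row by row and also copes with repeated groups or deeper nesting. The helper name unflatten_records is hypothetical, and the sketch assumes that the separator never occurs inside a key and that every missing cell was introduced by json_normalize rather than being real data.

```python
import pandas as pd
from pandas.api.types import is_scalar

def unflatten_records(df, sep='.'):
    """Rebuild nested dicts from a json_normalize-style frame (illustrative helper)."""
    result = []
    for row in df.to_dict(orient='records'):
        nested = {}
        for dotted_key, value in row.items():
            # Skip cells that are missing (assumed to be padding from json_normalize).
            if is_scalar(value) and pd.isna(value):
                continue
            *parents, leaf = dotted_key.split(sep)
            node = nested
            for part in parents:
                # Walk down, creating intermediate dicts as needed.
                node = node.setdefault(part, {})
            node[leaf] = value
        result.append(nested)
    return result

same_group = [
    {'group1': {'item1': 'value1', 'item2': 'value2'}},
    {'group1': {'item1': 'value3', 'item2': 'value4'}},
]
mixed_groups = [
    {'group1': {'item1': 'value1', 'item2': 'value2'}},
    {'group2': {'item1': 'value3', 'item2': 'value4'}},
]
restored_same = unflatten_records(pd.json_normalize(same_group))
restored_mixed = unflatten_records(pd.json_normalize(mixed_groups))
```

Because it works on the original dotted column names, it should be called on the frame before the columns are replaced by a MultiIndex.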

3 answers

solution1  1  2022-12-17 21:21:00
solution2  1  ACCEPTED  2022-12-17 22:21:42
solution3  0  2022-12-17 22:47:17
