
From normalized Pandas DataFrame to list of nested dicts

Suppose I have obtained a normalized DataFrame from a list of nested dicts:

import pandas as pd

sample_list_of_dicts = [
    { 'group1': { 'item1': 'value1', 'item2': 'value2' } },
    { 'group1': { 'item1': 'value3', 'item2': 'value4' } }
]

df = pd.json_normalize(sample_list_of_dicts)

Is there a way to convert the DataFrame df back into the original list of nested dicts?
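For reference, here is a small sketch of what the normalized frame looks like, since this is the shape any reverse conversion has to start from. The names flat_columns and flat_rows are only illustrative and are not part of the question.

```python
import pandas as pd

sample_list_of_dicts = [
    {'group1': {'item1': 'value1', 'item2': 'value2'}},
    {'group1': {'item1': 'value3', 'item2': 'value4'}},
]
df = pd.json_normalize(sample_list_of_dicts)

# Each nested key becomes one column named '<outer>.<inner>'
flat_columns = list(df.columns)

# One row per input dict, holding only the leaf values
flat_rows = df.to_dict(orient='records')
```

So the nesting survives only in the dotted column names, and reverting means splitting those names back into levels.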

One possible approach is to set the index to the unique group names, rename the columns, and collapse identically named columns, followed by a couple of further transformations.
It is a somewhat lengthy solution, though, and I'd be glad to see a shorter pandas way to reach the same result.

sample_list_of_dicts = [
    {'group1': {'item1': 'value1', 'item2': 'value2'}},
    {'group2': {'item1': 'value3', 'item2': 'value4'}}
]
df = pd.json_normalize(sample_list_of_dicts)

# set index with unique 'group' prefixes
df.set_index(df.columns.str.replace(r'\..*', '', regex=True).unique(), inplace=True)
# rename column names to those going after 'group<digit>.'
df.columns = df.columns.str.replace(r'group\d+\.', '', regex=True)
# collapse identical column names horizontally and transpose the df
# (note: groupby(..., axis=1) is deprecated as of pandas 2.1)
df_dict = df.groupby(df.columns, axis=1).sum().T.to_dict()
# recompose final dict into a list of dicts
lst = list(map(dict, zip(df_dict.items())))

print(lst)

The output:

[{'group1': {'item1': 'value1', 'item2': 'value2'}},
 {'group2': {'item1': 'value3', 'item2': 'value4'}}]

This can also be chained into a single pipeline:

df_dict = df.set_index(df.columns.str.replace(r'\..*', '', regex=True).unique())\
    .set_axis(df.columns.str.replace(r'group\d+\.', '', regex=True), axis=1)\
    .pipe(lambda df_: df_.groupby(df_.columns, axis=1).sum()).T.to_dict()
lst = list(map(dict, zip(df_dict.items())))

Here is a two-line way to do this:

# Expand the dotted column names into a MultiIndex
df.columns = df.columns.str.split('.', expand=True)

# Iterate over the top level and convert each group's rows to dicts
output = [{k: j} for k in df.columns.levels[0] for j in df[k].to_dict('records')]
print(output)

The output:

[{'group1': {'item1': 'value1', 'item2': 'value2'}},
 {'group1': {'item1': 'value3', 'item2': 'value4'}}]

Explanation

  1. Converting the normalized JSON column names (which have the a.b format) to a MultiIndex makes it easy to select the part of the DataFrame that belongs to each group.
  2. Once converted, you can iterate over the top-level groups and fetch each group's rows as a list of dicts (one dict per record).
  3. Then, inside the list comprehension, each record is wrapped in a dict keyed by its group name.
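The three steps above can be sketched one at a time, using the first sample from the question. The intermediate names groups and inner_records are introduced here only for illustration.

```python
import pandas as pd

records = [
    {'group1': {'item1': 'value1', 'item2': 'value2'}},
    {'group1': {'item1': 'value3', 'item2': 'value4'}},
]
df = pd.json_normalize(records)

# Step 1: 'group1.item1' -> ('group1', 'item1'), giving two-level columns
df.columns = df.columns.str.split('.', expand=True)
groups = list(df.columns.levels[0])

# Step 2: selecting a top-level key returns a sub-frame with only the
# inner column names, which converts directly to a list of dicts
inner_records = df['group1'].to_dict(orient='records')

# Step 3: wrap every inner record under its group name
output = [
    {group: rec}
    for group in groups
    for rec in df[group].to_dict(orient='records')
]
```

Note that the comprehension loops over groups first and rows second, so with several groups the result is ordered by group rather than by original row.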

Another short solution uses an "expand (column names) -> stack (from columns to index) -> transpose" chain:

df_dict = df.set_axis(df.columns.str.split('.', expand=True), axis=1)\
    .stack(0).droplevel(0).T.to_dict()
lst = list(map(dict, zip(df_dict.items())))

The lst contents:

[{'group1': {'item1': 'value1', 'item2': 'value2'}},
 {'group2': {'item1': 'value3', 'item2': 'value4'}}]
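
As a cross-check on the reshaping-based answers, the same result can be rebuilt in plain Python from the flat records. This is only a sketch under stated assumptions: unflatten is a hypothetical helper (not a pandas function), the separator is assumed to be '.', and a NaN value is assumed to mean "key absent from this record". The flat_records literal mirrors what df.to_dict('records') would return for the group1/group2 sample.

```python
import math


def unflatten(flat, sep='.'):
    """Rebuild one nested dict from a flat dict with dotted keys (hypothetical helper)."""
    nested = {}
    for dotted_key, value in flat.items():
        # json_normalize fills keys missing from a record with NaN; skip those
        if isinstance(value, float) and math.isnan(value):
            continue
        *parents, leaf = dotted_key.split(sep)
        node = nested
        for part in parents:
            # Walk down, creating intermediate dicts as needed
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


nan = float('nan')
# Stand-in for df.to_dict('records') on the group1/group2 sample
flat_records = [
    {'group1.item1': 'value1', 'group1.item2': 'value2',
     'group2.item1': nan, 'group2.item2': nan},
    {'group1.item1': nan, 'group1.item2': nan,
     'group2.item1': 'value3', 'group2.item2': 'value4'},
]
lst = [unflatten(rec) for rec in flat_records]
```

This keeps the original row order and handles deeper nesting, but it also drops values that were genuinely NaN in the source data and cannot distinguish keys that themselves contain a dot.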
