Suppose I have obtained a normalized DataFrame from a list of nested dicts:
import pandas as pd
sample_list_of_dicts = [
{ 'group1': { 'item1': 'value1', 'item2': 'value2' } },
{ 'group1': { 'item1': 'value3', 'item2': 'value4' } }
]
df = pd.json_normalize(sample_list_of_dicts)
Is there a way to convert the DataFrame df back into the original list of nested dicts?
One possible approach is to index the rows by the unique group names, rename the columns, and then collapse the duplicate columns with a few further transformations.
It is a fairly lengthy solution (in my opinion), and I'd be glad to see a shorter pandas way to reach the same final result.
sample_list_of_dicts = [
{'group1': {'item1': 'value1', 'item2': 'value2'}},
{'group2': {'item1': 'value3', 'item2': 'value4'}}
]
df = pd.json_normalize(sample_list_of_dicts)
# set index with unique 'group' prefixes
df.set_index(df.columns.str.replace(r'\..*', '', regex=True).unique(), inplace=True)
# rename column names to those going after 'group<digit>.'
df.columns = df.columns.str.replace(r'group\d+\.', '', regex=True)
# collapse identical column names horizontally and transpose the df
df_dict = df.groupby(df.columns, axis=1).sum().T.to_dict()
# recompose final dict into a list of dicts
lst = list(map(dict, zip(df_dict.items())))
print(lst)
The output:
[{'group1': {'item1': 'value1', 'item2': 'value2'}},
{'group2': {'item1': 'value3', 'item2': 'value4'}}]
This can also be chained into a single pipeline:
df_dict = df.set_index(df.columns.str.replace(r'\..*', '', regex=True).unique())\
.set_axis(df.columns.str.replace(r'group\d+\.', '', regex=True), axis=1)\
.pipe(lambda df_: df_.groupby(df_.columns, axis=1).sum()).T.to_dict()
lst = list(map(dict, zip(df_dict.items())))
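Recent pandas releases deprecate the axis=1 argument of DataFrame.groupby, so the collapse step above may emit a warning or fail depending on the installed version. A minimal sketch of the same idea that transposes first and groups on the index instead is shown below. It assumes, as in the example above, that each row belongs to exactly one group and that the number of groups equals the number of rows.

```python
import pandas as pd

sample_list_of_dicts = [
    {'group1': {'item1': 'value1', 'item2': 'value2'}},
    {'group2': {'item1': 'value3', 'item2': 'value4'}},
]
df = pd.json_normalize(sample_list_of_dicts)

# label each row with its group prefix (assumes one distinct group per row)
df.index = df.columns.str.replace(r'\..*', '', regex=True).unique()
# keep only the part of each column name after 'group<digit>.'
df.columns = df.columns.str.replace(r'group\d+\.', '', regex=True)

# transpose, then take the first non-null value per item name in each group
df_dict = df.T.groupby(level=0).first().to_dict()

# recompose the dict into a list of single-key dicts
lst = [{group: items} for group, items in df_dict.items()]
print(lst)
```

Here first() plays the role that sum() plays above: it picks the single non-missing value among the duplicated item columns.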
Here is a two-line way to do this:
# Expand the dotted column names into a MultiIndex
df.columns = df.columns.str.split('.', expand=True)
# Iterate over the top level and convert each group's rows to dict records
output = [{k:j} for k in df.columns.levels[0] for j in df[k].to_dict('records')]
output
[{'group1': {'item1': 'value1', 'item2': 'value2'}},
{'group1': {'item1': 'value3', 'item2': 'value4'}}]
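Note that when the records use different group names (for example group1 in one record and group2 in another), each group's sub-frame contains all-NaN rows for the records that lack that group, and the comprehension above would emit those as well. A hedged variant that drops such rows first could look like this. It assumes that a row with every value missing means "group absent from this record", and it returns the results ordered by group rather than by original row.

```python
import pandas as pd

sample_list_of_dicts = [
    {'group1': {'item1': 'value1', 'item2': 'value2'}},
    {'group2': {'item1': 'value3', 'item2': 'value4'}},
]
df = pd.json_normalize(sample_list_of_dicts)

# Expand the dotted column names into a MultiIndex
df.columns = df.columns.str.split('.', expand=True)

# For each top-level group, skip rows where every item is missing
output = [
    {group: record}
    for group in df.columns.levels[0]
    for record in df[group].dropna(how='all').to_dict('records')
]
print(output)
```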
Another short solution chains three steps: expand the column names, stack the group level from the columns into the index, and transpose:
df_dict = df.set_axis(df.columns.str.split('.', expand=True), axis=1)\
.stack(0).droplevel(0).T.to_dict()
lst = list(map(dict, zip(df_dict.items())))
The contents of lst:
[{'group1': {'item1': 'value1', 'item2': 'value2'}},
{'group2': {'item1': 'value3', 'item2': 'value4'}}]
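For comparison, the same round trip can be sketched without any column reshaping by walking each row in plain Python. The helper name unflatten_records is hypothetical (it is not a pandas function), and the sketch assumes that the separator '.' never occurs inside an original key and that a missing scalar cell means the key was absent from that record.

```python
import pandas as pd


def unflatten_records(df, sep='.'):
    """Hypothetical helper: rebuild nested dicts from a json_normalize result."""
    records = []
    for row in df.to_dict('records'):
        nested = {}
        for flat_key, value in row.items():
            # assumption: a missing scalar cell means the key was not present
            if pd.api.types.is_scalar(value) and pd.isna(value):
                continue
            *parents, leaf = flat_key.split(sep)
            node = nested
            # walk (or create) one nested dict per dotted prefix
            for part in parents:
                node = node.setdefault(part, {})
            node[leaf] = value
        records.append(nested)
    return records


sample_list_of_dicts = [
    {'group1': {'item1': 'value1', 'item2': 'value2'}},
    {'group2': {'item1': 'value3', 'item2': 'value4'}},
]
df = pd.json_normalize(sample_list_of_dicts)
lst = unflatten_records(df)
print(lst)
```

Unlike the index-based approaches, this version does not require the number of groups to match the number of rows and it handles deeper nesting, at the cost of a Python-level loop over every cell.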