
Extract nested JSON in pandas dataframe

I am trying to unpack nested JSON in the following pandas dataframe:

           id                                                              info
0           0  [{u'a': u'good', u'b': u'type1'}, {u'a': u'bad', u'b': u'type2'}]
1           1  [{u'a': u'bad', u'b': u'type1'}, {u'a': u'bad', u'b': u'type2'}]
2           2  [{u'a': u'good', u'b': u'type1'}, {u'a': u'good', u'b': u'type2'}]

My expected outcome is:

           id        type1    type2
0           0        good     bad
1           1        bad      bad
2           2        good     good

I've looked at other solutions, including json_normalize, but unfortunately it does not work for me. Should I treat the JSON as a string to get what I want, or is there a more straightforward way to do this?

  1. Use json_normalize with info as the record path to pull the lists of dictionaries out of the frame. Then unstack the result and apply pd.Series, which stacks the dicts downwards and splits each one into its own a and b columns.

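import pandas as pd
from pandas.io.json import json_normalize  # pandas >= 1.0 exposes this as pd.json_normalize

# Flatten the 'info' lists, then expand each dict into columns 'a' and 'b'
df_info = json_normalize(df.to_dict('list'), ['info']).unstack().apply(pd.Series)
df_info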


  2. Pivot df_info so that the values of b become columns, passing an aggfunc to handle duplicated index entries:

DF = df_info.pivot_table(index=df_info.index.get_level_values(1), columns=['b'], 
                         values=['a'], aggfunc=' '.join)

DF


  3. Finally, concatenate sideways with the ID column:

pd.concat([df[['ID']], DF.xs('a', axis=1).rename_axis(None, axis=1)], axis=1)
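
For newer pandas versions, the same three steps (flatten, pivot, attach to ID) can be sketched with DataFrame.explode instead of json_normalize. This is an alternative sketch rather than the method above: it assumes pandas 0.25 or later, and the names long_df, fields and wide are only illustrative. Unlike pivot_table with an aggfunc, pivot raises an error if an ID has the same b value twice.

```python
import pandas as pd

df = pd.DataFrame(dict(ID=[0, 1, 2], info=[
    [{'a': 'good', 'b': 'type1'}, {'a': 'bad', 'b': 'type2'}],
    [{'a': 'bad', 'b': 'type1'}, {'a': 'bad', 'b': 'type2'}],
    [{'a': 'good', 'b': 'type1'}, {'a': 'good', 'b': 'type2'}],
]))

# Step 1: one row per dict, then split each dict into columns 'a' and 'b'
long_df = df.explode('info').reset_index(drop=True)
fields = pd.DataFrame(long_df['info'].tolist())
long_df = pd.concat([long_df[['ID']], fields], axis=1)

# Steps 2 and 3: values of 'b' become columns, 'a' fills the cells,
# and ID comes back as an ordinary column
wide = (long_df.pivot(index='ID', columns='b', values='a')
               .reset_index()
               .rename_axis(None, axis=1))
print(wide)
```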



Starting DF used:

df = pd.DataFrame(dict(ID=[0,1,2], info=[[{u'a': u'good', u'b': u'type1'}, {u'a': u'bad', u'b': u'type2'}], 
                                        [{u'a': u'bad', u'b': u'type1'}, {u'a': u'bad', u'b': u'type2'}],
                                        [{u'a': u'good', u'b': u'type1'}, {u'a': u'good', u'b': u'type2'}]]))
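
If the info column already holds Python lists of dicts, as in the starting DF above, a plain dict comprehension may be the most direct route. The sketch below is a swapped-in alternative, not the json_normalize approach: it assumes every dict has both an a and a b key and that b values do not repeat within a row (a repeated b would silently keep only the last a). The names wide and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame(dict(ID=[0, 1, 2], info=[
    [{'a': 'good', 'b': 'type1'}, {'a': 'bad', 'b': 'type2'}],
    [{'a': 'bad', 'b': 'type1'}, {'a': 'bad', 'b': 'type2'}],
    [{'a': 'good', 'b': 'type1'}, {'a': 'good', 'b': 'type2'}],
]))

# For each row, map every 'b' value (the future column name) to its 'a' value
wide = pd.DataFrame(
    [{d['b']: d['a'] for d in row} for row in df['info']],
    index=df.index,
)

# Attach the new columns next to ID
result = pd.concat([df[['ID']], wide], axis=1)
print(result)
```

If info arrives as a JSON string rather than a list, each cell would first need json.loads applied before this step.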
