I have JSON data like
{'Author': [{'name': 'John', 'Agency': {'Marketing': [{'name': 'SD_SM_14'}], 'Media': [{'codeX': 's_wse@2'}]}}]}
I would like to extract three columns (Author, Marketing, and Media) and turn it into data like below:
Author Marketing Media
John SD_SM_14 s_wse@2
Thank you for any help in advance!
Maybe you should explicitly flatten the nested JSON data you post.
JSON Structure:
{
"Author": [
{
"name": "John",
"Agency": {
"Marketing": [
{
"name": "SD_SM_14"
}
],
"Media": [
{
"codeX": "s_wse@2"
}
]
}
}
]
}
What you want:
Author Marketing Media
John SD_SM_14 s_wse@2
Here is the code:
import pandas as pd
from typing import Dict
def flatten(data: Dict):
for key, value in data.items():
for res in value:
# assume that there is only one key in `res`
yield key, next(iter(res.values()))
def func(data: Dict):
for author in data['Author']:
name = author['name']
agency = author['Agency']
yield dict([('Author', name)] + list(flatten(agency)))
df = pd.DataFrame(func(data))
I could not find a better way, but once solution can be to first extract name column from the Author and the explode the lists so that you have json when again you use json_normalize to extract the required columns:
In [38]: dic = {'Author': [{'name': 'John', 'Agency': {'Marketing': [{'name': 'SD_SM_14'}], 'Media': [{'codeX': 's_wse@
...: 2'}]}}]}
In [39]: df = pd.DataFrame(dic)
In [40]: df
Out[40]:
Author
0 {'name': 'John', 'Agency': {'Marketing': [{'na...
In [41]: df = pd.json_normalize(df.Author)
In [42]: df
Out[42]:
name Agency.Marketing Agency.Media
0 John [{'name': 'SD_SM_14'}] [{'codeX': 's_wse@2'}]
In [43]: df1 = df.explode('Agency.Marketing')
In [44]: df1
Out[44]:
name Agency.Marketing Agency.Media
0 John {'name': 'SD_SM_14'} [{'codeX': 's_wse@2'}]
In [45]: df1 = df1.explode('Agency.Media')
In [47]: df2 = pd.json_normalize(df1['Agency.Marketing'])
In [48]: df2
Out[48]:
name
0 SD_SM_14
In [49]: df3 = pd.json_normalize(df1['Agency.Media'])
In [50]: df3
Out[50]:
codeX
0 s_wse@2
In [51]: main_df = pd.concat([df1,df2,df3], axis=1)
In [52]: main_df
Out[52]:
name Agency.Marketing Agency.Media name codeX
0 John {'name': 'SD_SM_14'} {'codeX': 's_wse@2'} SD_SM_14 s_wse@2
In [53]: main_df.drop(['Agency.Marketing','Agency.Media'],inplace=True,axis=1)
In [54]: main_df
Out[54]:
name name codeX
0 John SD_SM_14 s_wse@2
Update :
if you have imported the json_normalize method:
just use json_normalize
instead of pd.json_normalize
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.