How to turn Nested JSON into Pandas Data Frame using Python?
Python Pandas: How to extract data from JSON, and turn it into data frame?
I have JSON data such as:
{'Author': [{'name': 'John', 'Agency': {'Marketing': [{'name': 'SD_SM_14'}], 'Media': [{'codeX': 's_wse@2'}]}}]}
I want to extract three columns (Author, Marketing, and Media) and turn them into a data frame like this:
Author  Marketing  Media
John    SD_SM_14   s_wse@2
Thanks in advance for any help!
Perhaps you should explicitly flatten the nested JSON data you posted.
JSON structure:
{
    "Author": [
        {
            "name": "John",
            "Agency": {
                "Marketing": [
                    {
                        "name": "SD_SM_14"
                    }
                ],
                "Media": [
                    {
                        "codeX": "s_wse@2"
                    }
                ]
            }
        }
    ]
}
What you want:
Author  Marketing  Media
John    SD_SM_14   s_wse@2
Here is the code:
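import pandas as pd
from typing import Any, Dict, Iterator, Tuple

# The JSON from the question, as a Python dict.
data = {'Author': [{'name': 'John', 'Agency': {'Marketing': [{'name': 'SD_SM_14'}], 'Media': [{'codeX': 's_wse@2'}]}}]}

def flatten(data: Dict[str, Any]) -> Iterator[Tuple[str, Any]]:
    # Yield (column name, value) pairs from the "Agency" dict.
    for key, value in data.items():
        for res in value:
            # assume that there is only one key in `res`
            yield key, next(iter(res.values()))

def func(data: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    # Yield one flat row per author.
    for author in data['Author']:
        name = author['name']
        agency = author['Agency']
        yield dict([('Author', name)] + list(flatten(agency)))

df = pd.DataFrame(func(data))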
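One thing to watch with the code above: if a list such as Marketing holds more than one entry, the repeated key is overwritten when the pairs are passed to dict(), so only the last value survives. Here is a minimal sketch of one way to keep them all, assuming that joining the values into a single cell is acceptable. The helper name `flatten_all`, the extra `SD_SM_15` entry, and the comma separator are made up for illustration and are not part of the original answer.

```python
import pandas as pd
from collections import defaultdict
from typing import Any, Dict, List

# Made-up input: the second Marketing entry is added for illustration only.
data = {'Author': [{'name': 'John',
                    'Agency': {'Marketing': [{'name': 'SD_SM_14'},
                                             {'name': 'SD_SM_15'}],
                               'Media': [{'codeX': 's_wse@2'}]}}]}

def flatten_all(agency: Dict[str, List[Dict[str, Any]]]) -> Dict[str, str]:
    # Collect every value under its top-level key instead of overwriting.
    collected = defaultdict(list)
    for key, items in agency.items():
        for item in items:
            collected[key].extend(str(v) for v in item.values())
    # Join multiple values into one cell; the separator is an arbitrary choice.
    return {key: ', '.join(values) for key, values in collected.items()}

# Build one flat row per author, then the frame.
rows = [{'Author': a['name'], **flatten_all(a['Agency'])} for a in data['Author']]
df = pd.DataFrame(rows)
print(df)
```

With this input the Marketing cell holds both values, while Author and Media come out as in the single-entry case.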
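I couldn't find a better way, but one solution is to first extract the name column from Author and explode the lists, so that each cell holds a plain dict that json_normalize can then expand into the required columns: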
In [38]: dic = {'Author': [{'name': 'John', 'Agency': {'Marketing': [{'name': 'SD_SM_14'}], 'Media': [{'codeX': 's_wse@2'}]}}]}
In [39]: df = pd.DataFrame(dic)
In [40]: df
Out[40]:
                                              Author
0  {'name': 'John', 'Agency': {'Marketing': [{'na...
In [41]: df = pd.json_normalize(df.Author)
In [42]: df
Out[42]:
   name        Agency.Marketing            Agency.Media
0  John  [{'name': 'SD_SM_14'}]  [{'codeX': 's_wse@2'}]
In [43]: df1 = df.explode('Agency.Marketing')
In [44]: df1
Out[44]:
   name      Agency.Marketing            Agency.Media
0  John  {'name': 'SD_SM_14'}  [{'codeX': 's_wse@2'}]
In [45]: df1 = df1.explode('Agency.Media')
In [47]: df2 = pd.json_normalize(df1['Agency.Marketing'])
In [48]: df2
Out[48]:
       name
0  SD_SM_14
In [49]: df3 = pd.json_normalize(df1['Agency.Media'])
In [50]: df3
Out[50]:
     codeX
0  s_wse@2
In [51]: main_df = pd.concat([df1,df2,df3], axis=1)
In [52]: main_df
Out[52]:
   name      Agency.Marketing          Agency.Media      name    codeX
0  John  {'name': 'SD_SM_14'}  {'codeX': 's_wse@2'}  SD_SM_14  s_wse@2
In [53]: main_df.drop(['Agency.Marketing','Agency.Media'],inplace=True,axis=1)
In [54]: main_df
Out[54]:
   name      name    codeX
0  John  SD_SM_14  s_wse@2
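The session above ends with two columns that are both called name, rather than the Author, Marketing, and Media headers the question asks for. The same steps can be wrapped in a function that assigns the requested names at the end. This is only a sketch, assuming every Marketing and Media list holds single-key dicts as in the question. The function name `to_frame` is hypothetical. The index is reset after exploding because explode repeats index labels, which could misalign rows in concat once there is more than one author.

```python
import pandas as pd

dic = {'Author': [{'name': 'John',
                   'Agency': {'Marketing': [{'name': 'SD_SM_14'}],
                              'Media': [{'codeX': 's_wse@2'}]}}]}

def to_frame(dic):
    # One row per author; nested dict keys become dotted column names.
    df = pd.json_normalize(dic['Author'])
    # Replace each list cell with the dict(s) it contains, one per row.
    df = df.explode('Agency.Marketing').explode('Agency.Media')
    # explode repeats index labels, so renumber before concatenating.
    df = df.reset_index(drop=True)
    marketing = pd.json_normalize(df['Agency.Marketing'].tolist())
    media = pd.json_normalize(df['Agency.Media'].tolist())
    out = pd.concat([df[['name']], marketing, media], axis=1)
    # Assign names by position to get rid of the duplicate "name" label.
    out.columns = ['Author', 'Marketing', 'Media']
    return out

main_df = to_frame(dic)
print(main_df)
```

For the sample input this gives a single row with John, SD_SM_14, and s_wse@2 under Author, Marketing, and Media.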
Update:
If you have already imported the json_normalize function directly, just use json_normalize instead of pd.json_normalize.
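To make the two spellings concrete, here is a small sketch showing that both refer to the same function. It assumes pandas 1.0 or later, where json_normalize is available at the top level of the package. The `records` sample is made up for illustration.

```python
import pandas as pd
from pandas import json_normalize  # top-level import, assuming pandas >= 1.0

# Made-up sample record with one nested dict.
records = [{'name': 'John', 'Agency': {'code': 'A1'}}]

a = pd.json_normalize(records)   # qualified call
b = json_normalize(records)      # direct call after the import above

# Both calls produce the same frame; nested keys become dotted column names.
assert a.equals(b)
print(list(a.columns))
```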