Unpacking JSON with nested lists in Pandas
I have a JSON file that I am trying to unpack. It looks like this:
[{'batter': 'LA Marsh',
'bowler': 'MJG Nielsen',
'non_striker': 'M Kapp',
'runs': {'batter': 0, 'extras': 0, 'total': 0}},
{'batter': 'LA Marsh',
'bowler': 'MJG Nielsen',
'non_striker': 'M Kapp',
'runs': {'batter': 0, 'extras': 0, 'total': 0},
'wickets': [{'player_out': 'LA Marsh', 'kind': 'bowled'}]},
{'batter': 'EA Perry',
'bowler': 'MJG Nielsen',
'non_striker': 'M Kapp',
'runs': {'batter': 0, 'extras': 0, 'total': 0}}]
Using the following code:
df = pd.json_normalize(data)
I get the following:
As you can see, the second entry has a nested list in it. In place of the column 'wickets' I would like to have two columns, "player_out" and "kind". My preferred output looks like this:
Use:
df = df.drop(columns=['wickets']).join(df['wickets'].explode().apply(pd.Series))
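A caveat on the one-liner above: rows without a wicket hold NaN rather than a dict, so `apply(pd.Series)` may leave behind an extra all-NaN column named `0`. The following is a minimal sketch of the same explode-and-join idea that swaps the missing entries for empty dicts first. The names `records`, `expanded` and `result` are only illustrative, and the sketch assumes every non-missing element of 'wickets' is a dict with the keys shown in the question.

```python
import pandas as pd

data = [
    {'batter': 'LA Marsh', 'bowler': 'MJG Nielsen', 'non_striker': 'M Kapp',
     'runs': {'batter': 0, 'extras': 0, 'total': 0}},
    {'batter': 'LA Marsh', 'bowler': 'MJG Nielsen', 'non_striker': 'M Kapp',
     'runs': {'batter': 0, 'extras': 0, 'total': 0},
     'wickets': [{'player_out': 'LA Marsh', 'kind': 'bowled'}]},
    {'batter': 'EA Perry', 'bowler': 'MJG Nielsen', 'non_striker': 'M Kapp',
     'runs': {'batter': 0, 'extras': 0, 'total': 0}},
]

df = pd.json_normalize(data)

# One row per list element; deliveries without a wicket stay as NaN
wickets = df['wickets'].explode()

# Swap the NaN placeholders for empty dicts so every element is a mapping
records = [w if isinstance(w, dict) else {} for w in wickets]

# Build the two new columns, keeping the exploded index so the join lines up
expanded = pd.DataFrame(records, index=wickets.index,
                        columns=['player_out', 'kind'])

result = df.drop(columns=['wickets']).join(expanded)
print(result)
```

If a delivery ever carried more than one wicket, the exploded index would repeat and the join would repeat that delivery once per wicket.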
You can try:
import pandas as pd
from collections.abc import MutableMapping  # collections.MutableMapping was removed in Python 3.10

def flatten(d, parent_key='', sep='.'):
    # Recursively flatten nested dicts (and lists of dicts) into a single level
    items = []
    for k, v in d.items():
        new_key = parent_key + sep + k if parent_key else k
        if isinstance(v, MutableMapping):
            items.extend(flatten(v, new_key, sep=sep).items())
        elif isinstance(v, list):
            # Each list element is assumed to be a dict; repeated keys overwrite earlier ones
            for value in v:
                items.extend(flatten(value, new_key, sep).items())
        else:
            items.append((new_key, v))
    return dict(items)
data = [{'batter': 'LA Marsh',
'bowler': 'MJG Nielsen',
'non_striker': 'M Kapp',
'runs': {'batter': 0, 'extras': 0, 'total': 0}},
{'batter': 'LA Marsh',
'bowler': 'MJG Nielsen',
'non_striker': 'M Kapp',
'runs': {'batter': 0, 'extras': 0, 'total': 0},
'wickets': [{'player_out': 'LA Marsh', 'kind': 'bowled'}]},
{'batter': 'EA Perry',
'bowler': 'MJG Nielsen',
'non_striker': 'M Kapp',
'runs': {'batter': 0, 'extras': 0, 'total': 0}}]
output = []
for dict_data in data:
    output.append(flatten(dict_data))
df = pd.DataFrame(output)
print(df)
Output:
     batter       bowler  non_striker  runs.batter  runs.extras  runs.total  wickets.player_out  wickets.kind
0  LA Marsh  MJG Nielsen       M Kapp            0            0           0                 NaN           NaN
1  LA Marsh  MJG Nielsen       M Kapp            0            0           0            LA Marsh        bowled
2  EA Perry  MJG Nielsen       M Kapp            0            0           0                 NaN           NaN
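One limitation of a flatten helper like the one above: if a 'wickets' list held more than one dict, both would map to the same keys and the later one would overwrite the earlier. A possible variant, sketched here under the hypothetical name `flatten_indexed`, puts the list position into the key. The two-wicket delivery is made-up sample data used only to show the key layout.

```python
from collections.abc import Mapping

def flatten_indexed(value, parent_key='', sep='.'):
    """Flatten nested dicts and lists, using list positions as key parts."""
    if isinstance(value, Mapping):
        children = value.items()
    elif isinstance(value, list):
        children = enumerate(value)
    else:
        # Leaf value: store it under the key built so far
        return {parent_key: value}

    flat = {}
    for key, child in children:
        child_key = f'{parent_key}{sep}{key}' if parent_key else str(key)
        flat.update(flatten_indexed(child, child_key, sep))
    return flat

# Hypothetical delivery with two wickets (not part of the original data)
delivery = {
    'batter': 'LA Marsh',
    'wickets': [
        {'player_out': 'LA Marsh', 'kind': 'bowled'},
        {'player_out': 'M Kapp', 'kind': 'run out'},
    ],
}

flat = flatten_indexed(delivery)
print(flat)
```

The trade-off is wider frames: each list position gets its own set of columns (`wickets.0.kind`, `wickets.1.kind`, and so on) instead of a single `wickets.kind` column.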
If you want to keep using json_normalize, you first need to homogenize the data and then apply json_normalize.
import pandas as pd
data = [{'batter': 'LA Marsh',
'bowler': 'MJG Nielsen',
'non_striker': 'M Kapp',
'runs': {'batter': 0, 'extras': 0, 'total': 0}},
{'batter': 'LA Marsh',
'bowler': 'MJG Nielsen',
'non_striker': 'M Kapp',
'runs': {'batter': 0, 'extras': 0, 'total': 0},
'wickets': [{'player_out': 'LA Marsh', 'kind': 'bowled'}]},
{'batter': 'EA Perry',
'bowler': 'MJG Nielsen',
'non_striker': 'M Kapp',
'runs': {'batter': 0, 'extras': 0, 'total': 0}}]
# homogenize data: give every entry a 'wickets' list
nan_entries = [{'player_out': pd.NA, 'kind': pd.NA}]
for entry in data:
    if 'wickets' not in entry:
        entry['wickets'] = nan_entries
# use json_normalize
pd.json_normalize(data,
                  record_path='wickets',
                  meta=['batter', 'bowler', 'non_striker', ['runs', 'batter'],
                        ['runs', 'extras'], ['runs', 'total']],
                  record_prefix='wickets.')
Output:
   wickets.player_out  wickets.kind    batter       bowler  non_striker  runs.batter  runs.extras  runs.total
0                <NA>          <NA>  LA Marsh  MJG Nielsen       M Kapp            0            0           0
1            LA Marsh        bowled  LA Marsh  MJG Nielsen       M Kapp            0            0           0
2                <NA>          <NA>  EA Perry  MJG Nielsen       M Kapp            0            0           0
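The loop above edits `data` in place and gives every wicket-less entry the same placeholder list object. If that is a concern, one option is to build a homogenized copy instead. This is only a sketch: the names `placeholder`, `homogenized` and `wicket_cols` are illustrative, and `None` is used for the missing values where the answer uses `pd.NA`. The final step moves the wicket columns to the end so the layout is closer to what the question asks for.

```python
import pandas as pd

data = [
    {'batter': 'LA Marsh', 'bowler': 'MJG Nielsen', 'non_striker': 'M Kapp',
     'runs': {'batter': 0, 'extras': 0, 'total': 0}},
    {'batter': 'LA Marsh', 'bowler': 'MJG Nielsen', 'non_striker': 'M Kapp',
     'runs': {'batter': 0, 'extras': 0, 'total': 0},
     'wickets': [{'player_out': 'LA Marsh', 'kind': 'bowled'}]},
    {'batter': 'EA Perry', 'bowler': 'MJG Nielsen', 'non_striker': 'M Kapp',
     'runs': {'batter': 0, 'extras': 0, 'total': 0}},
]

placeholder = {'player_out': None, 'kind': None}

# Shallow-copy each entry and fill in a fresh placeholder list where
# 'wickets' is missing or empty; the original dicts are left untouched
homogenized = [
    {**entry, 'wickets': entry.get('wickets') or [dict(placeholder)]}
    for entry in data
]

df = pd.json_normalize(
    homogenized,
    record_path='wickets',
    meta=['batter', 'bowler', 'non_striker',
          ['runs', 'batter'], ['runs', 'extras'], ['runs', 'total']],
    record_prefix='wickets.',
)

# Reorder so the wicket columns come last
wicket_cols = [c for c in df.columns if c.startswith('wickets.')]
df = df[[c for c in df.columns if c not in wicket_cols] + wicket_cols]
print(df)
```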