How to flatten a nested JSON into a pandas dataframe
I have a somewhat tricky JSON that I want to put into a dataframe.
{'A': {'name': 'A',
'left_foot': [{'toes': '5'}],
'right_foot': [{'toes': '4'}]},
'B': {'name': 'B',
'left_foot': [{'toes': '3'}],
'right_foot': [{'toes': '5'}]},
...
}
I don't need the first layer with A and B, as it is already part of name. There will always be only one left_foot and one right_foot.
The data I want is as follows:
name left_foot.toes right_foot.toes
0 A 5 4
1 B 3 5
Using this post I was able to get the feet and toes, but only when indexing a single key, such as data["A"]. Is there an easier way?
EDIT: I have something like this, but I need to specify "A" in the first line.
# note: `merge` is not defined in this snippet; it presumably comes from the linked post
df = pd.json_normalize(tickers["A"]).pipe(
    lambda x: x.drop(columns='left_foot').join(
        x.left_foot.apply(lambda y: pd.Series(merge(y)))
    )
).rename(columns={"toes": "left_foot.toes"}).pipe(
    lambda x: x.drop(columns='right_foot').join(
        x.right_foot.apply(lambda y: pd.Series(merge(y)))
    )
).rename(columns={"toes": "right_foot.toes"})
The key (e.g. 'A' and 'B') is repeated as a value in 'name', so it is easier to use pandas.json_normalize on only the values of the dict.
The 'left_foot' and 'right_foot' columns need to be exploded to remove each dict from its list.
The columns of dicts are then converted to a dataframe and joined back to df.
Compare apply(pandas.Series) to just using pandas.DataFrame to convert a column; the code below uses pandas.DataFrame.
If there are issues because of NaN (e.g. missing dicts or lists) in the columns to be exploded and converted to a dataframe, see How to json_normalize a column with NaNs.
import pandas as pd
# test data
data = {'A': {'name': 'A', 'left_foot': [{'toes': '5'}], 'right_foot': [{'toes': '4'}]}, 'B': {'name': 'B', 'left_foot': [{'toes': '3'}], 'right_foot': [{'toes': '5'}]}, 'C': {'name': 'C', 'left_foot': [{'toes': '5'}], 'right_foot': [{'toes': '4'}]}, 'D': {'name': 'D', 'left_foot': [{'toes': '3'}], 'right_foot': [{'toes': '5'}]}}
# normalize data.values and explode the dicts out of the lists
df = pd.json_normalize(data.values()).apply(pd.Series.explode).reset_index(drop=True)
# display(df)
name left_foot right_foot
0 A {'toes': '5'} {'toes': '4'}
1 B {'toes': '3'} {'toes': '5'}
2 C {'toes': '5'} {'toes': '4'}
3 D {'toes': '3'} {'toes': '5'}
# extract the values from the dicts and create toe columns
df = df.join(pd.DataFrame(df.pop('left_foot').values.tolist())).rename(columns={'toes': 'lf_toes'})
df = df.join(pd.DataFrame(df.pop('right_foot').values.tolist())).rename(columns={'toes': 'rf_toes'})
# display(df)
name lf_toes rf_toes
0 A 5 4
1 B 3 5
2 C 5 4
3 D 3 5
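The question asked for columns named left_foot.toes and right_foot.toes, whereas the code above uses lf_toes and rf_toes. As a rough sketch of one way to get the dotted names, the loop below takes the single dict out of each list and prefixes its keys with the column name. The loop, the `foot_columns` list, and the use of `add_prefix` are illustrative choices and not part of the original answer. The sketch assumes every cell holds exactly one dict in its list, as stated in the question, and it does not handle NaN.

```python
import pandas as pd

# same shape as the sample data in the question
data = {
    'A': {'name': 'A', 'left_foot': [{'toes': '5'}], 'right_foot': [{'toes': '4'}]},
    'B': {'name': 'B', 'left_foot': [{'toes': '3'}], 'right_foot': [{'toes': '5'}]},
}

# normalize only the values, because the outer keys repeat what is in 'name'
df = pd.json_normalize(list(data.values()))

# hypothetical helper list: the columns whose cells are one-element lists of dicts
foot_columns = ['left_foot', 'right_foot']

for col in foot_columns:
    # take the single dict out of each list (assumes exactly one per cell)
    records = [cell[0] for cell in df.pop(col)]
    # build a frame from the dicts and prefix its columns, e.g. 'left_foot.toes'
    df = df.join(pd.DataFrame(records).add_prefix(f'{col}.'))

print(df)
```

The toe counts stay as strings, as in the source data; something like `df['left_foot.toes'].astype(int)` would convert them if numbers are needed.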