How to flatten a nested JSON into a pandas dataframe

I have a somewhat tricky JSON that I want to put into a dataframe.

{'A': {'name': 'A',
  'left_foot': [{'toes': '5'}],
  'right_foot': [{'toes': '4'}]},
 'B': {'name': 'B',
  'left_foot': [{'toes': '3'}],
  'right_foot': [{'toes': '5'}]},
...
}

I don't need the first layer with A and B, as it is already part of name. There will always be exactly one left_foot and one right_foot.

The data I want is as follows:

  name left_foot.toes right_foot.toes
0    A              5               4
1    B              3               5

Using this post I was able to get the feet and toes, but only when selecting a single key such as data["A"]. Is there an easier way?

EDIT: I have something like this, but I need to specify "A" in the first line.

# NOTE: `merge` is not defined in this snippet; it has to be imported or defined separately
df = pd.json_normalize(tickers["A"]).pipe(
    lambda x: x.drop(columns='left_foot').join(
        x.left_foot.apply(lambda y: pd.Series(merge(y)))
    )
).rename(columns={"toes": "left_foot.toes"}).pipe(
    lambda x: x.drop(columns='right_foot').join(
        x.right_foot.apply(lambda y: pd.Series(merge(y)))
    )
).rename(columns={"toes": "right_foot.toes"})
  • Given your data, each top-level key (e.g. 'A' and 'B') is repeated as a value in 'name', so it is easier to use pandas.json_normalize on only the values of the dict.
  • The 'left_foot' and 'right_foot' columns need to be exploded to remove each dict from its list.
  • The final step converts the columns of dicts to a dataframe and joins it back to df.
  • It's not necessarily less code, but this should be significantly faster than the multiple applies used in the current code.
    • See this timing analysis comparing apply(pandas.Series) to just using pandas.DataFrame to convert a column.
  • If there are issues because your dataframe has NaN (e.g. missing dicts or lists) in the columns to be exploded and converted to a dataframe, see How to json_normalize a column with NaNs.
import pandas as pd

# test data
data = {'A': {'name': 'A', 'left_foot': [{'toes': '5'}], 'right_foot': [{'toes': '4'}]}, 'B': {'name': 'B', 'left_foot': [{'toes': '3'}], 'right_foot': [{'toes': '5'}]}, 'C': {'name': 'C', 'left_foot': [{'toes': '5'}], 'right_foot': [{'toes': '4'}]}, 'D': {'name': 'D', 'left_foot': [{'toes': '3'}], 'right_foot': [{'toes': '5'}]}}

# normalize data.values and explode the dicts out of the lists
df = pd.json_normalize(data.values()).apply(pd.Series.explode).reset_index(drop=True)

# display(df)
  name      left_foot     right_foot
0    A  {'toes': '5'}  {'toes': '4'}
1    B  {'toes': '3'}  {'toes': '5'}
2    C  {'toes': '5'}  {'toes': '4'}
3    D  {'toes': '3'}  {'toes': '5'}

# extract the values from the dicts and create toe columns
df = df.join(pd.DataFrame(df.pop('left_foot').values.tolist())).rename(columns={'toes': 'lf_toes'})
df = df.join(pd.DataFrame(df.pop('right_foot').values.tolist())).rename(columns={'toes': 'rf_toes'})

# display(df)
  name lf_toes rf_toes
0    A       5       4
1    B       3       5
2    C       5       4
3    D       3       5
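
For the NaN case mentioned in the last bullet, one possible workaround is to replace every non-dict entry with an empty dict before converting the column. This is only a minimal sketch, not the approach from the linked post: the helper name expand_dict_column, its prefix parameter, and the test data with a missing 'right_foot' are all assumptions made for illustration.

```python
import pandas as pd

# hypothetical test data: 'B' has no 'right_foot' entry
data = {
    'A': {'name': 'A', 'left_foot': [{'toes': '5'}], 'right_foot': [{'toes': '4'}]},
    'B': {'name': 'B', 'left_foot': [{'toes': '3'}]},
}

# same first step as above; the missing entry shows up as NaN after explode
df = pd.json_normalize(list(data.values())).apply(pd.Series.explode).reset_index(drop=True)


def expand_dict_column(df, column, prefix):
    """Pop a column of dicts (or NaN) and join its keys back as prefixed columns.

    Hypothetical helper; note that df.pop removes the column from the df passed in.
    """
    # anything that is not a dict (e.g. NaN) becomes an empty dict
    dicts = [d if isinstance(d, dict) else {} for d in df.pop(column)]
    expanded = pd.DataFrame(dicts, index=df.index).add_prefix(prefix)
    return df.join(expanded)


df = expand_dict_column(df, 'left_foot', 'lf_')
df = expand_dict_column(df, 'right_foot', 'rf_')
print(df)
```

Rows without a dict end up with NaN in the new columns, so no row is dropped.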

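If the dotted column names from the question (left_foot.toes, right_foot.toes) are wanted, an alternative is to unwrap the single-element lists first and let pandas.json_normalize flatten the nested dicts itself. This is a sketch that relies on the question's statement that there is always exactly one left_foot and one right_foot; the helper unwrap_single_lists is a hypothetical name, not part of pandas.

```python
import pandas as pd

# test data in the same shape as the question
data = {
    'A': {'name': 'A', 'left_foot': [{'toes': '5'}], 'right_foot': [{'toes': '4'}]},
    'B': {'name': 'B', 'left_foot': [{'toes': '3'}], 'right_foot': [{'toes': '5'}]},
}


def unwrap_single_lists(record):
    """Replace every one-element list in a record with its only element."""
    return {
        key: value[0] if isinstance(value, list) and len(value) == 1 else value
        for key, value in record.items()
    }


# once the lists are gone, json_normalize flattens the nested dicts
# and names the columns '<parent>.<child>' with its default separator
records = [unwrap_single_lists(rec) for rec in data.values()]
df = pd.json_normalize(records)
print(df)
```

Lists with more than one element are left untouched by this helper, so they would still need the explode step shown above.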