How to flatten nested JSON into a pandas dataframe

Question

I have a somewhat tricky JSON that I want to put into a dataframe.

{'A': {'name': 'A',
  'left_foot': [{'toes': '5'}],
  'right_foot': [{'toes': '4'}]},
 'B': {'name': 'B',
  'left_foot': [{'toes': '3'}],
  'right_foot': [{'toes': '5'}]},
...
}

I don't need the first layer with A and B, as it is already part of name. There will always be exactly one left_foot and one right_foot.

The data I want is as follows:

  name left_foot.toes right_foot.toes
0    A              5               4
1    B              3               5

Using this post, I was able to get the feet and toes, but only by selecting a single key, e.g. data["A"]. Is there an easier way?

EDIT: I have something like this, but I need to specify "A" in the first line.

# note: merge is not defined in this snippet
df = pd.json_normalize(tickers["A"]).pipe(
    lambda x: x.drop('left_foot', axis=1).join(
        x.left_foot.apply(lambda y: pd.Series(merge(y)))
    )
).rename(columns={"toes": "left_foot.toes"}).pipe(
    lambda x: x.drop('right_foot', axis=1).join(
        x.right_foot.apply(lambda y: pd.Series(merge(y)))
    )).rename(columns={"toes": "right_foot.toes"})

Answer 1

Given your data, each top-level key (e.g. 'A' and 'B') is repeated as a value in 'name', so it is easier to use pandas.json_normalize on only the values of the dict.
The 'left_foot' and 'right_foot' columns need to be exploded to remove each dict from its list.
The final step converts the columns of dicts to a dataframe and joins it back to df.
It's not necessarily less code, but this should be significantly faster than the multiple applies used in the current code.
- See this timing analysis comparing apply(pandas.Series) to just using pandas.DataFrame to convert a column.
If there are issues because your dataframe has NaN (e.g. missing dicts or lists) in the columns to be exploded and converted to a dataframe, see How to json_normalize a column with NaNs.
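
The speed claim above can be checked roughly with a sketch like the following. It is only an illustration: the column `col`, its size, and the helper names `with_apply` and `with_constructor` are made up for this example, and actual timings depend on the machine and pandas version.

```python
import timeit

import pandas as pd

# hypothetical test column: dicts shaped like the exploded 'left_foot' values
col = pd.Series([{'toes': str(i % 6)} for i in range(2000)])


def with_apply(s):
    # builds one pd.Series per row, then assembles them into a dataframe
    return s.apply(pd.Series)


def with_constructor(s):
    # hands the whole list of dicts to the DataFrame constructor at once
    return pd.DataFrame(s.tolist(), index=s.index)


t_apply = timeit.timeit(lambda: with_apply(col), number=3)
t_ctor = timeit.timeit(lambda: with_constructor(col), number=3)
print(f'apply(pd.Series): {t_apply:.3f}s   pd.DataFrame: {t_ctor:.3f}s')
```

Both helpers return the same single 'toes' column, so only the elapsed time differs.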

import pandas as pd

# test data
data = {'A': {'name': 'A', 'left_foot': [{'toes': '5'}], 'right_foot': [{'toes': '4'}]}, 'B': {'name': 'B', 'left_foot': [{'toes': '3'}], 'right_foot': [{'toes': '5'}]}, 'C': {'name': 'C', 'left_foot': [{'toes': '5'}], 'right_foot': [{'toes': '4'}]}, 'D': {'name': 'D', 'left_foot': [{'toes': '3'}], 'right_foot': [{'toes': '5'}]}}

# normalize data.values and explode the dicts out of the lists
df = pd.json_normalize(data.values()).apply(pd.Series.explode).reset_index(drop=True)

# display(df)
  name      left_foot     right_foot
0    A  {'toes': '5'}  {'toes': '4'}
1    B  {'toes': '3'}  {'toes': '5'}
2    C  {'toes': '5'}  {'toes': '4'}
3    D  {'toes': '3'}  {'toes': '5'}

# extract the values from the dicts and create toe columns
df = df.join(pd.DataFrame(df.pop('left_foot').values.tolist())).rename(columns={'toes': 'lf_toes'})
df = df.join(pd.DataFrame(df.pop('right_foot').values.tolist())).rename(columns={'toes': 'rf_toes'})

# display(df)
  name lf_toes rf_toes
0    A       5       4
1    B       3       5
2    C       5       4
3    D       3       5
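
If the exact column names from the question (left_foot.toes, right_foot.toes) are wanted, the same idea can be wrapped in a small helper. This is a sketch, not part of the original answer: `flatten_feet` and `list_cols` are hypothetical names, it assumes each list holds at most one dict (as stated in the question), and it treats a missing or empty list as an empty dict, which yields NaN in that row.

```python
import pandas as pd

# same shape as the test data above, shortened to two records
data = {
    'A': {'name': 'A', 'left_foot': [{'toes': '5'}], 'right_foot': [{'toes': '4'}]},
    'B': {'name': 'B', 'left_foot': [{'toes': '3'}], 'right_foot': [{'toes': '5'}]},
}


def flatten_feet(records, list_cols=('left_foot', 'right_foot')):
    """Expand one-element list-of-dict columns into '<column>.<key>' columns."""
    df = pd.DataFrame(list(records))
    for col in list_cols:
        # take the single dict out of each list; anything else becomes {}
        dicts = [cell[0] if isinstance(cell, list) and cell else {}
                 for cell in df.pop(col)]
        # prefix the dict keys with the source column name, e.g. 'left_foot.toes'
        expanded = pd.DataFrame(dicts, index=df.index).add_prefix(f'{col}.')
        df = df.join(expanded)
    return df


flat = flatten_feet(data.values())
print(flat)
```

Passing data.values() drops the top-level 'A' and 'B' keys, as in the answer above. The toe counts stay strings because they are strings in the source data.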

Solution 1
2 Accepted 2021-01-23 03:27:43
