Fully flatten JSON with nested lists using Python Pandas

Question

Here is an example of the JSON:

{
    "ApartmentBuilding":{
        "Address":{
            "HouseNumber": 5,
            "Street": "DataStreet",
            "ZipCode": 5100
        },
        "Apartments":[
            {
                "Number": 1,
                "Price": 500,
                "Residents": [
                    {
                        "Name": "Bob",
                        "Age": 43
                    },
                    {
                        "Name": "Alice",
                        "Age": 42
                    }
                ]
            },
            {
                "Number": 2,
                "Price": 750,
                "Residents": [
                    {
                        "Name": "Jane",
                        "Age": 43
                    },
                    {
                        "Name": "William",
                        "Age": 42
                    }
                ]
            },
            {
                "Number": 3,
                "Price": 1000,
                "Residents": []
            }
        ]
    }
}

I used the following function, taken from Python Pandas - Flatten Nested JSON:

import json
import pandas as pd

def flatten_json(nested_json: dict, exclude: list=['']) -> dict:
    """
    Flatten a list of nested dicts.
    """
    out = dict()
    def flatten(x: (list, dict, str), name: str='', exclude=exclude):
        if type(x) is dict:
            for a in x:
                if a not in exclude:
                    flatten(x[a], f'{name}{a}_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, f'{name}{i}_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out

with open("apartments.json") as f:
    data = json.load(f)

print(data)

df = pd.DataFrame([flatten_json(x) for x in data['ApartmentBuilding']])

print(df)

My goal is to fully flatten the JSON so I can convert it into a Pandas DataFrame, but I get a strange output, shown below:

0     Address
1  Apartments

What I'm basically after is a flattening like this:

Answer 1

Here is another way to do it, using Pandas json_normalize and explode:

import json

import pandas as pd

with open("file.json") as f:
    data = json.load(f)

df = pd.json_normalize(data["ApartmentBuilding"])
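At this point `df` is a single row: `json_normalize` flattens the nested `Address` dict into dotted column names but leaves the `Apartments` list untouched as one cell, which is why the following steps rely on `explode`. A minimal check, with the sample JSON inlined as a dict instead of being read from `file.json`:

```python
import pandas as pd

# Sample data from the question, inlined instead of read from file.json
data = {
    "ApartmentBuilding": {
        "Address": {"HouseNumber": 5, "Street": "DataStreet", "ZipCode": 5100},
        "Apartments": [
            {"Number": 1, "Price": 500,
             "Residents": [{"Name": "Bob", "Age": 43}, {"Name": "Alice", "Age": 42}]},
            {"Number": 2, "Price": 750,
             "Residents": [{"Name": "Jane", "Age": 43}, {"Name": "William", "Age": 42}]},
            {"Number": 3, "Price": 1000, "Residents": []},
        ],
    }
}

df = pd.json_normalize(data["ApartmentBuilding"])

# One row: nested dict keys are joined with "." and the list stays in a single cell
print(df.shape)
print(sorted(df.columns))
print(len(df.loc[0, "Apartments"]))
```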

First, get and flatten the different parts of the dataframe:

building = df.explode("Apartments").reset_index(drop=True)

apartments = pd.DataFrame(building["Apartments"].to_dict()).T.explode("Residents")

residents = pd.DataFrame(
    apartments["Residents"].dropna().reset_index(drop=True).to_list()
)
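The intermediate frames are worth inspecting. `explode` repeats the original index label for every list element and turns an empty list into `NaN`, so `apartments` carries the index `0, 0, 1, 1, 2`, with apartment 3 appearing once and no resident. `dropna().reset_index(drop=True)` then gives `residents` a fresh positional index. A sketch that reproduces the answer's steps on the inlined sample data:

```python
import pandas as pd

data = {
    "ApartmentBuilding": {
        "Address": {"HouseNumber": 5, "Street": "DataStreet", "ZipCode": 5100},
        "Apartments": [
            {"Number": 1, "Price": 500,
             "Residents": [{"Name": "Bob", "Age": 43}, {"Name": "Alice", "Age": 42}]},
            {"Number": 2, "Price": 750,
             "Residents": [{"Name": "Jane", "Age": 43}, {"Name": "William", "Age": 42}]},
            {"Number": 3, "Price": 1000, "Residents": []},
        ],
    }
}

# Same steps as in the answer
df = pd.json_normalize(data["ApartmentBuilding"])
building = df.explode("Apartments").reset_index(drop=True)
apartments = pd.DataFrame(building["Apartments"].to_dict()).T.explode("Residents")
residents = pd.DataFrame(
    apartments["Residents"].dropna().reset_index(drop=True).to_list()
)

# Row counts: one per apartment, one per (apartment, resident) pair, one per resident
print(len(building), len(apartments), len(residents))
# Duplicated index labels left behind by explode
print(apartments.index.tolist())
# The empty Residents list of apartment 3 became NaN
print(apartments["Residents"].isna().tolist())
```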

Then, build the dataframe by merging and concatenating the flattened parts:

new_df = pd.merge(
    left=building.loc[:, building.columns != "Apartments"],
    right=apartments.loc[:, apartments.columns != "Residents"],
    right_index=True,
    left_index=True,
).reset_index(drop=True)

new_df = pd.concat([new_df, residents], axis=1)
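One caveat with this `concat`: `residents` was re-indexed positionally, so `axis=1` pairs it with `new_df` row by row. That gives the right result here only because the apartment with no residents is the last one; if it came first, every name would shift up by one row. A hedged alternative sketch (the helper name `flatten_building` is hypothetical, not part of the answer; `explode(..., ignore_index=True)` needs pandas 1.1 or later) keeps each resident on its own apartment's row by expanding the resident dicts on the exploded frame's index, using an empty dict where `explode` produced `NaN`:

```python
import pandas as pd


def flatten_building(building: dict) -> pd.DataFrame:
    """Hypothetical helper: one row per resident, plus one row per empty apartment."""
    # One row per apartment; "Residents" still holds a list in each cell
    apts = pd.DataFrame(building["Apartments"])
    # One row per resident; an empty list becomes a single NaN row
    apts = apts.explode("Residents", ignore_index=True)
    # Expand each resident dict on the same index ({} gives an all-NaN row)
    res = pd.DataFrame(
        [r if isinstance(r, dict) else {} for r in apts["Residents"]],
        index=apts.index,
    )
    # Repeat the address on every row so all three parts share one RangeIndex
    address = pd.DataFrame([building["Address"]] * len(apts))
    return pd.concat([address, apts.drop(columns="Residents"), res], axis=1)


# Same data as the question, but with the empty apartment moved to the front
address = {"HouseNumber": 5, "Street": "DataStreet", "ZipCode": 5100}
apartments = [
    {"Number": 3, "Price": 1000, "Residents": []},
    {"Number": 1, "Price": 500,
     "Residents": [{"Name": "Bob", "Age": 43}, {"Name": "Alice", "Age": 42}]},
    {"Number": 2, "Price": 750,
     "Residents": [{"Name": "Jane", "Age": 43}, {"Name": "William", "Age": 42}]},
]
flat = flatten_building({"Address": address, "Apartments": apartments})
print(flat)
```

Because the residents are never separated from the exploded apartment rows, the order of the apartments in the input no longer matters.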

Do a little cleanup:

new_df.columns = [col.replace("Address.", "") for col in new_df.columns]
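`str.replace` removes `"Address."` wherever it occurs in a column name, not only at the start. If only the prefix should be dropped, `str.removeprefix` (Python 3.9+) passed to `DataFrame.rename` is a slightly stricter option; the small frame below is made up purely for illustration:

```python
import pandas as pd

# Illustrative frame with dotted column names like those json_normalize produces
new_df = pd.DataFrame(
    [[5, "DataStreet", 5100, 1]],
    columns=["Address.HouseNumber", "Address.Street", "Address.ZipCode", "Number"],
)

# Strip "Address." only where it actually starts the column name
new_df = new_df.rename(columns=lambda col: col.removeprefix("Address."))
print(list(new_df.columns))
```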

Finally:

print(new_df)
# Output
   HouseNumber      Street  ZipCode Number Price     Name   Age
0            5  DataStreet     5100      1   500      Bob  43.0
1            5  DataStreet     5100      1   500    Alice  42.0
2            5  DataStreet     5100      2   750     Jane  43.0
3            5  DataStreet     5100      2   750  William  42.0
4            5  DataStreet     5100      3  1000      NaN   NaN

Solution 1
1 Accepted 2022-07-16 18:00:55