
Completely Flatten JSON with nested list using Python Pandas

Here is a sample of the JSON:

{
    "ApartmentBuilding":{
        "Address":{
            "HouseNumber": 5,
            "Street": "DataStreet",
            "ZipCode": 5100
        },
        "Apartments":[
            {
                "Number": 1,
                "Price": 500,
                "Residents": [
                    {
                        "Name": "Bob",
                        "Age": 43
                    },
                    {
                        "Name": "Alice",
                        "Age": 42
                    }
                ]
            },
            {
                "Number": 2,
                "Price": 750,
                "Residents": [
                    {
                        "Name": "Jane",
                        "Age": 43
                    },
                    {
                        "Name": "William",
                        "Age": 42
                    }
                ]
            },
            {
                "Number": 3,
                "Price": 1000,
                "Residents": []
            }
        ]
    }
}

I used the following function, taken from: Python Pandas - Flatten Nested JSON

import json
import pandas as pd

def flatten_json(nested_json: dict, exclude: list=['']) -> dict:
    """
    Flatten a list of nested dicts.
    """
    out = dict()
    def flatten(x: (list, dict, str), name: str='', exclude=exclude):
        if type(x) is dict:
            for a in x:
                if a not in exclude:
                    flatten(x[a], f'{name}{a}_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, f'{name}{i}_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out

with open("apartments.json") as f:
    data = json.load(f)

print(data)

df = pd.DataFrame([flatten_json(x) for x in data['ApartmentBuilding']])

print(df)

My goal is to completely flatten the JSON so it can be converted to a pandas DataFrame, but I get a strange output, shown below:

0     Address
1  Apartments

What I am basically after is a flattening like this:

[image]
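
For context, the strange output seems to come from the list comprehension rather than from flatten_json itself: iterating over a dict yields its keys, so the function receives the strings "Address" and "Apartments". The sketch below uses a condensed rewrite of the function (dropping the exclude parameter is a simplification made here) and a trimmed, inline copy of the sample data to show this, and what passing the whole dict produces instead.

```python
import pandas as pd


def flatten_json(nested):
    """Flatten nested dicts/lists into one dict with '_'-joined keys."""
    out = {}

    def walk(node, prefix=""):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{prefix}{key}_")
        elif isinstance(node, list):
            for i, value in enumerate(node):
                walk(value, f"{prefix}{i}_")
        else:
            out[prefix[:-1]] = node  # drop the trailing underscore

    walk(nested)
    return out


# Trimmed copy of the sample data (assumption: inlined instead of read from a file).
data = {
    "ApartmentBuilding": {
        "Address": {"HouseNumber": 5, "Street": "DataStreet", "ZipCode": 5100},
        "Apartments": [
            {"Number": 1, "Price": 500, "Residents": [{"Name": "Bob", "Age": 43}]},
            {"Number": 3, "Price": 1000, "Residents": []},
        ],
    }
}
building = data["ApartmentBuilding"]

# Iterating over a dict yields its keys, not its values.
keys = list(building)

# A bare string falls through to the scalar branch and is stored under the key ''.
from_key = flatten_json("Address")

# Passing the whole dict flattens everything into a single wide row.
wide = pd.DataFrame([flatten_json(building)])
print(wide.columns.tolist())
```

Passing the whole dict gives one wide row with columns such as Apartments_0_Residents_0_Name. That is flat, but it is not the one-row-per-resident layout asked for.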

Here is another way to do it, using pandas json_normalize and explode:

import json

import pandas as pd

with open("file.json") as f:
    data = json.load(f)

df = pd.json_normalize(data["ApartmentBuilding"])
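
To see what this first step returns, the sketch below runs json_normalize on a trimmed, inline copy of the sample (the inline dict stands in for the file, an assumption made so the snippet is self-contained). Nested dicts are expanded into dotted column names, while lists are left untouched in a single cell.

```python
import pandas as pd

# Trimmed stand-in for data["ApartmentBuilding"] (assumption: inlined, not read from disk).
building = {
    "Address": {"HouseNumber": 5, "Street": "DataStreet", "ZipCode": 5100},
    "Apartments": [
        {"Number": 1, "Price": 500, "Residents": [{"Name": "Bob", "Age": 43}]},
        {"Number": 3, "Price": 1000, "Residents": []},
    ],
}

df = pd.json_normalize(building)

print(df.shape)                       # one row, four columns
print(sorted(df.columns))             # 'Address.*' columns plus 'Apartments'
print(type(df.loc[0, "Apartments"]))  # the list is still a plain Python list
```

This is why the following steps are needed: the "Apartments" cell still holds the whole list and has to be exploded separately.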

First, extract and flatten the different parts of the dataframe:

building = df.explode("Apartments").reset_index(drop=True)

apartments = pd.DataFrame(building["Apartments"].to_dict()).T.explode("Residents")

residents = pd.DataFrame(
    apartments["Residents"].dropna().reset_index(drop=True).to_list()
)
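
The dropna() call matters because explode turns an empty list into a missing value instead of dropping the row. A minimal sketch on a made-up two-row frame (not the real data) shows the behaviour:

```python
import pandas as pd

# Hypothetical miniature of the "apartments" frame: one full list, one empty list.
apartments = pd.DataFrame(
    {
        "Number": [1, 3],
        "Residents": [
            [{"Name": "Bob", "Age": 43}, {"Name": "Alice", "Age": 42}],
            [],
        ],
    }
)

# Each list element gets its own row; the empty list becomes a single NaN row.
exploded = apartments.explode("Residents")
print(exploded)

# Dropping the NaN leaves only real resident dicts, which expand into columns.
residents = pd.DataFrame(exploded["Residents"].dropna().to_list())
print(residents)
```

Note that explode repeats the original index label for every element, which is what the index-based merge in the next step relies on.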

Then, build the dataframe by merging and concatenating the flattened parts:

new_df = pd.merge(
    left=building.loc[:, building.columns != "Apartments"],
    right=apartments.loc[:, apartments.columns != "Residents"],
    right_index=True,
    left_index=True,
).reset_index(drop=True)

new_df = pd.concat([new_df, residents], axis=1)

Do a little cleanup:

new_df.columns = [col.replace("Address.", "") for col in new_df.columns]

Finally:

print(new_df)
# Output
   HouseNumber      Street  ZipCode Number Price     Name   Age
0            5  DataStreet     5100      1   500      Bob  43.0
1            5  DataStreet     5100      1   500    Alice  42.0
2            5  DataStreet     5100      2   750     Jane  43.0
3            5  DataStreet     5100      2   750  William  42.0
4            5  DataStreet     5100      3  1000      NaN   NaN
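
One caveat, as far as I can tell: the final concat lines the residents up by position, which works here because the apartment without residents comes last. If that cannot be guaranteed, a plain-Python loop that builds one row per resident avoids the alignment question altogether. The following is only a sketch under that assumption, with the sample data inlined and the variable names (rows, base, flat) chosen for illustration:

```python
import pandas as pd

data = {
    "ApartmentBuilding": {
        "Address": {"HouseNumber": 5, "Street": "DataStreet", "ZipCode": 5100},
        "Apartments": [
            {"Number": 1, "Price": 500, "Residents": [
                {"Name": "Bob", "Age": 43}, {"Name": "Alice", "Age": 42}]},
            {"Number": 2, "Price": 750, "Residents": [
                {"Name": "Jane", "Age": 43}, {"Name": "William", "Age": 42}]},
            {"Number": 3, "Price": 1000, "Residents": []},
        ],
    }
}

building = data["ApartmentBuilding"]
rows = []
for apartment in building["Apartments"]:
    # Fields shared by every resident of this apartment.
    base = {
        **building["Address"],
        "Number": apartment["Number"],
        "Price": apartment["Price"],
    }
    # An apartment without residents still gets one row, with no Name/Age.
    for resident in apartment["Residents"] or [{}]:
        rows.append({**base, **resident})

flat = pd.DataFrame(rows)
print(flat)
```

This produces the same seven columns and five rows as above, with Name and Age missing for apartment 3. It is specific to this particular nesting, so it trades generality for being easy to follow.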

