Completely Flatten JSON with nested list using Python Pandas
Here is the JSON sample:
{
    "ApartmentBuilding": {
        "Address": {
            "HouseNumber": 5,
            "Street": "DataStreet",
            "ZipCode": 5100
        },
        "Apartments": [
            {
                "Number": 1,
                "Price": 500,
                "Residents": [
                    {
                        "Name": "Bob",
                        "Age": 43
                    },
                    {
                        "Name": "Alice",
                        "Age": 42
                    }
                ]
            },
            {
                "Number": 2,
                "Price": 750,
                "Residents": [
                    {
                        "Name": "Jane",
                        "Age": 43
                    },
                    {
                        "Name": "William",
                        "Age": 42
                    }
                ]
            },
            {
                "Number": 3,
                "Price": 1000,
                "Residents": []
            }
        ]
    }
}
I used the following function, taken from: Python Pandas - Flatten Nested JSON
import json
import pandas as pd


def flatten_json(nested_json: dict, exclude: list=['']) -> dict:
    """
    Flatten a list of nested dicts.
    """
    out = dict()

    def flatten(x: (list, dict, str), name: str='', exclude=exclude):
        if type(x) is dict:
            for a in x:
                if a not in exclude:
                    flatten(x[a], f'{name}{a}_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, f'{name}{i}_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out


with open("apartments.json") as f:
    data = json.load(f)
print(data)

df = pd.DataFrame([flatten_json(x) for x in data['ApartmentBuilding']])
print(df)
My goal is to completely flatten the JSON so that it can be converted into a pandas DataFrame, but I get a strange output, shown below:
0 Address
1 Apartments
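The two-row output can be reproduced without the file. The sketch below assumes a trimmed-down stand-in for `data['ApartmentBuilding']` (the variable names `building`, `keys` and `rows` are illustrative, not from the original post). Iterating over a dict yields its keys, so `flatten_json` receives the strings `'Address'` and `'Apartments'` rather than the nested values, and each string ends up stored under the empty column name.

```python
import pandas as pd

# Trimmed-down stand-in for data["ApartmentBuilding"] (illustrative values only).
building = {
    "Address": {"HouseNumber": 5},
    "Apartments": [{"Number": 1, "Residents": []}],
}

# Iterating over a dict yields its keys, not its values.
keys = [x for x in building]
print(keys)  # ['Address', 'Apartments']

# flatten_json stores a bare string under name[:-1], which is '' at the top
# level, so each key becomes a one-entry dict with an empty column name.
rows = [{"": key} for key in keys]
df = pd.DataFrame(rows)
print(df.shape)  # (2, 1)
```

Passing the whole dict instead, as in `pd.DataFrame([flatten_json(data['ApartmentBuilding'])])`, would avoid this, but it would presumably give a single wide row with keys such as `Apartments_0_Residents_0_Name` rather than one row per resident.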
What I am basically after is a flattened result like this:
Here is another approach, using pandas json_normalize and explode:
import json
import pandas as pd

with open("file.json") as f:
    data = json.load(f)

df = pd.json_normalize(data["ApartmentBuilding"])
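To see what this step produces, here is a small sketch on inline data (the `sample` dict is a shortened, assumed copy of the JSON above, so no file is needed). `json_normalize` flattens nested dicts into dotted column names but leaves lists untouched, which is why `Apartments` still needs further work.

```python
import pandas as pd

# Shortened, inline copy of data["ApartmentBuilding"] (assumed for illustration).
sample = {
    "Address": {"HouseNumber": 5, "Street": "DataStreet", "ZipCode": 5100},
    "Apartments": [
        {"Number": 1, "Price": 500, "Residents": [{"Name": "Bob", "Age": 43}]},
        {"Number": 3, "Price": 1000, "Residents": []},
    ],
}

df = pd.json_normalize(sample)

# Nested dicts become dotted columns; the list stays as a single cell.
print(sorted(df.columns))
print(df.shape)  # (1, 4)
print(type(df.loc[0, "Apartments"]))  # <class 'list'>
```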
First, extract and flatten the different parts of the dataframe:
building = df.explode("Apartments").reset_index(drop=True)
apartments = pd.DataFrame(building["Apartments"].to_dict()).T.explode("Residents")
residents = pd.DataFrame(
    apartments["Residents"].dropna().reset_index(drop=True).to_list()
)
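The behaviour of `explode` that these lines rely on can be checked in isolation. In this minimal sketch (the two-row frame is made up for illustration), each list element gets its own row, the original index label is repeated, and an empty list turns into a single NaN row, which is what `dropna()` later removes.

```python
import pandas as pd

# Made-up frame: one apartment with two residents, one with none.
apartments = pd.DataFrame(
    {
        "Number": [1, 3],
        "Residents": [[{"Name": "Bob"}, {"Name": "Alice"}], []],
    }
)

exploded = apartments.explode("Residents")

# Index labels repeat for rows that came from the same list.
print(exploded.index.tolist())  # [0, 0, 1]
# The empty list is kept as one row holding NaN.
print(exploded["Residents"].isna().tolist())  # [False, False, True]
```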
Then, build the dataframe by merging and concatenating the flattened parts:
new_df = pd.merge(
    left=building.loc[:, building.columns != "Apartments"],
    right=apartments.loc[:, apartments.columns != "Residents"],
    right_index=True,
    left_index=True,
).reset_index(drop=True)
new_df = pd.concat([new_df, residents], axis=1)
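The merge works because the exploded frame keeps repeated index labels. A minimal sketch of that mechanism, using two made-up frames named `left` and `right`: joining on the index repeats each left row once per matching right row.

```python
import pandas as pd

# Made-up frames: two building rows, three resident-level rows sharing labels 0, 0, 1.
left = pd.DataFrame({"Street": ["DataStreet", "DataStreet"]}, index=[0, 1])
right = pd.DataFrame({"Number": [1, 1, 2]}, index=[0, 0, 1])

# Index-on-index merge: the left row with label 0 is matched twice.
merged = pd.merge(left, right, left_index=True, right_index=True)
print(len(merged))  # 3
```

Note that the subsequent `pd.concat(..., axis=1)` aligns `residents` purely by position after `reset_index`, so it appears to line up here only because the apartment without residents comes last in the sample data.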
Do a little cleanup:
new_df.columns = [col.replace("Address.", "") for col in new_df.columns]
Finally:
print(new_df)
# Output
   HouseNumber      Street  ZipCode  Number  Price     Name   Age
0            5  DataStreet     5100       1    500      Bob  43.0
1            5  DataStreet     5100       1    500    Alice  42.0
2            5  DataStreet     5100       2    750     Jane  43.0
3            5  DataStreet     5100       2    750  William  42.0
4            5  DataStreet     5100       3   1000      NaN   NaN
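As a cross-check, the same table can be built with a plain Python loop. This is a different technique from the explode/merge approach above, not a restatement of it; it is only a sketch, with the JSON inlined as `data` and the names `rows`, `base` and `flat` chosen for illustration. It does not depend on where the apartment without residents sits in the list.

```python
import pandas as pd

# Inline copy of the sample JSON so the sketch runs without a file.
data = {
    "ApartmentBuilding": {
        "Address": {"HouseNumber": 5, "Street": "DataStreet", "ZipCode": 5100},
        "Apartments": [
            {"Number": 1, "Price": 500,
             "Residents": [{"Name": "Bob", "Age": 43}, {"Name": "Alice", "Age": 42}]},
            {"Number": 2, "Price": 750,
             "Residents": [{"Name": "Jane", "Age": 43}, {"Name": "William", "Age": 42}]},
            {"Number": 3, "Price": 1000, "Residents": []},
        ],
    }
}

building = data["ApartmentBuilding"]
rows = []
for apartment in building["Apartments"]:
    # Building-level and apartment-level fields shared by every resident row.
    base = {
        **building["Address"],
        "Number": apartment["Number"],
        "Price": apartment["Price"],
    }
    # An apartment without residents still yields one row (Name/Age become NaN).
    for resident in apartment["Residents"] or [{}]:
        rows.append({**base, **resident})

flat = pd.DataFrame(rows)
print(flat.shape)  # (5, 7)
```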