How to flatten a list of dictionaries with non-constant attributes
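Until recently I used the relatively simple code below to fetch location data from an API, flatten the response, and convert it into a flat dataframe/table. It worked well because the key "ExtendedAttributes" returned a constant set of values in a constant order. The API schema changed overnight (without warning) and now returns different values depending on each location's attributes.
The simpler solution of flattening the nested list of dictionaries into a table and renaming the columns no longer works. I'm not sure how to handle a case like this or which approach to try first.
Script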
import pandas as pd
from flatten_json import flatten  # assumption: the original does not show where flatten() comes from

# Fetch the initial location description list
locationDescriptions = timeseries.publish.get('/GetLocationDescriptionList')['LocationDescriptions']
# Loop through the provisioning API to get full location info for every UniqueId
locations = timeseries.provisioning.send_batch_requests('/locations/{Id}', [{'LocationUniqueId': loc['UniqueId']} for loc in locationDescriptions])
# The API response in 'locations' is nested JSON, so it needs to be unpacked/flattened
dic_flattened = [flatten(d) for d in locations]
df_flat = pd.DataFrame(dic_flattened)
# Give the value columns matching names
df_flat.rename(columns={'ExtendedAttributeValues_0_Value': 'COUNTY' ...}, inplace=True)
First location
"ExtendedAttributeValues": [
{
"ColumnIdentifier": "COUNTY@LOCATION_EXTENSION",
"Value": "Okaloosa - FL",
"UniqueId": "538e05b45a9a4b31a46cf96c4ffab8cb"
},
{
"ColumnIdentifier": "GW_REGION@LOCATION_EXTENSION",
"Value": "Western Panhandle Embayment Region",
"UniqueId": "5f51ebde984c4bdd92dff067cbe5b39b"
},
{
"ColumnIdentifier": "LAND_NET@LOCATION_EXTENSION",
"Value": "S016T3NR22W",
"UniqueId": "8d8139c9027a497f9cae4ef7471930ba"
}
Second location (attributes no longer match)
"ExtendedAttributeValues": [
{
"ColumnIdentifier": "DATA_USED@GW_EXTENSION",
"Value": "",
"UniqueId": "dace52af725b42a9aa63aa8e1b9a1b74"
},
{
"ColumnIdentifier": "TOP_BUCATUNA@GW_EXTENSION",
"Value": "",
"UniqueId": "352e5763d90748a490b32ba833a65d1c"
},
{
"ColumnIdentifier": "TOP_FLORIDAN@GW_EXTENSION",
"Value": "",
"UniqueId": "b940292e63e84214ab785584f420674b"
}
Flattening now produces a table like this:
ExtendedAttributeValues_0_ColumnIdentifier | ExtendedAttributeValues_0_Value | ExtendedAttributeValues_1_ColumnIdentifier | ExtendedAttributeValues_1_Value | ExtendedAttributeValues_2_ColumnIdentifier | ExtendedAttributeValues_2_Value |
---|---|---|---|---|---|
COUNTY@LOCATION_EXTENSION | Okaloosa - FL | GW_REGION@LOCATION_EXTENSION | Western Panhandle Embayment Region | LAND_NET@LOCATION_EXTENSION | S016T3NR22W |
DATA_USED@GW_EXTENSION | | TOP_BUCATUNA@GW_EXTENSION | | TOP_FLORIDAN@GW_EXTENSION | |
But I want to turn each "ColumnIdentifier" into a column name and fill that column's rows with the associated "Value":
COUNTY | DATA_USED | GW_REGION | TOP_BUCATUNA | LAND_NET | TOP_FLORIDAN |
---|---|---|---|---|---|
Okaloosa - FL | | Western Panhandle Embayment Region | | S016T3NR22W | |
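For illustration, this target layout can be reached without relying on attribute position. The sketch below assumes each location is a dict with an "ExtendedAttributeValues" list shaped like the samples above. The helper name `attributes_to_row` and the trimmed `sample_locations` list are made up for this example and are not part of the API.

```python
import pandas as pd

def attributes_to_row(location):
    """Map one location's extended attributes to {column name: value}."""
    return {
        # Keep only the part of the identifier before the "@".
        attr["ColumnIdentifier"].split("@", 1)[0]: attr["Value"]
        for attr in location.get("ExtendedAttributeValues", [])
    }

# Two illustrative locations, trimmed from the samples above.
sample_locations = [
    {"ExtendedAttributeValues": [
        {"ColumnIdentifier": "COUNTY@LOCATION_EXTENSION", "Value": "Okaloosa - FL"},
        {"ColumnIdentifier": "LAND_NET@LOCATION_EXTENSION", "Value": "S016T3NR22W"},
    ]},
    {"ExtendedAttributeValues": [
        {"ColumnIdentifier": "DATA_USED@GW_EXTENSION", "Value": ""},
    ]},
]

# pandas uses the union of all keys as columns; attributes that a
# location does not have show up as NaN in that location's row.
df_attrs = pd.DataFrame([attributes_to_row(loc) for loc in sample_locations])
print(df_attrs)
```

Because the columns are keyed by name rather than by list position, the order in which the API returns attributes stops mattering.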
data = {
"ExtendedAttributeValues": [
{
"ColumnIdentifier": "COUNTY@LOCATION_EXTENSION",
"Value": "Okaloosa - FL",
"UniqueId": "538e05b45a9a4b31a46cf96c4ffab8cb"
},
{
"ColumnIdentifier": "GW_REGION@LOCATION_EXTENSION",
"Value": "Western Panhandle Embayment Region",
"UniqueId": "5f51ebde984c4bdd92dff067cbe5b39b"
},
{
"ColumnIdentifier": "LAND_NET@LOCATION_EXTENSION",
"Value": "S016T3NR22W",
"UniqueId": "8d8139c9027a497f9cae4ef7471930ba"
}
]
}
data2 = {
"ExtendedAttributeValues": [
{
"ColumnIdentifier": "DATA_USED@GW_EXTENSION",
"Value": "",
"UniqueId": "dace52af725b42a9aa63aa8e1b9a1b74"
},
{
"ColumnIdentifier": "TOP_BUCATUNA@GW_EXTENSION",
"Value": "",
"UniqueId": "352e5763d90748a490b32ba833a65d1c"
},
{
"ColumnIdentifier": "TOP_FLORIDAN@GW_EXTENSION",
"Value": "",
"UniqueId": "b940292e63e84214ab785584f420674b"
}
]
}
Given either dictionary, we can simply read it with pd.json_normalize:
import pandas as pd

# The second argument is the record path: the key holding the list of records
df1 = pd.json_normalize(data, 'ExtendedAttributeValues')
df2 = pd.json_normalize(data2, 'ExtendedAttributeValues')
Output:
               ColumnIdentifier                               Value                          UniqueId
0     COUNTY@LOCATION_EXTENSION                       Okaloosa - FL  538e05b45a9a4b31a46cf96c4ffab8cb
1  GW_REGION@LOCATION_EXTENSION  Western Panhandle Embayment Region  5f51ebde984c4bdd92dff067cbe5b39b
2   LAND_NET@LOCATION_EXTENSION                         S016T3NR22W  8d8139c9027a497f9cae4ef7471930ba

            ColumnIdentifier  Value                          UniqueId
0     DATA_USED@GW_EXTENSION         dace52af725b42a9aa63aa8e1b9a1b74
1  TOP_BUCATUNA@GW_EXTENSION         352e5763d90748a490b32ba833a65d1c
2  TOP_FLORIDAN@GW_EXTENSION         b940292e63e84214ab785584f420674b
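These long-format frames can be reshaped so that each ColumnIdentifier becomes a column. The following is a minimal sketch, not a definitive implementation: the labels "loc_a" and "loc_b" are made-up stand-ins for whatever identifies a location in the real response, and the sample data is trimmed from the dictionaries above.

```python
import pandas as pd

# Trimmed copies of the two sample responses; "loc_a"/"loc_b" are made-up labels.
responses = {
    "loc_a": {"ExtendedAttributeValues": [
        {"ColumnIdentifier": "COUNTY@LOCATION_EXTENSION", "Value": "Okaloosa - FL"},
        {"ColumnIdentifier": "GW_REGION@LOCATION_EXTENSION",
         "Value": "Western Panhandle Embayment Region"},
    ]},
    "loc_b": {"ExtendedAttributeValues": [
        {"ColumnIdentifier": "DATA_USED@GW_EXTENSION", "Value": ""},
    ]},
}

# Normalize each response, stack the frames, and keep the label as a column.
long_df = pd.concat(
    {label: pd.json_normalize(resp, "ExtendedAttributeValues")
     for label, resp in responses.items()},
    names=["location", "row"],
).reset_index(level="location")

# Keep only the attribute name before the "@".
long_df["attribute"] = long_df["ColumnIdentifier"].str.split("@").str[0]

# One row per location, one column per attribute; missing attributes become NaN.
wide_df = long_df.pivot(index="location", columns="attribute", values="Value")
print(wide_df)
```

Note that `pivot` raises an error if the same attribute appears twice for one location; `pivot_table` with an explicit `aggfunc` is one alternative in that case.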
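If you have a larger JSON response than this, for example one where the old Value is now stored at a higher level, you can check the pd.json_normalize documentation to see how to extract that information, or update your question with enough information for it to actually be answered.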
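As a hedged illustration of pulling higher-level information alongside the nested records, pd.json_normalize accepts a `meta` argument next to `record_path`. The "Identifier" field below is a hypothetical top-level key invented for this example; the real response may name or nest it differently.

```python
import pandas as pd

# Hypothetical full location record: "Identifier" is an assumed top-level field.
location_record = {
    "Identifier": "LOC-001",
    "ExtendedAttributeValues": [
        {"ColumnIdentifier": "COUNTY@LOCATION_EXTENSION", "Value": "Okaloosa - FL"},
        {"ColumnIdentifier": "LAND_NET@LOCATION_EXTENSION", "Value": "S016T3NR22W"},
    ],
}

# record_path selects the nested list of records;
# meta copies the listed top-level fields onto every resulting row.
df_meta = pd.json_normalize(
    location_record,
    record_path="ExtendedAttributeValues",
    meta=["Identifier"],
)
print(df_meta)
```

The same call also accepts a list of such records, which gives one long frame covering every location with its identifier attached to each row.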