How to flatten a list of dictionaries with non-constant attributes

Question

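Until recently I used the relatively simple code below to fetch location data from an API, flatten the response, and convert it into a flat DataFrame/table. It worked well because the key "ExtendedAttributes" returned a constant set of values in a constant order. The API schema changed overnight (without warning), and it now returns different values depending on each location's attributes.

The simpler solution of flattening the nested list of dictionaries into a table and renaming the columns no longer works. I am not sure how to handle a case like this or what to try first.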

Script

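import pandas as pd
from flatten_json import flatten  # assumption: flatten() comes from the flatten_json package

# Fetch the initial location description list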
locationDescriptions = timeseries.publish.get('/GetLocationDescriptionList')['LocationDescriptions']

# Loop through the provisioning API to get full location info for every UniqueId
locations = timeseries.provisioning.send_batch_requests('/locations/{Id}', [{'LocationUniqueId': loc['UniqueId']} for loc in locationDescriptions])

# The API response in 'locations' is nested JSON, so we need to unpack/flatten it
dic_flattened = [flatten(d) for d in locations]

df_flat = pd.DataFrame(dic_flattened)

# Give the value columns matching names
df_flat.rename(columns={'ExtendedAttributeValues_0_Value': 'COUNTY' ...}, inplace=True)

First location

 "ExtendedAttributeValues": [
{
  "ColumnIdentifier": "COUNTY@LOCATION_EXTENSION",
  "Value": "Okaloosa - FL",
  "UniqueId": "538e05b45a9a4b31a46cf96c4ffab8cb"
},
{
  "ColumnIdentifier": "GW_REGION@LOCATION_EXTENSION",
  "Value": "Western Panhandle Embayment Region",
  "UniqueId": "5f51ebde984c4bdd92dff067cbe5b39b"
},
{
  "ColumnIdentifier": "LAND_NET@LOCATION_EXTENSION",
  "Value": "S016T3NR22W",
  "UniqueId": "8d8139c9027a497f9cae4ef7471930ba"
}

Second location (the attributes no longer match)

"ExtendedAttributeValues": [
{
  "ColumnIdentifier": "DATA_USED@GW_EXTENSION",
  "Value": "",
  "UniqueId": "dace52af725b42a9aa63aa8e1b9a1b74"
},
{
  "ColumnIdentifier": "TOP_BUCATUNA@GW_EXTENSION",
  "Value": "",
  "UniqueId": "352e5763d90748a490b32ba833a65d1c"
},
{
  "ColumnIdentifier": "TOP_FLORIDAN@GW_EXTENSION",
  "Value": "",
  "UniqueId": "b940292e63e84214ab785584f420674b"
}

Flattening now produces a table like this:

ExtendedAttributeValues_0_ColumnIdentifier	ExtendedAttributeValues_0_Value	ExtendedAttributeValues_1_ColumnIdentifier	ExtendedAttributeValues_1_Value	ExtendedAttributeValues_2_ColumnIdentifier	ExtendedAttributeValues_2_Value
COUNTY@LOCATION_EXTENSION	Okaloosa - FL	GW_REGION@LOCATION_EXTENSION	Western Panhandle Embayment Region	LAND_NET@LOCATION_EXTENSION	S016T3NR22W
DATA_USED@GW_EXTENSION		TOP_BUCATUNA@GW_EXTENSION		TOP_FLORIDAN@GW_EXTENSION	

What I want instead is to turn each "ColumnIdentifier" into a column name and fill that column's rows with the associated "Value":

COUNTY	DATA_USED	GW_REGION	TOP_BUCATUNA	LAND_NET	TOP_FLORIDAN
Okaloosa - FL		Western Panhandle Embayment Region		S016T3NR22W	

Answer 1

data = {
     "ExtendedAttributeValues": [
    {
      "ColumnIdentifier": "COUNTY@LOCATION_EXTENSION",
      "Value": "Okaloosa - FL",
      "UniqueId": "538e05b45a9a4b31a46cf96c4ffab8cb"
    },
    {
      "ColumnIdentifier": "GW_REGION@LOCATION_EXTENSION",
      "Value": "Western Panhandle Embayment Region",
      "UniqueId": "5f51ebde984c4bdd92dff067cbe5b39b"
    },
    {
      "ColumnIdentifier": "LAND_NET@LOCATION_EXTENSION",
      "Value": "S016T3NR22W",
      "UniqueId": "8d8139c9027a497f9cae4ef7471930ba"
    }
    ]
}

data2 = {
    "ExtendedAttributeValues": [
    {
      "ColumnIdentifier": "DATA_USED@GW_EXTENSION",
      "Value": "",
      "UniqueId": "dace52af725b42a9aa63aa8e1b9a1b74"
    },
    {
      "ColumnIdentifier": "TOP_BUCATUNA@GW_EXTENSION",
      "Value": "",
      "UniqueId": "352e5763d90748a490b32ba833a65d1c"
    },
    {
      "ColumnIdentifier": "TOP_FLORIDAN@GW_EXTENSION",
      "Value": "",
      "UniqueId": "b940292e63e84214ab785584f420674b"
    }
    ]
}

Given either dictionary, we can simply read it with pd.json_normalize:

import pandas as pd

# One row per entry in the 'ExtendedAttributeValues' list
df1 = pd.json_normalize(data, 'ExtendedAttributeValues')
df2 = pd.json_normalize(data2, 'ExtendedAttributeValues')

Output:

df1

               ColumnIdentifier                               Value                          UniqueId
0     COUNTY@LOCATION_EXTENSION                       Okaloosa - FL  538e05b45a9a4b31a46cf96c4ffab8cb
1  GW_REGION@LOCATION_EXTENSION  Western Panhandle Embayment Region  5f51ebde984c4bdd92dff067cbe5b39b
2   LAND_NET@LOCATION_EXTENSION                         S016T3NR22W  8d8139c9027a497f9cae4ef7471930ba

df2

            ColumnIdentifier Value                          UniqueId
0     DATA_USED@GW_EXTENSION        dace52af725b42a9aa63aa8e1b9a1b74
1  TOP_BUCATUNA@GW_EXTENSION        352e5763d90748a490b32ba833a65d1c
2  TOP_FLORIDAN@GW_EXTENSION        b940292e63e84214ab785584f420674b
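To get from these long tables to the wide layout the question asks for, one possible approach is to build one dictionary per location, keyed by attribute name, and let pandas align the columns. This is only a sketch: the sample data is trimmed to the two fields used, attributes_to_row is a hypothetical helper name, and stripping the text after "@" is an assumption based on the desired column names shown in the question.

```python
import pandas as pd

# Two sample locations, trimmed to the fields used here.
locations = [
    {"ExtendedAttributeValues": [
        {"ColumnIdentifier": "COUNTY@LOCATION_EXTENSION", "Value": "Okaloosa - FL"},
        {"ColumnIdentifier": "GW_REGION@LOCATION_EXTENSION", "Value": "Western Panhandle Embayment Region"},
        {"ColumnIdentifier": "LAND_NET@LOCATION_EXTENSION", "Value": "S016T3NR22W"},
    ]},
    {"ExtendedAttributeValues": [
        {"ColumnIdentifier": "DATA_USED@GW_EXTENSION", "Value": ""},
        {"ColumnIdentifier": "TOP_BUCATUNA@GW_EXTENSION", "Value": ""},
        {"ColumnIdentifier": "TOP_FLORIDAN@GW_EXTENSION", "Value": ""},
    ]},
]


def attributes_to_row(location):
    """Map each attribute name (the text before '@') to its Value.

    Hypothetical helper; a location without the key yields an empty row.
    """
    return {
        attr["ColumnIdentifier"].split("@")[0]: attr["Value"]
        for attr in location.get("ExtendedAttributeValues", [])
    }


# pandas takes the union of all keys as columns and fills gaps with NaN.
df_wide = pd.DataFrame([attributes_to_row(loc) for loc in locations])
print(df_wide)
```

Because the columns are keyed by name rather than by list position, the order and number of attributes per location no longer matter. Attributes a location does not have show up as NaN.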

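If your JSON response is larger than this, for example if the old Value is now stored at a higher level, you can check the pd.json_normalize documentation to see how to extract that information, or update your question with enough information for it to be answered.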
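As a rough illustration of pulling higher-level information along, the sketch below uses the record_path and meta arguments of pd.json_normalize and then reshapes with DataFrame.pivot. The "LocationId" key and the sample values "loc-1" and "loc-2" are made up for the example; the real payload's identifying key is not shown in the question.

```python
import pandas as pd

# Hypothetical full responses: "LocationId" stands in for whatever
# top-level key identifies a location in the real payload.
responses = [
    {"LocationId": "loc-1", "ExtendedAttributeValues": [
        {"ColumnIdentifier": "COUNTY@LOCATION_EXTENSION", "Value": "Okaloosa - FL"},
        {"ColumnIdentifier": "LAND_NET@LOCATION_EXTENSION", "Value": "S016T3NR22W"},
    ]},
    {"LocationId": "loc-2", "ExtendedAttributeValues": [
        {"ColumnIdentifier": "DATA_USED@GW_EXTENSION", "Value": ""},
    ]},
]

# One row per attribute, with the parent location's id carried along.
long_df = pd.json_normalize(
    responses, record_path="ExtendedAttributeValues", meta=["LocationId"]
)

# Reshape to one row per location and one column per ColumnIdentifier.
wide_df = long_df.pivot(
    index="LocationId", columns="ColumnIdentifier", values="Value"
)
print(wide_df)
```

Two caveats with this sketch: pivot raises an error if a location repeats the same ColumnIdentifier, and if the top-level key shares a name with a key inside the records (such as "UniqueId"), json_normalize needs a meta_prefix to tell them apart.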
