Reading nested JSON into a Pandas DataFrame

Question

Background -
I have a JSON response from an API call that I am trying to load into a pandas DataFrame while keeping the same structure I see in the system the data comes from.

JSON response -

{
  "meta": {
    "columns": [
      {
        "key": "node_id",
        "display_name": "Entity ID",
        "output_type": "Word"
      },
      {
        "key": "bottom_level_holding_account_number",
        "display_name": "Holding Account Number",
        "output_type": "Word"
      },
      {
        "key": "value",
        "display_name": "Adjusted Value (USD)",
        "output_type": "Number",
        "currency": "USD"
      },
      {
        "key": "node_ownership",
        "display_name": "% Ownership",
        "output_type": "Percent"
      },
      {
        "key": "model_type",
        "display_name": "Model Type",
        "output_type": "Word"
      },
      {
        "key": "valuation",
        "display_name": "Valuation (USD)",
        "output_type": "Number",
        "currency": "USD"
      },
      {
        "key": "_custom_jb_custodian_305769",
        "display_name": "JB Custodian",
        "output_type": "Word"
      },
      {
        "key": "top_level_owner",
        "display_name": "Top Level Owner",
        "output_type": "Word"
      },
      {
        "key": "top_level_legal_entity",
        "display_name": "Top Level Legal Entity",
        "output_type": "Word"
      },
      {
        "key": "direct_owner",
        "display_name": "Direct Owner",
        "output_type": "Word"
      },
      {
        "key": "online_status",
        "display_name": "Online Status",
        "output_type": "Word"
      },
      {
        "key": "financial_service",
        "display_name": "Financial Service",
        "output_type": "Word"
      },
      {
        "key": "_custom_placeholder_461415",
        "display_name": "Placeholder or Fee Basis",
        "output_type": "Boolean"
      },
      {
        "key": "_custom_close_date_411160",
        "display_name": "Account Close Date",
        "output_type": "Date"
      },
      {
        "key": "_custom_ownership_audit_note_425843",
        "display_name": "Ownership Audit Note",
        "output_type": "Word"
      }
    ],
    "groupings": [
      {
        "key": "holding_account",
        "display_name": "Holding Account"
      }
    ]
  },
  "data": {
    "type": "portfolio_views",
    "attributes": {
      "total": {
        "name": "Total",
        "columns": {
          "direct_owner": null,
          "node_ownership": null,
          "online_status": null,
          "_custom_ownership_audit_note_425843": null,
          "model_type": null,
          "_custom_placeholder_461415": null,
          "top_level_owner": null,
          "_custom_close_date_411160": null,
          "valuation": null,
          "bottom_level_holding_account_number": null,
          "_custom_jb_custodian_305769": null,
          "financial_service": null,
          "top_level_legal_entity": null,
          "value": null,
          "node_id": null
        },
        "children": [
          {
            "entity_id": 4754837,
            "name": "Apple Holdings Adv (748374923)",
            "grouping": "holding_account",
            "columns": {
              "direct_owner": "Apple Holdings LLC",
              "node_ownership": 1,
              "online_status": "Online",
              "_custom_ownership_audit_note_425843": null,
              "model_type": "Holding Account",
              "_custom_placeholder_461415": false,
              "top_level_owner": "Forsyth Family",
              "_custom_close_date_411160": null,
              "valuation": 10423695.609450001,
              "bottom_level_holding_account_number": "748374923",
              "_custom_jb_custodian_305769": "Laverockbank",
              "financial_service": "laverockbankcustodianservice",
              "top_level_legal_entity": "Apple Holdings LLC",
              "value": 10423695.609450001,
              "node_id": "4754837"
            },
          }
        ]
      }
    }
  },
  "included": []
}

Expected structure of the JSON in the pandas DataFrame -
This is the structure I am trying to reproduce in my pandas DataFrame -

| Holding Account                 | Entity ID | Holding Account Number | Adjusted Value (USD) | % Ownership | Model Type      | Valuation (USD) | JB Custodian | Top Level Owner | Top Level Legal Entity          | Direct Owner                    | Online Status | Financial Service   | Placeholder or Fee Basis | Account Close Date | Ownership Audit Note |
|---------------------------------|-----------|------------------------|----------------------|-------------|-----------------|-----------------|--------------|-----------------|---------------------------------|---------------------------------|---------------|---------------------|--------------------------|--------------------|----------------------|
| Apple Holdings Adv (748374923)  | 4754837   | 748374923              | $10,423,695.06       | 100.00%     | Holding Account | $10,423,695.06  | BRF          | Forsyth Family  | Apple Holdings Partners LLC     | Apple Holdings Partners LLC     | Online        | custodianservice    | No                       | -                  | -                    |

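My interpretation of the JSON structure -
It looks like I need to focus on 'columns' (which holds the column headers) and on the 'children' under 'data' (which represent the data rows; in my case there is only 1 row). I can ignore 'groupings': [{'key': 'holding_account', 'display_name': 'Holding Account'}], since that is ultimately just how the data is sorted in the system.

Does anyone have suggestions for how I can take the JSON and load it into a DataFrame with the structure shown?

My interpretation is that I need to set the display_names [columns] as the headers, and then map the corresponding children values under each respective display_name / header. Note: normally there would be many more children (each representing one row of my DataFrame), but I have stripped out all but 1 for ease of explanation.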

Answer 1

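I'm not sure this is the best way to unpack the dictionary, but it does work:
(It keeps the child's "metadata", such as the id (which is duplicated) and the holding account's full name.)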

def unpack_dict(item, out):
    # Recursively copy every non-dict value into the flat dict `out`
    for k, v in item.items():
        if isinstance(v, dict):
            unpack_dict(v, out)
        else:
            out[k] = v
    return out
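
To make the behaviour concrete, here is a minimal sketch that runs the function on a trimmed-down child record. The values are copied from the JSON in the question; the names `sample_child` and `flat` are only made up for this illustration, and the function is repeated so the snippet runs on its own.

```python
def unpack_dict(item, out):
    """Recursively copy every non-dict value of `item` into the flat dict `out`."""
    for k, v in item.items():
        if isinstance(v, dict):
            unpack_dict(v, out)
        else:
            out[k] = v
    return out


# Illustrative record: a cut-down version of one child from the question
sample_child = {
    "entity_id": 4754837,
    "name": "Apple Holdings Adv (748374923)",
    "columns": {"direct_owner": "Apple Holdings LLC", "node_ownership": 1},
}

flat = unpack_dict(sample_child, {})
print(flat)  # the nested "columns" dict is merged into the top level
```

Note that a nested key with the same name as an outer key would overwrite it, because everything ends up in one flat dict.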

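Now we need to apply it to each child to get the data.

From your example, it looks like you want to keep the holding account (from the child) but not entity_id, since it is duplicated in node_id?

I'm not sure, so I will just include all columns with their "raw" names.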

# `res` is the parsed JSON response (a dict)
children = res["data"]["attributes"]["total"]["children"]
# Column names are the flattened keys of the first child
columns = list(unpack_dict(children[0], {}))
data = []

for i in children:
    data.append(list(unpack_dict(i, {}).values()))

And create a DataFrame from it:

>>> pd.DataFrame(data=data, columns = columns)
   entity_id                            name  ...         value  node_id
0    4754837  Apple Holdings Adv (748374923)  ...  1.042370e+07  4754837

[1 rows x 18 columns]

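The columns can now be renamed to use the display names instead of these raw names. You may want to drop some columns, though: as mentioned above, the id is duplicated, and you mentioned the grouping, etc.

If you are dealing with a lot of data (thousands of entries) and parsing it takes a long time, you can drop the excess columns before appending to data to save some time.

To rename columns using a dict: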
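
As a rough sketch of dropping columns before the rows are collected, the snippet below filters each flattened child against a set of unwanted keys. The set `UNWANTED` and the cut-down `children` list are assumptions made for illustration; which keys to skip depends on your data.

```python
def unpack_dict(item, out):
    """Recursively copy every non-dict value of `item` into the flat dict `out`."""
    for k, v in item.items():
        if isinstance(v, dict):
            unpack_dict(v, out)
        else:
            out[k] = v
    return out


# Hypothetical choice of keys to leave out of the final table
UNWANTED = {"entity_id", "grouping"}

# Illustrative input: one trimmed-down child from the question
children = [
    {
        "entity_id": 4754837,
        "name": "Apple Holdings Adv (748374923)",
        "grouping": "holding_account",
        "columns": {"node_id": "4754837", "online_status": "Online"},
    }
]

rows = []
for child in children:
    flat = unpack_dict(child, {})
    # Keep only the keys we actually want as DataFrame columns
    rows.append({k: v for k, v in flat.items() if k not in UNWANTED})

print(rows)
```

A list of dicts like `rows` can be passed directly to `pd.DataFrame`, which takes the column names from the dict keys, so a separate `columns` list is not needed in this variant.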

# rename returns a new DataFrame, so assign the result back
df = df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'})

Answer 2

I suggest using pd.json_normalize() ( https://pandas.pydata.org/pandas-docs/version/1.2.0/reference/api/pandas.json_normalize.html ) which helps transform JSON data into a pandas DataFrame.

Note 1: In what follows I assume the data is available in a Python dictionary named data. For testing purposes, I used

import json
data = json.loads(json_data)

where json_data is your JSON response. Since json.loads() does not accept a trailing comma, I omitted the comma after the child object.
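
The trailing-comma point can be checked with a small sketch. The two strings below are made-up miniature payloads (not the full response), differing only in the comma after the nested object.

```python
import json

# Illustrative payloads: identical except for the trailing comma after "columns"
with_comma = '{"children": [{"name": "A", "columns": {"node_id": "1"},}]}'
without_comma = '{"children": [{"name": "A", "columns": {"node_id": "1"}}]}'

try:
    json.loads(with_comma)
    rejected = False
except json.JSONDecodeError:
    # Strict JSON does not allow a comma before a closing brace
    rejected = True

data = json.loads(without_comma)
print(rejected, data["children"][0]["name"])
```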

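pd.json_normalize() offers several options. One possibility is to simply read all the "children" data and then drop the columns that are not needed. In addition, after normalizing, some columns carry the prefix "columns.", which needs to be removed.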

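import pandas as pd

# Flatten every child; nested "columns" keys become "columns.<key>"
df = pd.json_normalize(data['data']['attributes']['total']['children'])
df.drop(columns=['grouping', 'entity_id'], inplace=True)
# Strip the literal "columns." prefix (regex=False so "." is not a wildcard)
df.columns = df.columns.str.replace('columns.', '', regex=False)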

Finally, the column names need to be replaced with the display names from the "columns" data:

column_name_mapper = {column['key']: column['display_name'] for column in data['meta']['columns']}
df.rename(columns=column_name_mapper, inplace=True)

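Note 2: There are some minor deviations from the expected structure you described. Most notably, the word "name" in the DataFrame header (whose row value is "Apple Holdings Adv (748374923)") is not changed to "Holding Account", because neither term is found in the columns list. Some other values simply differ between the JSON response and the expected structure you described.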
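
If the "name" column should still end up as "Holding Account", one option is to take the label from meta.groupings. The sketch below assumes that every child belongs to the single grouping listed there, which holds for the sample response but is not confirmed for the API in general. The `data` dict is a cut-down stand-in for the real response, and `grouping_label` and `mapper` are illustrative names.

```python
import pandas as pd

# Cut-down stand-in for the parsed response (two columns only)
data = {
    "meta": {
        "columns": [
            {"key": "node_id", "display_name": "Entity ID"},
            {"key": "value", "display_name": "Adjusted Value (USD)"},
        ],
        "groupings": [
            {"key": "holding_account", "display_name": "Holding Account"}
        ],
    },
    "data": {"attributes": {"total": {"children": [
        {
            "entity_id": 4754837,
            "name": "Apple Holdings Adv (748374923)",
            "grouping": "holding_account",
            "columns": {"node_id": "4754837", "value": 10423695.609450001},
        }
    ]}}},
}

df = pd.json_normalize(data["data"]["attributes"]["total"]["children"])
df = df.drop(columns=["grouping", "entity_id"])
# Remove the "columns." prefix added by json_normalize (needs Python 3.9+)
df.columns = [c.removeprefix("columns.") for c in df.columns]

# Assumption: all children share the first (only) grouping in meta.groupings
grouping_label = data["meta"]["groupings"][0]["display_name"]

mapper = {c["key"]: c["display_name"] for c in data["meta"]["columns"]}
mapper["name"] = grouping_label
df = df.rename(columns=mapper)

print(df.columns.tolist())
```

With several groupings per response, the label would have to be chosen per row from each child's "grouping" key instead.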
