
How to merge or join data in a Pandas nested DataFrame

I am trying to figure out how to perform a merge or join on a nested field in a DataFrame. Below is some sample data:

import pandas as pd

df_all_groups = pd.read_json("""
[
    {
        "object": "group",
        "id": "group-one",
        "collections": [
            {
                "id": "111-111-111",
                "readOnly": false
            },
            {
                "id": "222-222-222",
                "readOnly": false
            }
        ]
    },
    {
        "object": "group",
        "id": "group-two",
        "collections": [
            {
                "id": "111-111-111",
                "readOnly": false
            },
            {
                "id": "333-333-333",
                "readOnly": false
            }
        ]
    }
]
""")

df_collections_with_names = pd.read_json("""
[
    {
        "object": "collection",
        "id": "111-111-111",
        "externalId": null,
        "name": "Cats"
    },
    {
        "object": "collection",
        "id": "222-222-222",
        "externalId": null,
        "name": "Dogs"
    },
    {
        "object": "collection",
        "id": "333-333-333",
        "externalId": null,
        "name": "Fish"
    }
]
""")

I am trying to add the name field from df_collections_with_names to each df_all_groups['collections'][<index>] by joining on df_all_groups['collections'][<index>].id. The output I am trying to get to is:

[
    {
        "object": "group",
        "id": "group-one",
        "collections": [
            {
                "id": "111-111-111",
                "readOnly": false,
                "name": "Cats" // See Collection name was added
            },
            {
                "id": "222-222-222",
                "readOnly": false,
                "name": "Dogs" // See Collection name was added
            }
        ]
    },
    {
        "object": "group",
        "id": "group-two",
        "collections": [
            {
                "id": "111-111-111",
                "readOnly": false,
                "name": "Cats" // See Collection name was added
            },
            {
                "id": "333-333-333",
                "readOnly": false,
                "name": "Fish" // See Collection name was added
            }
        ]
    }
]

I have tried using the merge method, but I can't seem to get it to run on the collections nested field, because I believe it is a Series.
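A quick check on a copy of the sample data, built here from plain Python objects rather than read_json, suggests why merge has nothing to join on: the collections column is a Series whose cells each hold a Python list of dicts, so there is no flat id column at that level. The names groups, df_groups, nested and first_cell are illustrative only.

```python
import pandas as pd

# Copy of the sample data, built directly from Python objects.
groups = [
    {"object": "group", "id": "group-one",
     "collections": [{"id": "111-111-111", "readOnly": False},
                     {"id": "222-222-222", "readOnly": False}]},
    {"object": "group", "id": "group-two",
     "collections": [{"id": "111-111-111", "readOnly": False},
                     {"id": "333-333-333", "readOnly": False}]},
]
df_groups = pd.DataFrame(groups)

nested = df_groups["collections"]   # the nested column
first_cell = nested.iloc[0]         # the value stored in the first row

print(type(nested).__name__)        # Series
print(type(first_cell).__name__)    # list
print(list(df_groups.columns))      # ['object', 'id', 'collections']
```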

Here's one approach:

First, use json.loads to convert the JSON string that was used to construct df_all_groups (I named it all_groups) into a list of dictionaries. Then use json_normalize to construct a DataFrame from it.
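As a minimal sketch of this flattening step, assuming all_groups holds the raw JSON string (trimmed here to a single group), the result has one row per nested collection. The names records and flat are illustrative.

```python
import json
import pandas as pd

# Assumption: all_groups is the raw JSON string, trimmed to one group here.
all_groups = """
[
    {
        "object": "group",
        "id": "group-one",
        "collections": [
            {"id": "111-111-111", "readOnly": false},
            {"id": "222-222-222", "readOnly": false}
        ]
    }
]
"""

records = json.loads(all_groups)  # a list of dicts, one per group

# record_path=['collections'] produces one row per nested collection.
# meta=['object', 'id'] carries the group-level fields onto each row, and
# meta_prefix='_' keeps them from clashing with the collection's own 'id'.
flat = pd.json_normalize(records, ['collections'], ['object', 'id'], meta_prefix='_')

print(flat)
```

The group-level fields show up as _object and _id, which is what later lets the rows be regrouped by group.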

Then merge the DataFrame constructed above with df_collections_with_names. We now have the "name" column.
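The merge step on its own might look like the sketch below. The frames flat and names are hypothetical stand-ins for the flattened groups and for df_collections_with_names. Note that how='left' is a choice made here so that collections without a matching name are kept; the answer's code uses the default inner join.

```python
import pandas as pd

# Hypothetical stand-in for the flattened groups (one row per collection).
flat = pd.DataFrame({
    "id": ["111-111-111", "222-222-222", "111-111-111", "333-333-333"],
    "readOnly": [False, False, False, False],
    "_object": ["group"] * 4,
    "_id": ["group-one", "group-one", "group-two", "group-two"],
})

# Hypothetical stand-in for the id -> name lookup table.
names = pd.DataFrame({
    "id": ["111-111-111", "222-222-222", "333-333-333"],
    "name": ["Cats", "Dogs", "Fish"],
})

# Join on the collection id; a left join keeps every row of flat.
merged = flat.merge(names, on="id", how="left")

print(merged[["_id", "id", "name"]])
```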

The rest is building the desired dictionary from the result obtained above; groupby + apply(to_dict) + reset_index + to_dict will fetch the desired result:

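import json
import pandas as pd

# all_groups is the JSON string that was used to build df_all_groups.
# Flatten to one row per collection; group fields become '_object' and '_id'.
out = (pd.json_normalize(json.loads(all_groups), ['collections'], ['object', 'id'], meta_prefix='_')
       .merge(df_collections_with_names, on='id', suffixes=('', '_'))  # adds 'name'
       .drop(columns=['object', 'externalId']))
# Rebuild the nested list of collection dicts for each group.
out = (out.groupby(['_object', '_id']).apply(lambda x: x[['id', 'readOnly', 'name']].to_dict('records'))
       .reset_index(name='collections'))
# Strip the '_' prefix from the group-level columns and export as records.
out.rename(columns={c: c.strip('_') for c in out.columns}).to_dict('records')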

Output:

[{'object': 'group',
  'id': 'group-one',
  'collections': [{'id': '111-111-111', 'readOnly': False, 'name': 'Cats'},
   {'id': '222-222-222', 'readOnly': False, 'name': 'Dogs'}]},
 {'object': 'group',
  'id': 'group-two',
  'collections': [{'id': '111-111-111', 'readOnly': False, 'name': 'Cats'},
   {'id': '333-333-333', 'readOnly': False, 'name': 'Fish'}]}]
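If the end goal is the nested list of dicts rather than a DataFrame, a plain lookup table may be simpler. This is a different technique from the groupby approach above, sketched under the assumption that both inputs are already parsed into Python lists (for example with json.loads). The helper add_names and the variable names are hypothetical.

```python
import copy

# Assumed inputs: both JSON documents already parsed into Python objects.
groups = [
    {"object": "group", "id": "group-one",
     "collections": [{"id": "111-111-111", "readOnly": False},
                     {"id": "222-222-222", "readOnly": False}]},
    {"object": "group", "id": "group-two",
     "collections": [{"id": "111-111-111", "readOnly": False},
                     {"id": "333-333-333", "readOnly": False}]},
]
collections_with_names = [
    {"object": "collection", "id": "111-111-111", "externalId": None, "name": "Cats"},
    {"object": "collection", "id": "222-222-222", "externalId": None, "name": "Dogs"},
    {"object": "collection", "id": "333-333-333", "externalId": None, "name": "Fish"},
]

# Build an id -> name lookup once.
name_by_id = {c["id"]: c["name"] for c in collections_with_names}


def add_names(groups, name_by_id):
    """Return a deep copy of groups with 'name' added to each collection."""
    result = copy.deepcopy(groups)  # leave the caller's data untouched
    for group in result:
        for coll in group["collections"]:
            # .get returns None when an id has no known name
            coll["name"] = name_by_id.get(coll["id"])
    return result


enriched = add_names(groups, name_by_id)
print(enriched)
```

The result can be passed to pd.DataFrame(enriched) if a DataFrame with the nested collections column is needed afterwards.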
