How do I group the elements of a list by a piece of information in each element?

Question

I have a list. Each element of my list looks like this:

list[0]={'Keywords': ' foster care case aide ',
 'categoryId': '1650',
 'result': {'categoryId': '1650',
  'categoryName': 'case aide',
  'score': '1.04134220123291'}}

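Can I collect all the keywords that share the same categoryId into one group, and count how many keywords I have for each categoryId?

If that's not possible, please let me know.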

Answer 1

You can use collections.defaultdict to keep a set for each categoryId and add the associated words to it:

from collections import defaultdict

output = defaultdict(set)

for entry in data:  # data: your list of dicts (called list in the question)
    kwds = entry['Keywords'].strip().split(' ')
    for word in kwds:
        output[entry['categoryId']].add(word)

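I'm using a set because I assume you don't want repeated words within each categoryId. You could use a list or another collection instead.

Then you can get the size for each ID: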
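
As a rough sketch of the list variant, assuming you do want to keep repeated words: the sample entries below are made up in the shape shown in the question, and words_by_id is just an illustrative name.

```python
from collections import defaultdict

# Hypothetical sample data in the shape shown in the question.
data = [
    {'Keywords': ' foster care case aide ', 'categoryId': '1650'},
    {'Keywords': ' case aide jobs ', 'categoryId': '1650'},
]

words_by_id = defaultdict(list)
for entry in data:
    # split() with no argument ignores leading, trailing and repeated whitespace
    words_by_id[entry['categoryId']].extend(entry['Keywords'].split())

print(words_by_id['1650'])
# ['foster', 'care', 'case', 'aide', 'case', 'aide', 'jobs']
```

With a list, len() counts every occurrence (7 here); with a set the same data would give 5 distinct words.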

for k, v in output.items():
    print(f'ID: {k}, words: {len(v)}')

# ID: 1650, words: 4
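
The count above is per word. If by "keywords" you mean whole Keywords strings rather than individual words, one possible sketch is to count entries per categoryId with collections.Counter. The sample data, including the '2000' ID, is made up for illustration.

```python
from collections import Counter

# Hypothetical sample data; only the first entry comes from the question.
data = [
    {'Keywords': ' foster care case aide ', 'categoryId': '1650'},
    {'Keywords': ' case aide jobs ', 'categoryId': '1650'},
    {'Keywords': ' school nurse ', 'categoryId': '2000'},
]

# One count per entry, keyed by its categoryId
phrase_counts = Counter(entry['categoryId'] for entry in data)

for category_id, n in phrase_counts.items():
    print(f'ID: {category_id}, keyword strings: {n}')

# ID: 1650, keyword strings: 2
# ID: 2000, keyword strings: 1
```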

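In response to the OP's comment:

If you get KeyError: 'categoryId', it means some entries don't have the key 'categoryId'. If you just want to skip those entries, you can add a small check to the solution above: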

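for entry in data:  # data: your list of dicts (called list in the question)
    # skip entries that have no categoryId field
    if entry.get('categoryId') is None:
        continue

    # otherwise the same
    kwds = entry['Keywords'].strip().split(' ')
    for word in kwds:
        output[entry['categoryId']].add(word)

If there is no categoryId, the loop hits continue and that entry is skipped. (A break here would end the whole loop instead of skipping just the one entry.)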

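Note that we are also relying on the "Keywords" field being present, so you may need to add a check for that as well.

Alternatively, if you want to collect all the keywords from entries that have no ID, you can use dict.get() in the original solution: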
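
A minimal sketch of guarding both fields at once, assuming that any entry missing either categoryId or Keywords should simply be skipped. The incomplete entries in the sample data are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample data with two incomplete entries.
data = [
    {'Keywords': ' foster care case aide ', 'categoryId': '1650'},
    {'Keywords': ' no id here '},   # missing categoryId
    {'categoryId': '1650'},         # missing Keywords
]

output = defaultdict(set)
for entry in data:
    category_id = entry.get('categoryId')
    keywords = entry.get('Keywords')
    if category_id is None or keywords is None:
        continue  # skip incomplete entries
    output[category_id].update(keywords.split())

for k, v in output.items():
    print(f'ID: {k}, words: {len(v)}')

# ID: 1650, words: 4
```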

for entry in data:
    kwds = entry['Keywords'].strip().split(' ')
    for word in kwds:
        output[entry.get('categoryId', None)].add(word)

Now, if an entry has no categoryId, its keywords are added under the key None in output.
