How do I group the elements of a list based on some information in each element?

Question

I have a list. Each element of my list looks like this:

list[0]={'Keywords': ' foster care case aide ',
 'categoryId': '1650',
 'result': {'categoryId': '1650',
  'categoryName': 'case aide',
  'score': '1.04134220123291'}}

Can I collect all keywords that have the same categoryId into the same group, and count how many keywords I have for each categoryId?

Please let me know if this is not possible.

Answer 1

You could use collections.defaultdict to make a set for each categoryId and add the associated words:

from collections import defaultdict

output = defaultdict(set)

# `data` is your list of entries (naming it `list` would shadow the built-in)
for entry in data:
    kwds = entry['Keywords'].strip().split(' ')
    for word in kwds:
        output[entry['categoryId']].add(word)

I'm using a set because I assumed you don't want repeated words within each categoryId. You could use a list or some other collection instead.

You can then get the number of words for each ID:

for k, v in output.items():
    print(f'ID: {k}, words: {len(v)}')

# ID: 1650, words: 4
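If each Keywords value should count as one keyword phrase rather than be split into separate words, a minimal sketch with a list per ID could look like the following. The sample `data` and the names `grouped` and `counts` are made up for illustration; only the field names come from the question.

```python
from collections import defaultdict

# Hypothetical sample data in the same shape as the question's entries.
data = [
    {'Keywords': ' foster care case aide ', 'categoryId': '1650'},
    {'Keywords': ' case aide jobs ', 'categoryId': '1650'},
    {'Keywords': ' python developer ', 'categoryId': '2001'},
]

grouped = defaultdict(list)
for entry in data:
    # Keep the whole phrase as one keyword; only trim surrounding spaces.
    grouped[entry['categoryId']].append(entry['Keywords'].strip())

# Number of keyword phrases per categoryId.
counts = {cat_id: len(phrases) for cat_id, phrases in grouped.items()}
print(counts)  # {'1650': 2, '2001': 1}
```

A list keeps duplicates and insertion order; switch back to a set if repeated phrases should be counted once.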

Responding to the comments from OP:

If you are getting KeyError: 'categoryId', that means some entries do not have the key 'categoryId'. If you simply want to skip those entries, you can add a small check to the above solution:

for entry in data:
    # skip this entry if the ID field is missing
    if entry.get('categoryId') is None:
        continue

    # otherwise the same
    kwds = entry['Keywords'].strip().split(' ')
    for word in kwds:
        output[entry['categoryId']].add(word)

If there is no categoryId, continue moves on to the next entry, so that entry is skipped. (Using break here would stop the whole loop at the first entry without an ID.)

Note that we are also depending on a "Keywords" field being present, so you may need to add a check for that as well.

Or, if you want to collect all the keywords from entries without an ID, you can just use dict.get() in the original solution:
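As a rough sketch of guarding both fields at once, the loop could skip any entry that lacks either key. The sample `data` is hypothetical, and `str.split()` with no argument is swapped in here because it also ignores repeated spaces; it is not part of the solution above.

```python
from collections import defaultdict

# Hypothetical sample data; the second and third entries are incomplete.
data = [
    {'Keywords': ' foster care case aide ', 'categoryId': '1650'},
    {'Keywords': ' orphan keyword '},   # no categoryId
    {'categoryId': '1650'},             # no Keywords
]

output = defaultdict(set)
for entry in data:
    # Skip entries that lack either field.
    if 'categoryId' not in entry or 'Keywords' not in entry:
        continue
    # split() with no argument splits on any run of whitespace.
    output[entry['categoryId']].update(entry['Keywords'].split())

print(sorted(output['1650']))  # ['aide', 'care', 'case', 'foster']
```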

for entry in data:
    kwds = entry['Keywords'].strip().split(' ')
    for word in kwds:
        output[entry.get('categoryId', None)].add(word)

Now, if there is no categoryId, the keywords will be added under the key None in output.
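To illustrate what ends up under the None key, here is a small self-contained run on made-up data. The entries and the name `ungrouped` are assumptions for the example only.

```python
from collections import defaultdict

# Hypothetical sample data; the second entry has no categoryId.
data = [
    {'Keywords': ' foster care ', 'categoryId': '1650'},
    {'Keywords': ' orphan keyword '},
]

output = defaultdict(set)
for entry in data:
    for word in entry['Keywords'].strip().split(' '):
        # get() returns None when 'categoryId' is missing.
        output[entry.get('categoryId')].add(word)

print(sorted(output[None]))  # ['keyword', 'orphan']

# Optionally separate the ID-less words from the real groups.
ungrouped = output.pop(None, set())
print(sorted(output))  # ['1650']
```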

Answer 1: score 2, accepted, 2021-01-19 15:51:02
