How to group the elements of a list by a value in each element?
I have a list. Each element of my list looks like this:
    list[0] = {'Keywords': ' foster care case aide ',
               'categoryId': '1650',
               'result': {'categoryId': '1650',
                          'categoryName': 'case aide',
                          'score': '1.04134220123291'}}
Can I collect all keywords that have the same categoryId into the same group, and count how many keywords I have for each categoryId?
Please let me know if this is not possible.
You could use collections.defaultdict to make a set for each categoryId and add the associated words:
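    from collections import defaultdict

    # data is the list of dicts from the question (called `list` there;
    # renamed here to avoid shadowing the built-in list type)
    output = defaultdict(set)
    for entry in data:
        kwds = entry['Keywords'].strip().split(' ')
        for word in kwds:
            output[entry['categoryId']].add(word)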
I'm using a set because I assumed you don't want repeated words within each categoryId. You could instead use a list or some other collection.
You can then get the number of words for each ID:
    for k, v in output.items():
        print(f'ID: {k}, words: {len(v)}')
    # ID: 1650, words: 4
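The answer above counts individual words. If the goal is instead to group and count whole keyword phrases, which may be closer to what the question asks, a minimal self-contained sketch could look like this. The name group_keywords and the second and third sample entries are made up for illustration; a list is used so repeated phrases are counted.

```python
from collections import defaultdict

# Sample data: the first entry is from the question, the others are made up.
data = [
    {'Keywords': ' foster care case aide ', 'categoryId': '1650',
     'result': {'categoryId': '1650', 'categoryName': 'case aide',
                'score': '1.04134220123291'}},
    {'Keywords': ' case aide ', 'categoryId': '1650'},
    {'Keywords': ' truck driver ', 'categoryId': '2000'},
]

def group_keywords(entries):
    """Map each categoryId to the list of keyword phrases seen for it."""
    groups = defaultdict(list)
    for entry in entries:
        # Keep the whole phrase, only trimming the surrounding spaces.
        groups[entry['categoryId']].append(entry['Keywords'].strip())
    return dict(groups)

groups = group_keywords(data)
# Number of keyword phrases per categoryId.
counts = {cat_id: len(phrases) for cat_id, phrases in groups.items()}
for cat_id, n in counts.items():
    print(f'ID: {cat_id}, keywords: {n}')
```

Swapping list for set in group_keywords would drop duplicate phrases, in the same way the answer drops duplicate words.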
Responding to the comments from OP:
If you are getting KeyError: 'categoryId', that means some entries do not have the key 'categoryId'. If you want to simply skip those entries, you can add a small check to the above solution:
    for entry in data:  # data is the list of dicts from the question
        # skip this entry if the ID field is missing
        if entry.get('categoryId') is None:
            continue
        # otherwise the same
        kwds = entry['Keywords'].strip().split(' ')
        for word in kwds:
            output[entry['categoryId']].add(word)
If there is no categoryId, continue jumps to the next entry, so that entry is skipped. Note that break would not work here: it would end the loop and drop all remaining entries too.
Note that we also depend on a "Keywords" field being present, so you may need to add a check for that as well.
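One way to guard both fields at once is sketched below. The function name group_words and the sample entries are hypothetical, and it uses split() with no argument, which also tolerates repeated spaces between words (a small departure from strip().split(' ') above).

```python
from collections import defaultdict

def group_words(entries):
    """Group words by categoryId, skipping entries that lack either key."""
    output = defaultdict(set)
    for entry in entries:
        cat_id = entry.get('categoryId')
        keywords = entry.get('Keywords')
        # Skip the entry if either field is missing.
        if cat_id is None or keywords is None:
            continue
        # split() with no argument ignores leading, trailing and repeated spaces.
        output[cat_id].update(keywords.split())
    return output

# Made-up sample: two usable entries and two incomplete ones.
sample = [
    {'Keywords': ' foster care case aide ', 'categoryId': '1650'},
    {'Keywords': ' no id here '},
    {'categoryId': '1650'},
    {'Keywords': 'case  worker', 'categoryId': '1650'},
]
result = group_words(sample)
for cat_id, words in result.items():
    print(f'ID: {cat_id}, words: {len(words)}')
```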
Or, if you want to collect all the keywords from entries without an ID, you can just use dict.get() in the original solution:
    for entry in data:
        kwds = entry['Keywords'].strip().split(' ')
        for word in kwds:
            output[entry.get('categoryId', None)].add(word)
Now if there is no categoryId, the keywords will be added to the key None in output.
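If the words filed under None should later be handled separately, they can be pulled out of the result afterwards. This is only a sketch: the second sample entry and the name uncategorized are made up for illustration.

```python
from collections import defaultdict

data = [
    {'Keywords': ' foster care case aide ', 'categoryId': '1650'},
    {'Keywords': ' truck driver '},  # made-up entry with no categoryId
]

output = defaultdict(set)
for entry in data:
    for word in entry['Keywords'].strip().split(' '):
        # Entries without an ID end up under the key None.
        output[entry.get('categoryId')].add(word)

# Remove the None group; the default keeps this safe when every entry had an ID.
uncategorized = output.pop(None, set())
print(f'words without an ID: {len(uncategorized)}')
```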