How to group some part of a list's information?
I have a list, and each element of the list looks like this:
list[0]={'Keywords': 'program manager',
         'result': {'categoryId': '2712',
                    'categoryName': 'program manager',
                    'score': '0.9506622791290285'},
         'categoryId': '2712'}
{'Keywords': 'technicalfunctional consultant', 'result': []}
I need to collect all keywords that have the same categoryName. I did the following:
from collections import defaultdict

output1 = defaultdict(set)
for entry in list:
    kwds = entry['Keywords'].strip().split(' ')
    for word in kwds:
        output1[entry.get('categoryId', None)].add(word)
But it splits every keyword into separate words, and I don't want that. Is there any way to collect all keywords with the same categoryName?
You can find all the existing categories in the dataset, then build a dict that maps each one of them to all of its associated keywords (the code uses categoryName as the key).
EDIT: Changed the code to be a minimal reproducible example (MRE) and renamed the list to list1.
Code:
list1 = []
list1.append({'Keywords': 'program manager',
              'result': {'categoryId': '2712',
                         'categoryName': 'program manager',
                         'score': '0.9506622791290285'},
              'categoryId': '2712'})
# Collect every distinct category name in the data
cat_set = set([elem["result"]["categoryName"] for elem in list1])
cat_dict = {}
# For each category name, gather the full keyword strings that belong to it
for cat_name in cat_set:
    cat_dict[cat_name] = [elem["Keywords"] for elem in list1 if elem["result"]["categoryName"] == cat_name]
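Note that the question's data also contains elements such as {'Keywords': 'technicalfunctional consultant', 'result': []}, and indexing an empty list with "categoryName" raises a TypeError. A minimal single-pass sketch that skips those elements could look like the following; the function name group_keywords_by_category and the variable sample are made up for illustration, and it assumes that any non-dict 'result' means "no category".

```python
from collections import defaultdict

def group_keywords_by_category(entries):
    """Map each categoryName to the list of full keyword strings."""
    grouped = defaultdict(list)
    for entry in entries:
        result = entry.get("result")
        # Assumption: unmatched entries carry 'result': [] (as in the
        # question's sample), so anything that is not a dict is skipped.
        if not isinstance(result, dict):
            continue
        # Append the whole keyword string; no splitting on spaces.
        grouped[result["categoryName"]].append(entry["Keywords"])
    return dict(grouped)

sample = [
    {"Keywords": "program manager",
     "result": {"categoryId": "2712",
                "categoryName": "program manager",
                "score": "0.9506622791290285"},
     "categoryId": "2712"},
    {"Keywords": "technicalfunctional consultant", "result": []},
]

print(group_keywords_by_category(sample))
# {'program manager': ['program manager']}
```

This walks the list once instead of once per category, which only matters for larger inputs.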
mylist = [{'Keywords': 'scrum master',
           'result': {'categoryId': '3193',
                      'categoryName': 'agile coach',
                      'score': '1.0'},
           'categoryId': '3193'},
          {'Keywords': 'principal consultant',
           'result': {'categoryId': '2655',
                      'categoryName': 'principal consultant',
                      'score': '1.045369052886963'},
           'categoryId': '2655'},
          {'Keywords': 'technicalfunctional consultant', 'result': []}]

def keywordcollector(yourlist):
    # Keep only elements whose 'result' is a dict (unmatched ones have 'result': [])
    yourlist = list(filter(lambda x: isinstance(x['result'], dict), yourlist))
    # Distinct category names present in the remaining elements
    categories = set(x['result']['categoryName'] for x in yourlist)
    keywordslist = []
    for category in categories:
        # Elements that belong to this category
        temp_list = list(filter(lambda x: x['result']['categoryName'] == category, yourlist))
        # Their full keyword strings, not split into words
        temp_keywords = list(map(lambda x: x['Keywords'], temp_list))
        keywordslist.append({category: temp_keywords})
    return keywordslist

print(keywordcollector(mylist))
The code above returns a list of single-key dictionaries, one per category name, each mapping that name to its keywords. It outputs (the order can differ between runs, because the categories are collected in a set):
[{'principal consultant': ['principal consultant']}, {'agile coach': ['scrum master']}]
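For comparison, the loop from the question can probably be kept almost as it is: the unwanted splitting comes from .split(' ') plus the inner loop over words, so adding the stripped keyword as a whole avoids it. This is only a sketch under the assumption that grouping by the top-level 'categoryId' key (as the question's code does) is acceptable and that elements without that key should be skipped; the names entries and keywords_by_id are illustrative.

```python
from collections import defaultdict

entries = [
    {"Keywords": "scrum master",
     "result": {"categoryId": "3193",
                "categoryName": "agile coach",
                "score": "1.0"},
     "categoryId": "3193"},
    {"Keywords": "principal consultant",
     "result": {"categoryId": "2655",
                "categoryName": "principal consultant",
                "score": "1.045369052886963"},
     "categoryId": "2655"},
    {"Keywords": "technicalfunctional consultant", "result": []},
]

keywords_by_id = defaultdict(set)
for entry in entries:
    category_id = entry.get("categoryId")
    # Assumption: elements without a top-level 'categoryId' are unmatched.
    if category_id is None:
        continue
    # Add the full keyword string instead of its individual words.
    keywords_by_id[category_id].add(entry["Keywords"].strip())

print(dict(keywords_by_id))
# {'3193': {'scrum master'}, '2655': {'principal consultant'}}
```

Using a set keeps each keyword once per category; switch to defaultdict(list) if duplicates or insertion order matter.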