
Python: count the number of occurrences, and separately the elements with the maximum occurrences, after splitting the strings in a list from a dictionary

I have a dictionary whose values are lists of strings, as follows:

dict_1 = { 
    0: ['john 1', 'jacob 2', 'john 3', 'john 4', 'jacob 7', 'astor 6', 'michael 8', 'michael 9'],
    1: ['jacob 11', 'jacob 13', 'astor 15', 'astor 17', 'michael 20'] 
}

I need to count the number of times each name occurs in the list for each key in the dictionary. I also need to output the elements with the maximum count separately. So the output I need looks like this:

Key 0
john 3
jacob 2
astor 1
michael 2
Max element count: 
john 3

Key 1
jacob 2
astor 2
michael 1
Max element count: 
jacob 2
astor 2

How can I do this in the fastest way possible in Python?

Since the pattern is fixed and you want speed, I'd build a list rather than a dict (the keys are just 0, 1, ...), feed collections.Counter a generator expression, and extract the name with entry[:entry.index(" ")]:

import operator as op
from itertools import groupby
from collections import Counter

dict_1 = {0: ['john 1', 'jacob 2', 'john 3', 'john 4', 'jacob 7', 'astor 6', 'michael 8', 'michael 9'], 1: ['jacob 11', 'jacob 13', 'astor 15', 'astor 17', 'michael 20']}

all_scores = [Counter(entry[:entry.index(" ")] for entry in list_).most_common() for list_ in dict_1.values()]
# [[('john', 3), ('jacob', 2), ('michael', 2), ('astor', 1)], [('jacob', 2), ('astor', 2), ('michael', 1)]]

max_scores = [list(next(groupby(scores, key=op.itemgetter(1)))[1]) for scores in all_scores]
# [[('john', 3)], [('jacob', 2), ('astor', 2)]]

# Report them
for key, (scores, maximums) in enumerate(zip(all_scores, max_scores)):
    print(f"Key {key}")
    for name, score in scores:
        print(name, score)
    print("Max element count:")
    for name, max_score in maximums:
        print(name, max_score)
    print()

To get the maximums, we call next once on groupby. Since Counter.most_common() already returns the counts sorted in descending order, we don't need to traverse all the scores; groupby fits this problem well because it groups consecutive pairs by their second element, i.e. the count (hence the op.itemgetter(1)). The first group it yields is therefore the one with the highest count. next returns a (key, group) tuple: the first element is the count and the second is an iterator over the matching pairs, which we cast to a list.
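As a small illustration of the step described above, here is a minimal sketch of what a single next on groupby returns. The names scores, top_count, top_group and top_pairs are only illustrative, and the input is assumed to be already sorted by count in descending order, as Counter.most_common() returns it.

```python
from itertools import groupby
from operator import itemgetter

# Pairs in the order Counter.most_common() would return them:
# sorted by count, highest first.
scores = [("jacob", 2), ("astor", 2), ("michael", 1)]

# groupby lazily yields (key, group_iterator) tuples; next() pulls only the
# first one, so the rest of the list is never grouped.
top_count, top_group = next(groupby(scores, key=itemgetter(1)))
top_pairs = list(top_group)

print(top_count)  # 2
print(top_pairs)  # [('jacob', 2), ('astor', 2)]
```

Note that next raises StopIteration if scores is empty, so an empty list would need its own check.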

In addition to the answer from @Ajax1234, you asked how to get the max element counts, which you can do like this (once his answer has produced vals):

max_per_key = {}
for key, val in vals.items():
    # This gets a list with 1 entry containing the most common element, the [0] pulls the tuple out of the list and the [1] gets the count out of the tuple
    max_count = val.most_common(1)[0][1]
    max_list = [(k, v) for k, v in val.items() if v == max_count]
    max_per_key[key] = max_list
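The loop above can also be sketched as a small self-contained helper. The function name top_entries is hypothetical, and vals is written out by hand here so that the snippet runs on its own. Unlike most_common(1)[0], which raises IndexError on an empty Counter, this version returns an empty list in that case.

```python
from collections import Counter


def top_entries(counter):
    """Return every (name, count) pair tied for the highest count."""
    if not counter:
        return []
    max_count = max(counter.values())
    return [(name, count) for name, count in counter.items() if count == max_count]


# Hand-written stand-in for the vals dictionary built in the other answer.
vals = {
    0: Counter({"john": 3, "jacob": 2, "michael": 2, "astor": 1}),
    1: Counter({"jacob": 2, "astor": 2, "michael": 1}),
}

max_per_key = {key: top_entries(counter) for key, counter in vals.items()}
print(max_per_key)
# {0: [('john', 3)], 1: [('jacob', 2), ('astor', 2)]}
```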

You can use re.sub with collections.Counter:

import re, collections as col
dict_1 = {0: ['john 1', 'jacob 2', 'john 3', 'john 4', 'jacob 7', 'astor 6', 'michael 8', 'michael 9'], 1: ['jacob 11', 'jacob 13', 'astor 15', 'astor 17', 'michael 20']}
# Raw string so that \s and \d reach the regex engine without invalid-escape warnings
vals = {a: col.Counter([re.sub(r'\s\d+$', '', k) for k in b]) for a, b in dict_1.items()}

The result is a dictionary that maps each key to a collections.Counter object holding the number of occurrences of each name:

{0: Counter({'john': 3, 'jacob': 2, 'michael': 2, 'astor': 1}), 1: Counter({'jacob': 2, 'astor': 2, 'michael': 1})}

To get the desired printout:

for a, b in vals.items():
   print(f'Key {a}')
   print('\n'.join(f'{j} {k}' for j, k in b.items()))
   print()

Output:

Key 0
john 3
jacob 2
astor 1
michael 2

Key 1
jacob 2
astor 2
michael 1
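The printout above does not include the "Max element count:" part that the question asks for. One possible way to produce the full report is sketched below. The helper name report_lines is hypothetical, and the name is extracted with str.rsplit instead of re.sub, which assumes every entry ends in a single space followed by a number.

```python
from collections import Counter

dict_1 = {
    0: ['john 1', 'jacob 2', 'john 3', 'john 4', 'jacob 7', 'astor 6', 'michael 8', 'michael 9'],
    1: ['jacob 11', 'jacob 13', 'astor 15', 'astor 17', 'michael 20'],
}


def report_lines(data):
    """Build the report for every key, including the names tied for the maximum."""
    lines = []
    for key, entries in data.items():
        # Drop the trailing number: 'john 1' -> 'john'
        counts = Counter(entry.rsplit(" ", 1)[0] for entry in entries)
        lines.append(f"Key {key}")
        # Counter keeps first-seen order, which matches the order in the question
        lines.extend(f"{name} {count}" for name, count in counts.items())
        lines.append("Max element count:")
        max_count = max(counts.values(), default=0)
        lines.extend(
            f"{name} {count}" for name, count in counts.items() if count == max_count
        )
        lines.append("")
    return lines


print("\n".join(report_lines(dict_1)))
```

Returning the lines rather than printing inside the loop keeps the helper easy to check and leaves the choice of output to the caller.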

