Python: count the number of occurrences, and separately the elements with maximum occurrences, after splitting the strings in a list from a dictionary
I have a dictionary whose values are lists of strings, as follows:
dict_1 = {
0: ['john 1', 'jacob 2', 'john 3', 'john 4', 'jacob 7', 'astor 6', 'michael 8', 'michael 9'],
1: ['jacob 11', 'jacob 13', 'astor 15', 'astor 17', 'michael 20']
}
I need to count the number of times each name occurs in the list under each key of the dictionary. I also need to output the elements with the maximum count separately. So the output I need looks like this:
Key 0
john 3
jacob 2
astor 1
michael 2
Max element count:
john 3
Key 1
jacob 2
astor 2
michael 1
Max element count:
jacob 2
astor 2
How can I do this in the fastest way possible in Python?
Since the pattern is fixed and you want speed, I'd make a list instead of a dict with keys 0, 1, .., feed a generator expression to collections.Counter, and extract the name as entry[:entry.index(" ")]:
import operator as op
from itertools import groupby
from collections import Counter
dict_1 = {0: ['john 1', 'jacob 2', 'john 3', 'john 4', 'jacob 7', 'astor 6', 'michael 8', 'michael 9'], 1: ['jacob 11', 'jacob 13', 'astor 15', 'astor 17', 'michael 20']}
all_scores = [Counter(entry[:entry.index(" ")] for entry in list_).most_common() for list_ in dict_1.values()]
# [[('john', 3), ('jacob', 2), ('michael', 2), ('astor', 1)], [('jacob', 2), ('astor', 2), ('michael', 1)]]
max_scores = [list(next(groupby(scores, key=op.itemgetter(1)))[1]) for scores in all_scores]
# [[('john', 3)], [('jacob', 2), ('astor', 2)]]
# Report them
for key, (scores, maximums) in enumerate(zip(all_scores, max_scores)):
    print(f"Key {key}")
    for name, score in scores:
        print(name, score)
    print("Max element count:")
    for name, max_score in maximums:
        print(name, max_score)
    print()
To get the maximums, we call next once on groupby. Since the counts already come sorted from Counter (via most_common), we don't need to traverse all the scores; groupby is a good, fast fit for this problem. It groups according to the second elements, i.e. the counts (hence the op.itemgetter(1)). Each item groupby yields is a pair: the first element is the count and the second element is an iterator over the matching (name, count) pairs, so we take the second element and cast it to a list.
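The idea above can be sketched as a small helper. This is only an illustration: the function name top_ties is made up here, and it uses str.partition instead of entry.index to pull out the name, which is assumed to behave the same for entries of the form "name number".

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter


def top_ties(entries):
    """Return every (name, count) pair that shares the highest count.

    Hypothetical helper, not part of the original answer.
    """
    # Assumption: the name is everything before the first space.
    counts = Counter(entry.partition(" ")[0] for entry in entries)
    ranked = counts.most_common()  # sorted by count, highest first
    if not ranked:
        return []
    # groupby is lazy, so a single next() only consumes the leading group,
    # which holds all pairs tied for the maximum count.
    _, first_group = next(groupby(ranked, key=itemgetter(1)))
    return list(first_group)


print(top_ties(['jacob 11', 'jacob 13', 'astor 15', 'astor 17', 'michael 20']))
# [('jacob', 2), ('astor', 2)]
```

The guard for an empty list is there because next on an empty groupby would raise StopIteration.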
In addition to the answer from @Ajax1234, you asked how to get the max element counts, which you can do with something like this (once his answer has provided vals):
max_per_key = {}
for key, val in vals.items():
    # most_common(1) returns a one-entry list holding the most common element;
    # [0] pulls the tuple out of the list and [1] gets the count out of the tuple
    max_count = val.most_common(1)[0][1]
    max_list = [(k, v) for k, v in val.items() if v == max_count]
    max_per_key[key] = max_list
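For reference, here is a self-contained sketch of the same idea that can be run on its own. The vals dictionary is written out by hand to match the Counter objects shown elsewhere on this page, and max_entries is a hypothetical helper name; it also returns an empty list for an empty Counter, a case where most_common(1)[0] would raise an IndexError.

```python
from collections import Counter

# Assumed input: the Counter objects that the re.sub answer builds for dict_1.
vals = {
    0: Counter({'john': 3, 'jacob': 2, 'michael': 2, 'astor': 1}),
    1: Counter({'jacob': 2, 'astor': 2, 'michael': 1}),
}


def max_entries(counter):
    """Return all (name, count) pairs tied for the highest count (hypothetical helper)."""
    if not counter:
        return []
    max_count = max(counter.values())
    # Keep every name whose count equals the maximum, in insertion order.
    return [(name, count) for name, count in counter.items() if count == max_count]


max_per_key = {key: max_entries(counter) for key, counter in vals.items()}
print(max_per_key)
# {0: [('john', 3)], 1: [('jacob', 2), ('astor', 2)]}
```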
You can use re.sub with collections.Counter:
import re, collections as col
dict_1 = {0: ['john 1', 'jacob 2', 'john 3', 'john 4', 'jacob 7', 'astor 6', 'michael 8', 'michael 9'], 1: ['jacob 11', 'jacob 13', 'astor 15', 'astor 17', 'michael 20']}
vals = {a: col.Counter(re.sub(r'\s\d+$', '', k) for k in b) for a, b in dict_1.items()}
The result is a dictionary that maps each key to a collections.Counter object storing the number of occurrences of each name:
{0: Counter({'john': 3, 'jacob': 2, 'michael': 2, 'astor': 1}), 1: Counter({'jacob': 2, 'astor': 2, 'michael': 1})}
To get the desired printout:
for a, b in vals.items():
    print(f'Key {a}')
    print('\n'.join(f'{j} {k}' for j, k in b.items()))
    print()
Output:
Key 0
john 3
jacob 2
astor 1
michael 2
Key 1
jacob 2
astor 2
michael 1
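The printout above does not yet include the "Max element count:" part that the question asks for. A minimal sketch that produces the full requested layout might look like the following; build_report is a hypothetical name, and it assumes every entry ends in a space followed by a number, so rsplit is used in place of the regular expression.

```python
from collections import Counter

dict_1 = {
    0: ['john 1', 'jacob 2', 'john 3', 'john 4', 'jacob 7', 'astor 6', 'michael 8', 'michael 9'],
    1: ['jacob 11', 'jacob 13', 'astor 15', 'astor 17', 'michael 20'],
}


def build_report(data):
    """Return the report as one string (hypothetical helper, not from the answers)."""
    lines = []
    for key, entries in data.items():
        # Assumption: the name is everything before the last space.
        counts = Counter(entry.rsplit(" ", 1)[0] for entry in entries)
        lines.append(f"Key {key}")
        lines.extend(f"{name} {count}" for name, count in counts.items())
        lines.append("Max element count:")
        if counts:
            top = max(counts.values())
            # List every name tied for the highest count.
            lines.extend(f"{name} {count}" for name, count in counts.items() if count == top)
    return "\n".join(lines)


print(build_report(dict_1))
```

Names are listed in order of first appearance, which is the order used in the output shown in the question.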