
Python: count the number of occurrences, and separately the elements with the maximum occurrences, after splitting the strings in lists from a dictionary

I have a dictionary whose values are lists of strings, as follows:

dict_1 = { 
    0: ['john 1', 'jacob 2', 'john 3', 'john 4', 'jacob 7', 'astor 6', 'michael 8', 'michael 9'],
    1: ['jacob 11', 'jacob 13', 'astor 15', 'astor 17', 'michael 20'] 
}

I need to count how many times each name occurs in the list stored under each key of the dictionary. I also need to output the elements with the maximum count separately. The output I need looks like this:

Key 0
john 3
jacob 2
astor 1
michael 2
Max element count: 
john 3

Key 1
jacob 2
astor 2
michael 1
Max element count: 
jacob 2
astor 2

How can I do this as fast as possible in Python?

Since the pattern is fixed and you are after speed, I'd build a list instead of a dict keyed 0, 1, ... (this relies on the keys being 0, 1, ... in insertion order, since enumerate regenerates them), feed a generator expression to collections.Counter, and extract the name as entry[:entry.index(" ")]:

import operator as op
from itertools import groupby
from collections import Counter

dict_1 = {0: ['john 1', 'jacob 2', 'john 3', 'john 4', 'jacob 7', 'astor 6', 'michael 8', 'michael 9'], 1: ['jacob 11', 'jacob 13', 'astor 15', 'astor 17', 'michael 20']}

all_scores = [Counter(entry[:entry.index(" ")] for entry in list_).most_common() for list_ in dict_1.values()]
# [[('john', 3), ('jacob', 2), ('michael', 2), ('astor', 1)], [('jacob', 2), ('astor', 2), ('michael', 1)]]

max_scores = [list(next(groupby(scores, key=op.itemgetter(1)))[1]) for scores in all_scores]
# [[('john', 3)], [('jacob', 2), ('astor', 2)]]

# Report them
for key, (scores, maximums) in enumerate(zip(all_scores, max_scores)):
    print(f"Key {key}")
    for name, score in scores:
        print(name, score)
    print("Max element count:")
    for name, max_score in maximums:
        print(name, max_score)
    print()
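The name extraction above slices up to the first space. A small sketch comparing that slice with two other standard string methods; the three function names are hypothetical, and the entry "john smith 5" is an invented example of a multi-word name, which does not occur in the question's data.

```python
def name_by_index(entry: str) -> str:
    # Slice up to the first space; raises ValueError if there is no space
    return entry[:entry.index(" ")]


def name_by_partition(entry: str) -> str:
    # Everything before the first space; the whole string if there is no space
    return entry.partition(" ")[0]


def name_by_rsplit(entry: str) -> str:
    # Everything before the last space, so a multi-word name stays intact
    return entry.rsplit(" ", 1)[0]


for entry in ["john 1", "michael 20", "john smith 5"]:
    print(repr(entry), name_by_index(entry), name_by_partition(entry), name_by_rsplit(entry))
```

For the question's data all three give the same names; they only differ when a name contains a space or an entry has no number.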

To get the maximums, we call next once on groupby. Since Counter.most_common() already returns the counts sorted in descending order, there is no need to traverse all the scores: groupby groups consecutive pairs by their second element, i.e. the count (hence op.itemgetter(1)), and next returns only the first group, the one with the highest count. Each group is a (count, iterator) pair, so we take the iterator at index [1] and cast it to a list to get the desired (name, count) pairs.
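To make the next(groupby(...)) step concrete, here is a sketch wrapping it in a helper, top_ties (a hypothetical name). It also guards against an empty score list, where a bare next would raise StopIteration.

```python
import operator as op
from collections import Counter
from itertools import groupby


def top_ties(scores):
    """Return every (name, count) pair that shares the highest count.

    Assumes `scores` is already sorted by count in descending order,
    as returned by Counter.most_common().
    """
    groups = groupby(scores, key=op.itemgetter(1))
    # next() yields the first (count, group_iterator) pair, or None if scores is empty
    first = next(groups, None)
    if first is None:
        return []
    _, pairs = first
    return list(pairs)


scores = Counter(["jacob", "jacob", "astor", "astor", "michael"]).most_common()
print(scores)            # [('jacob', 2), ('astor', 2), ('michael', 1)]
print(top_ties(scores))  # [('jacob', 2), ('astor', 2)]
print(top_ties([]))      # []
```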

In addition to the answer from @Ajax1234: you asked how to get the max element counts, which you can do with something like this (using the vals dictionary built in that answer):

max_per_key = {}
for key, val in vals.items():
    # This gets a list with 1 entry containing the most common element, the [0] pulls the tuple out of the list and the [1] gets the count out of the tuple
    max_count = val.most_common(1)[0][1]
    max_list = [(k, v) for k, v in val.items() if v == max_count]
    max_per_key[key] = max_list
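A self-contained variant of the same idea, as a sketch. Here vals is rebuilt from dict_1 with str.rsplit (an assumption; the other answer uses re.sub) so the snippet runs on its own, and the maximum is taken with max(..., default=0) so an empty list does not fail on most_common(1)[0].

```python
from collections import Counter

dict_1 = {
    0: ['john 1', 'jacob 2', 'john 3', 'john 4', 'jacob 7', 'astor 6', 'michael 8', 'michael 9'],
    1: ['jacob 11', 'jacob 13', 'astor 15', 'astor 17', 'michael 20'],
}

# Same shape as `vals` in the other answer: key -> Counter of names
vals = {key: Counter(entry.rsplit(" ", 1)[0] for entry in entries)
        for key, entries in dict_1.items()}

max_per_key = {}
for key, counts in vals.items():
    # default=0 keeps an empty Counter from raising ValueError
    max_count = max(counts.values(), default=0)
    max_per_key[key] = [(name, n) for name, n in counts.items() if n == max_count]

print(max_per_key)
# {0: [('john', 3)], 1: [('jacob', 2), ('astor', 2)]}
```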

You can use re.sub with collections.Counter:

import re, collections as col
dict_1 = {0: ['john 1', 'jacob 2', 'john 3', 'john 4', 'jacob 7', 'astor 6', 'michael 8', 'michael 9'], 1: ['jacob 11', 'jacob 13', 'astor 15', 'astor 17', 'michael 20']}
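# Raw string, so \s and \d reach the regex engine instead of being treated as (invalid) string escapes
vals = {a: col.Counter([re.sub(r'\s\d+$', '', k) for k in b]) for a, b in dict_1.items()}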

The result is a dictionary mapping each key to a collections.Counter object holding the number of occurrences of each name:

{0: Counter({'john': 3, 'jacob': 2, 'michael': 2, 'astor': 1}), 1: Counter({'jacob': 2, 'astor': 2, 'michael': 1})}
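Since the same pattern is applied to every entry, one possible tweak is to compile it once with re.compile, which skips re's internal pattern-cache lookup on each call and may shave a little time. A sketch, with the constant name TRAILING_NUMBER chosen for illustration:

```python
import re
from collections import Counter

# Whitespace followed by digits at the end of the string, e.g. " 17"
TRAILING_NUMBER = re.compile(r"\s\d+$")

dict_1 = {
    0: ['john 1', 'jacob 2', 'john 3', 'john 4', 'jacob 7', 'astor 6', 'michael 8', 'michael 9'],
    1: ['jacob 11', 'jacob 13', 'astor 15', 'astor 17', 'michael 20'],
}

vals = {key: Counter(TRAILING_NUMBER.sub("", entry) for entry in entries)
        for key, entries in dict_1.items()}
print(vals)
```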

To get the desired printout:

for a, b in vals.items():
    print(f'Key {a}')
    print('\n'.join(f'{j} {k}' for j, k in b.items()))
    print()

Output:

Key 0
john 3
jacob 2
astor 1
michael 2

Key 1
jacob 2
astor 2
michael 1
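The printout above stops before the "Max element count:" part of the requested output. A sketch of a report function (hypothetical name) that produces both parts in the question's format, keeping names in first-seen order and assuming each name is separated from its number by the last space:

```python
from collections import Counter


def report(data):
    """Build the text report: per-key counts, then the names tied for the max."""
    lines = []
    for key, entries in data.items():
        # Counter keeps first-seen order, matching the order in the question
        counts = Counter(entry.rsplit(" ", 1)[0] for entry in entries)
        lines.append(f"Key {key}")
        lines.extend(f"{name} {n}" for name, n in counts.items())
        lines.append("Max element count:")
        top = max(counts.values(), default=0)
        lines.extend(f"{name} {n}" for name, n in counts.items() if n == top)
        lines.append("")  # blank line between keys
    return "\n".join(lines)


dict_1 = {
    0: ['john 1', 'jacob 2', 'john 3', 'john 4', 'jacob 7', 'astor 6', 'michael 8', 'michael 9'],
    1: ['jacob 11', 'jacob 13', 'astor 15', 'astor 17', 'michael 20'],
}
print(report(dict_1))
```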
