
How to find the most common word in a list?

I've just started coding, so I'm not using dictionaries, sets, imports or anything more advanced than for/while loops and if statements.

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"] 
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"] 

def codedlist(words):
    max_count = 0  # highest count seen so far (avoids shadowing the built-in max)
    for word in words:
        if words.count(word) > max_count:
            max_count = words.count(word)
    return max_count
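
For reference, here is a minimal sketch of what the question asks for, using only for loops, if statements and list.count (no dictionaries, sets or imports). It is an illustration rather than the asker's code; the name most_common_word is hypothetical, the list is assumed to be non-empty, and on ties the word that appears first wins.

```python
def most_common_word(words):
    """Return the most common word using only loops, ifs and list.count.

    Assumes `words` is non-empty; on ties the word seen first wins.
    """
    best_word = words[0]
    best_count = 0
    for word in words:
        count = words.count(word)  # how many times this word occurs
        if count > best_count:     # strictly greater, so earlier ties win
            best_word = word
            best_count = count
    return best_word

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]

print(most_common_word(list1))  # me
print(most_common_word(list2))  # cry
```

Flipping > to < (and starting best_count above any possible count) turns the same loop into a least-common search.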

You can use collections.Counter to find it with a one-liner:

from collections import Counter

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"] 
Counter(list1).most_common()[-1]

Output:

('cry', 2)

(most_common() returns the counted elements sorted by count in descending order, so the last element, [-1], is the one with the lowest count.)
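
To make that ordering visible, a small sketch on the same list1. most_common(n) with an integer argument is part of the Counter API and returns only the top n entries, so the most and least common words can be read off the same call.

```python
from collections import Counter

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
counts = Counter(list1)

print(counts.most_common())      # [('me', 4), ('no', 3), ('cry', 2)]
print(counts.most_common(1))     # [('me', 4)]  -> most common
print(counts.most_common()[-1])  # ('cry', 2)   -> least common
```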

Or, slightly more involved, if several elements can share the lowest count:

from collections import Counter

list1 = [1,2,3,4,4,4,4,4]
counted = Counter(list1).most_common()
least_count = min(counted, key=lambda y: y[1])[1]
list(filter(lambda x: x[1] == least_count, counted))

Output:

[(1, 1), (2, 1), (3, 1)]

You can use collections.Counter to count the frequency of each string, then min to get the minimum frequency, and then a list comprehension to get the strings that have that frequency:

from collections import Counter

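def codedlist(words):
    c = Counter(words)                          # word -> number of occurrences
    m = min(c.values())                         # the smallest count
    return [s for s, i in c.items() if i == m]  # every word with that count

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]

print(codedlist(list1))
print(codedlist(list2))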

Output:

['cry']
['cry', 'no', 'me']

from collections import Counter

def least_common(words):
    d = dict(Counter(words))
    min_freq = min(d.values())
    return [(k, v) for k, v in d.items() if v == min_freq]

words = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]
print(least_common(words))

A simple, algorithmic way to do this:

def codedlist(my_list):
    least = float('inf')  # start higher than any possible count
    word = ''
    for element in my_list:
        repeated = my_list.count(element)
        if repeated < least:
            least = repeated  # lowest count found so far
            word = element    # the word with that count
    return word

It's not very efficient, though, because it calls count() for every element, which makes it O(n²). There are better ways to do this, but I think it's easy for a beginner to understand.
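
One such better way, sketched here as an assumption rather than the answerer's code: sort a copy of the list so equal words sit next to each other, then count each run of equal neighbours in a single pass. It still uses only loops and if statements (plus the built-in sorted), runs in O(n log n) instead of O(n²), and, unlike the code above, breaks ties alphabetically rather than by first appearance. The name least_common_sorted is hypothetical.

```python
def least_common_sorted(words):
    """Least common word via sorting, then counting runs of equal neighbours.

    Assumes a non-empty list; on ties, returns the alphabetically first word.
    """
    ordered = sorted(words)  # equal words end up next to each other
    best_word, best_count = ordered[0], len(ordered) + 1
    run_word, run_count = ordered[0], 0
    for word in ordered:
        if word == run_word:
            run_count += 1  # still inside the current run
        else:
            if run_count < best_count:  # the previous run just ended
                best_word, best_count = run_word, run_count
            run_word, run_count = word, 1
    if run_count < best_count:  # don't forget the last run
        best_word, best_count = run_word, run_count
    return best_word

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]
print(least_common_sorted(list1))  # cry
print(least_common_sorted(list2))  # cry
```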

If you want all words sorted by count, from least to most common:

import numpy as np

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]

uniques_values = np.unique(list1)

final_list = []
for value in uniques_values:
    # str() turns the NumPy string back into a plain Python str for printing
    final_list.append((str(value), list1.count(value)))

def takeSecond(elem):
    return elem[1]

final_list.sort(key=takeSecond)

print(final_list)

For list1:

[('cry', 2), ('no', 3), ('me', 4)]

For list2:

[('cry', 3), ('me', 3), ('no', 3)]

Be careful: to use a different list, you have to edit the code in two places (the np.unique call and the list1.count call).

Some useful explanation:

  • numpy.unique returns the sorted, non-repeated values of the list.

  • takeSecond(elem), which returns elem[1], is passed as the sort key, so the list of tuples is sorted by its second value (the count).

This can be useful for displaying the counts, or for getting all items sorted by this criterion.
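
Since NumPy is imported anyway, np.unique can also return the counts directly via return_counts=True, which removes the list.count calls and the need to edit the list in two places. A sketch; the function name sorted_counts is hypothetical, and np.argsort with kind="stable" keeps tied words in np.unique's alphabetical order.

```python
import numpy as np

def sorted_counts(words):
    """Return (word, count) pairs sorted from least to most common."""
    values, counts = np.unique(words, return_counts=True)
    order = np.argsort(counts, kind="stable")  # stable: ties stay alphabetical
    # str()/int() convert NumPy scalars back to plain Python types
    return [(str(values[i]), int(counts[i])) for i in order]

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]
print(sorted_counts(list1))  # [('cry', 2), ('no', 3), ('me', 4)]
print(sorted_counts(list2))  # [('cry', 3), ('me', 3), ('no', 3)]
```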

Hope it helps.

Finding the minimum works much like finding the maximum: count the occurrences of each element, and if that count is smaller than the counter (the lowest count found so far), replace the counter.

This is a crude solution that uses a lot of memory and takes a lot of time to run. You will learn more about lists (and how to manipulate them) if you try to reduce its run time and memory usage. I hope this helps!

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]

def codedlist(l):
    least = None  # our counter: lowest count seen so far (avoids shadowing min)
    indices = []  # positions of the elements with that count
    for i in range(len(l)):
        count = 0
        for x in l:  # you can possibly shorten the run time here
            if x == l[i]:
                count += 1
        if least is None:  # i.e. this is the first element
            least = count
            indices = [i]
        elif count < least:  # this element is less common than any so far
            least = count  # replace the counter
            indices = [i]  # this is now the only index
        elif count == least:  # ties with the lowest count seen so far
            indices.append(i)  # add it to our indices

    tempList = []
    # You can possibly shorten the run time below
    for ind in indices:
        tempList.append(l[ind])
    rList = []
    for x in tempList:  # remove duplicates from the list
        if x not in rList:
            rList.append(x)
    return rList

print(codedlist(list1))
print(codedlist(list2))

Output:

['cry']
['cry', 'no', 'me']

This version returns the most common word(s) instead:

def codedlist(words):
    counts = {}  # word -> number of occurrences (avoids shadowing dict/list)
    for item in words:
        counts[item] = words.count(item)
    most_common_number = max(counts.values())
    most_common = []
    for k, v in counts.items():
        if most_common_number == v:
            most_common.append(k)
    return most_common

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]

print(codedlist(list1))

Probably the simplest and fastest approach to get the least common item in a collection:

min(list1, key=list1.count)

In action:

>>> data = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
>>> min(data, key=data.count)
'cry'

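I tested the speed against the collections.Counter approach and it's much faster (see this REPL). Note that it calls count() once per element, so it is O(n²); on long lists the single-pass Counter approach will eventually overtake it.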
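
The REPL link did not survive here. As a sketch of how you could repeat the comparison yourself with the standard-library timeit module (the list sizes and repetition count are arbitrary assumptions, and timings depend on your machine and Python version):

```python
import timeit
from collections import Counter

def least_by_count(data):
    # the approach from this answer: one list.count() call per element
    return min(data, key=data.count)

def least_by_counter(data):
    # the collections.Counter approach: count everything in one pass
    return Counter(data).most_common()[-1][0]

small = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
large = small * 200  # 1800 items, same three distinct words (arbitrary size)

for name, data in (("small", small), ("large", large)):
    t_count = timeit.timeit(lambda: least_by_count(data), number=10)
    t_counter = timeit.timeit(lambda: least_by_counter(data), number=10)
    print(f"{name}: min+count {t_count:.4f}s, Counter {t_counter:.4f}s")
```

Comparing the "small" and "large" rows shows how each approach scales as the list grows.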

PS: The same can be done with max to find the most common item.

Edit

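To get multiple least common items you can extend this approach using a comprehension. For example, with the second list, where every word occurs three times:

>>> data = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]
>>> lc = data.count(min(data, key=data.count))
>>> {i for i in data if data.count(i) == lc}
{'no', 'me', 'cry'}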

Basically you want to go through your list and at each element ask yourself:

"Have I seen this element before?"

If the answer is yes, add 1 to that element's count; if the answer is no, add it to the dictionary of seen values with a count of 1. Finally, sort the entries by count and pick the first word, since it has the smallest count. Let's implement it:

import operator

words = ['blah', 'blah', 'car']
seen_dictionary = {}
for w in words:
    if w in seen_dictionary:
        seen_dictionary[w] += 1
    else:
        seen_dictionary[w] = 1

# sorted() gives a list of (word, count) tuples ordered by count,
# so [0][0] is the word of the first (smallest-count) tuple
final_word = sorted(seen_dictionary.items(), key=operator.itemgetter(1))[0][0]
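
Sorting every entry only to take the first one does more work than needed. As an alternative (a substitution, not part of the original answer), min with key=seen_dictionary.get picks the key with the smallest count in a single pass; the counting loop is repeated so the sketch runs on its own.

```python
words = ['blah', 'blah', 'car']

seen_dictionary = {}
for w in words:
    if w in seen_dictionary:
        seen_dictionary[w] += 1
    else:
        seen_dictionary[w] = 1

# min() iterates over the keys and compares them by their count;
# on ties it returns the first key it meets (insertion order)
final_word = min(seen_dictionary, key=seen_dictionary.get)
print(final_word)  # car
```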
