How to find the most common word in a list?

I've just started coding, so I'm not using dictionaries, sets, imports or anything more advanced than for/while loops and if statements.

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"] 
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"] 

def codedlist(words):
    max_count = 0  # highest count found so far
    for word in words:
        if words.count(word) > max_count:
            max_count = words.count(word)
    return max_count

You can use collections.Counter to find it with a one-liner:

from collections import Counter

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"] 
Counter(list1).most_common()[-1]

Output:

('cry', 2)

(most_common() returns the list of counted elements sorted from the highest count to the lowest, so the last element, [-1], has the lowest count.)
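To see what most_common() returns before indexing into it, here is a small sketch (the name `ranked` is only for illustration). The first pair is the most common element and the last pair is the least common one; passing a number, as in most_common(1), limits the result to that many top pairs. Elements with equal counts keep the order in which they were first encountered.

```python
from collections import Counter

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]

# with no argument, most_common() returns every (element, count) pair,
# ordered from the highest count to the lowest
ranked = Counter(list1).most_common()
print(ranked)      # [('me', 4), ('no', 3), ('cry', 2)]

print(ranked[0])   # most common: ('me', 4)
print(ranked[-1])  # least common: ('cry', 2)

# most_common(n) keeps only the n most common pairs
print(Counter(list1).most_common(1))  # [('me', 4)]
```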

Or, slightly more involved, if several elements can share the minimal count:

from collections import Counter

list1 = [1,2,3,4,4,4,4,4]
counted = Counter(list1).most_common()
least_count = min(counted, key=lambda y: y[1])[1]
list(filter(lambda x: x[1] == least_count, counted))

Output:

[(1, 1), (2, 1), (3, 1)]
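Because most_common() already sorts the pairs from the highest count to the lowest, the minimum count is simply the count of the last pair, so the separate min() call is optional. A sketch of that variant, using a list comprehension instead of filter (it assumes the list is not empty):

```python
from collections import Counter

list1 = [1, 2, 3, 4, 4, 4, 4, 4]
counted = Counter(list1).most_common()

# counted is sorted by count in descending order,
# so the last pair holds the minimum count
least_count = counted[-1][1]

# keep every (element, count) pair that shares the minimum count
least_common = [pair for pair in counted if pair[1] == least_count]
print(least_common)  # [(1, 1), (2, 1), (3, 1)]
```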

You can use collections.Counter to count the frequency of each string, then use min to get the minimum frequency, and then a list comprehension to get the strings that have that minimum frequency:

from collections import Counter

def codedlist(number):
    c = Counter(number)
    m = min(c.values())
    return [s for s, i in c.items() if i == m]

print(codedlist(list1))
print(codedlist(list2))

Output:

['cry']
['cry', 'no', 'me']
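The question title asks for the most common word; swapping min for max in the same approach handles that, again returning every word tied for the top count. A sketch (the function name `most_common_words` is illustrative):

```python
from collections import Counter

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]

def most_common_words(words):
    counts = Counter(words)       # word -> number of occurrences
    top = max(counts.values())    # highest count in the list
    # every word whose count equals the highest one
    return [w for w, n in counts.items() if n == top]

print(most_common_words(list1))  # ['me']
print(most_common_words(list2))  # ['cry', 'no', 'me']
```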
from collections import Counter

def least_common(words):
    d = dict(Counter(words))  # word -> count
    min_freq = min(d.values())
    # every (word, count) pair that has the minimum count
    return [(k, v) for k, v in d.items() if v == min_freq]

words = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]
print(least_common(words))

A simple, algorithmic way to do this:

def codedlist(my_list):
    least = len(my_list) + 1  # higher than any possible count
    word = ''
    for element in my_list:
        repeated = my_list.count(element)
        if repeated < least:
            least = repeated  # lowest count seen so far
            word = element  # the word with that count
    return word

It's not very efficient, though: it calls count() for every element, and each call scans the whole list. There are better ways to do this, but I think it's easy for a beginner to understand. Note that if several words tie for the lowest count, it returns only the first one.
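Since the question is restricted to loops and if statements, here is the same loop turned around to find the most common word instead. It is a sketch: `most_common_word` is an illustrative name, and like the version above it returns only the first word it meets when several words tie.

```python
def most_common_word(my_list):
    best_count = 0  # highest count seen so far
    word = ''       # word that had that count
    for element in my_list:
        repeated = my_list.count(element)
        # strict ">" keeps the first word found when counts tie
        if repeated > best_count:
            best_count = repeated
            word = element
    return word

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]
print(most_common_word(list1))  # me
print(most_common_word(list2))  # cry
```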

If you want all words sorted by count, from least to most common:

import numpy as np

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]

uniques_values = np.unique(list1)

final_list = []
for i in range(0,len(uniques_values)):
    # str() turns numpy.str_ into a plain Python string for cleaner printing
    final_list.append((str(uniques_values[i]), list1.count(uniques_values[i])))

def takeSecond(elem):
    return elem[1]

final_list.sort(key=takeSecond)

print(final_list)

For list1:

[('cry', 2), ('no', 3), ('me', 4)]

For list2:

[('cry', 3), ('me', 3), ('no', 3)]

Be careful: to change the list, you have to edit the code in two places (the np.unique call and the count call).

Some useful explanations:

  • numpy.unique gives you the non-repeated values, in sorted order.

  • def takeSecond(elem), which returns elem[1], is a function that lets you sort a list of tuples by their second value (index [1]), here the count.

This can be useful for displaying the counts or for getting all items sorted by this criterion.
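For comparison, the same sorted list can be built without NumPy: Counter gives the counts and operator.itemgetter(1) plays the role of takeSecond. One difference to keep in mind: np.unique returns the words in sorted order, so tied words come out alphabetically, whereas Counter keeps them in the order they first appear. Wrapping the logic in a function (the name `counts_sorted` is illustrative) also means the list is passed in only one place.

```python
from collections import Counter
from operator import itemgetter

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]

def counts_sorted(words):
    # (word, count) pairs ordered from least to most common;
    # itemgetter(1) selects the count, just like takeSecond
    return sorted(Counter(words).items(), key=itemgetter(1))

print(counts_sorted(list1))  # [('cry', 2), ('no', 3), ('me', 4)]
print(counts_sorted(list2))  # [('cry', 3), ('no', 3), ('me', 3)]
```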

Hope it helps.

Finding the minimum is often similar to finding the maximum: you count the number of occurrences of an element, and if that count is smaller than the counter (the occurrence count of the least common element found so far), you replace the counter.

This is a crude solution that uses a lot of memory and takes a lot of time to run. You will learn more about lists (and how to manipulate them) if you try to reduce the run time and memory usage. I hope this helps!

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]

def codedlist(l):
    min_count = None  # our counter: lowest count so far (None until the first word)
    indices = []  # positions of the words that have that count
    for i in range(0, len(l)):
        count = 0
        for x in l:  # you can possibly shorten the run time here
            if x == l[i]:
                count += 1
        if min_count is None:  # this is the first element
            min_count = count
            indices = [i]
        elif min_count > count:  # this element is the least common so far
            min_count = count  # replace the counter
            indices = [i]  # this is now the only index
        elif min_count == count:  # ties with the least common count so far
            indices.append(i)  # add it to our indices

    tempList = []
    #You can possibly shorten the run time below
    for ind in indices:
        tempList.append(l[ind])
    rList = []
    for x in tempList: #Remove duplicates in the list
        if x not in rList:
            rList.append(x)
    return rList

print(codedlist(list1))
print(codedlist(list2))

Output:

['cry']
['cry', 'no', 'me']
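Following the hint above about run time and memory, one possible improvement is to count each distinct word only once, remembering which words have already been handled; this also makes the separate duplicate-removal step unnecessary. This sketch still uses only lists, loops and if statements (`least_common_words` and `done` are illustrative names). It is still not fast for long lists, because both `in` and count() scan a list.

```python
def least_common_words(words):
    done = []         # distinct words already counted
    min_count = None  # lowest count seen so far
    result = []       # words that have that lowest count
    for word in words:
        if word in done:
            continue  # already counted, skip the repeat
        done.append(word)
        count = words.count(word)
        if min_count is None or count < min_count:
            min_count = count  # new minimum: start a fresh result list
            result = [word]
        elif count == min_count:
            result.append(word)  # ties with the current minimum
    return result

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]
print(least_common_words(list1))  # ['cry']
print(least_common_words(list2))  # ['cry', 'no', 'me']
```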
def codedlist(words):
    counts = {}
    for item in words:
        counts[item] = words.count(item)  # occurrences of this word
    most_common_number = max(counts.values())
    most_common = []
    for k, v in counts.items():
        if most_common_number == v:
            most_common.append(k)
    return most_common

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]

print(codedlist(list1))
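The version above calls count() once per item, so each call rescans the whole list. A sketch of a single-pass alternative that builds the dictionary with dict.get and then picks the words with the highest count (`most_common_single_pass` is an illustrative name):

```python
def most_common_single_pass(words):
    counts = {}
    for item in words:
        # get() returns 0 the first time a word is seen
        counts[item] = counts.get(item, 0) + 1
    top = max(counts.values())
    return [word for word, n in counts.items() if n == top]

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]
print(most_common_single_pass(list1))  # ['me']
print(most_common_single_pass(list2))  # ['cry', 'no', 'me']
```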

Probably the simplest and, for short lists, the fastest approach to retrieve the least common item in a collection:

min(list1, key=list1.count)

In action:

>>> data = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
>>> min(data, key=data.count)
'cry'

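I tested the speed against the collections.Counter approach, and on this data it's much faster. See this REPL. (Each count() call scans the whole list, so the work grows quadratically with the list length; for long lists, Counter will eventually win.)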
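The comparison can be reproduced with the timeit module. A sketch follows; the helper names, the list sizes and the repetition count are arbitrary choices, and the timings will vary by machine, so no numbers are shown. Comparing the short list with a longer one shows how the quadratic cost of repeated count() calls starts to matter.

```python
import timeit
from collections import Counter

def with_min_count(data):
    # simple, but every count() call rescans the whole list
    return min(data, key=data.count)

def with_counter(data):
    # one pass to count, then take the pair with the lowest count
    return Counter(data).most_common()[-1][0]

small = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
large = small * 200  # 1,800 items with the same proportions

for name, data in (("small", small), ("large", large)):
    t_min = timeit.timeit(lambda: with_min_count(data), number=10)
    t_counter = timeit.timeit(lambda: with_counter(data), number=10)
    print(f"{name}: min/count {t_min:.4f}s, Counter {t_counter:.4f}s")
```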

PS: The same can be done with max to find the most common item.
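A sketch of the max version. Both min and max return the first matching item they meet when several items tie, which is why list2, where every word appears three times, gives 'cry' either way.

```python
list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]

print(max(list1, key=list1.count))  # me
print(max(list2, key=list2.count))  # cry (all counts tie, first item wins)
print(min(list2, key=list2.count))  # cry
```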

Edit

To get multiple least common items, you can extend this approach using a comprehension.

>>> data = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]
>>> lc = data.count(min(data, key=data.count))
>>> {i for i in data if data.count(i) == lc}
{'no', 'me', 'cry'}
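Because the result is a set, its display order can change between runs. If you want the words in the order they first appear, one option is to deduplicate with dict.fromkeys, which preserves insertion order (Python 3.7+). A sketch:

```python
data = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]

# smallest number of occurrences of any word
lc = min(data.count(w) for w in data)

# dict.fromkeys(data) keeps each word once, in first-appearance order
least = [w for w in dict.fromkeys(data) if data.count(w) == lc]
print(least)  # ['cry', 'no', 'me']
```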

Basically, you want to go through your list and, at each element, ask yourself:

"Have I seen this element before?"

If the answer is yes, you add 1 to the count of that element; if the answer is no, you add it to the dictionary of seen values with a count of 1. Finally, we sort the entries by value and pick the first word, since it has the smallest count. Let's implement it:

import operator

words = ['blah', 'blah', 'car']
seen_dictionary = {}
for w in words:
    if w in seen_dictionary:
        seen_dictionary[w] += 1  # seen before: add 1 to its count
    else:
        seen_dictionary[w] = 1  # first time: start its count at 1

# sorted() returns (word, count) tuples ordered by count (the second element),
# so the first tuple holds the least common word
final_word = sorted(seen_dictionary.items(), key=operator.itemgetter(1))[0][0]
print(final_word)
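Sorting the whole dictionary just to take its first entry does more work than needed. A sketch of an alternative: min with key=seen_dictionary.get looks up each word's count and returns the word with the smallest one in a single pass over the dictionary.

```python
words = ['blah', 'blah', 'car']
seen_dictionary = {}
for w in words:
    # add 1 to the word's count, starting from 0 if it is unseen
    seen_dictionary[w] = seen_dictionary.get(w, 0) + 1

# iterating a dict yields its keys; key=get ranks them by count
final_word = min(seen_dictionary, key=seen_dictionary.get)
print(final_word)  # car
```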

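Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.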

 
© 2020-2024 STACKOOM.COM