
How to find the most common word in a list?

I have only just started coding, so I am not using dictionaries, sets, imports, or anything more advanced than for/while loops and if statements.

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"] 
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"] 

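def codedlist(words):
    best_count = 0    # highest count seen so far
    best_word = None  # word that has that count
    for word in words:
        if words.count(word) > best_count:
            best_count = words.count(word)
            best_word = word
    return best_word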

You can use collections.Counter to find it in a single line:

from collections import Counter

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"] 
Counter(list1).most_common()[-1]

Output:

('cry', 2)

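(most_common() returns a list of the counted elements sorted by count, from most to least common, so the last element [-1] is the one with the smallest count.)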
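
To make the ordering concrete, here is a small sketch showing that most_common() lists the pairs from the highest count to the lowest, so index 0 is the most common element and index -1 the least common. The variable names `ranked`, `most`, and `least` are only illustrative.

```python
from collections import Counter

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]

# most_common() with no argument returns every (element, count) pair,
# ordered from the highest count to the lowest.
ranked = Counter(list1).most_common()
print(ranked)       # [('me', 4), ('no', 3), ('cry', 2)]

most = ranked[0]    # first pair: the most common element
least = ranked[-1]  # last pair: the least common element
print(most, least)  # ('me', 4) ('cry', 2)
```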

Or, slightly more involved, if several elements may share the smallest count:

from collections import Counter

list1 = [1,2,3,4,4,4,4,4]
counted = Counter(list1).most_common()
least_count = min(counted, key=lambda y: y[1])[1]
list(filter(lambda x: x[1] == least_count, counted))

Output:

[(1, 1), (2, 1), (3, 1)]

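You can use collections.Counter to count the frequency of each string, then use min to get the smallest frequency, and then use a list comprehension to get the strings that have that frequency: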

from collections import Counter

def codedlist(number):
    c = Counter(number)
    m = min(c.values())
    return [s for s, i in c.items() if i == m]

print(codedlist(list1))
print(codedlist(list2))

Output:

['cry']
['cry', 'no', 'me']

from collections import Counter

def least_common(words):
    d = dict(Counter(words))
    min_freq = min(d.values())
    return [(k, v) for k, v in d.items() if v == min_freq]

words = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]
print(least_common(words))

A simple algorithmic approach can do this:

def codedlist(my_list):
    least = 99999999 # A very high number
    word = ''
    for element in my_list:
        repeated = my_list.count(element)
        if repeated < least:
            least = repeated # This is just a counter
            word = element # This is the word
    return word

It does not perform very well, though. There are better ways to do this, but I think this is an easy approach for a beginner to understand.
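
As one possible "better way", the sketch below counts every word in a single pass with a plain dictionary instead of calling count() once per element. It is not the code from the answer above: the function name `least_common_word` and the variable `counts` are made up for this illustration, and the tie-breaking comment assumes Python 3.7 or later, where dictionaries keep insertion order.

```python
def least_common_word(my_list):
    # Count each word once while walking the list a single time.
    counts = {}
    for element in my_list:
        counts[element] = counts.get(element, 0) + 1

    # Pick the word with the smallest count. On a tie, the word that
    # appeared first in the list wins (dicts keep insertion order).
    word = None
    least = None
    for element, repeated in counts.items():
        if least is None or repeated < least:
            least = repeated
            word = element
    return word


list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
print(least_common_word(list1))  # cry
```

An empty list returns None here rather than an empty string, which is a design choice of this sketch.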

If you want all the words sorted by count, lowest first:

import numpy as np

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]

uniques_values = np.unique(list1)

final_list = []
for i in range(0,len(uniques_values)):
    final_list.append((uniques_values[i], list1.count(uniques_values[i])))

def takeSecond(elem):
    return elem[1]

final_list.sort(key=takeSecond)

print(final_list)

For list1:

[('cry',2),('no',3),('me',4)]

For list2:

[('cry',3),('me',3),('no',3)]

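Be careful with this code: to switch to another list, you have to edit it in two places.

Some useful explanations:

  • numpy.unique gives you the distinct (non-repeated) values

  • def takeSecond(elem), with return elem[1], is a function that lets you sort the list by its [1] column (the second value of each tuple).

This can be useful for displaying the counts or for getting all items sorted by this criterion.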
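
If NumPy is not available, the same idea can be sketched with the standard library only. This is an alternative under stated assumptions, not the answer's code: `sorted(set(...))` stands in for numpy.unique, `operator.itemgetter(1)` plays the role of takeSecond, and the helper name `counts_sorted` is hypothetical. Taking the list as a parameter also avoids having to edit the code in two places.

```python
from operator import itemgetter


def counts_sorted(words):
    # sorted(set(words)) yields each distinct word once, in alphabetical
    # order, which is similar to what numpy.unique returns for strings.
    pairs = [(word, words.count(word)) for word in sorted(set(words))]
    # itemgetter(1) picks the second value of each pair, like takeSecond.
    pairs.sort(key=itemgetter(1))
    return pairs


list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]
print(counts_sorted(list1))  # [('cry', 2), ('no', 3), ('me', 4)]
print(counts_sorted(list2))  # [('cry', 3), ('me', 3), ('no', 3)]
```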

Hope this helps.

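Finding the minimum is usually much like finding the maximum. You count the occurrences of an element, and if that count is smaller than the counter (the number of occurrences of the least common element found so far), you replace the counter.

This is a rough solution that uses a lot of memory and takes a long time to run. If you try to reduce its run time and memory use, you will learn more about lists (and their operations). I hope this helps!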

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]

def codedlist(l):
    min = False #This is our counter
    indices = [] #This records the positions of the counts
    for i in range(0,len(l)):
        count = 0
        for x in l: #You can possibly shorten the run time here
            if(x == l[i]):
                count += 1
        if not min: #Also can be read as: If this is the first element.
            min = count
            indices = [i]
        elif min > count: #If this element is the least common
            min = count #Replace the counter
            indices = [i] # This is your only index
        elif min == count: #If this element ties with the current least common count
            indices.append(i) #Add it to our list of indices

    tempList = []
    #You can possibly shorten the run time below
    for ind in indices:
        tempList.append(l[ind])
    rList = []
    for x in tempList: #Remove duplicates in the list
        if x not in rList:
            rList.append(x)
    return rList

print(codedlist(list1))
print(codedlist(list2))

Output:

['cry']
['cry', 'no', 'me']
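
One way to cut the repeated work hinted at in the comments is to count each distinct word only once. The sketch below sticks to lists, loops, and if statements (no dictionaries, sets, or imports), as the question asks. The names `least_common_words`, `uniques`, and `counts` are illustrative assumptions, not part of the answer above.

```python
def least_common_words(words):
    uniques = []  # each distinct word, in order of first appearance
    counts = []   # counts[i] is how often uniques[i] occurs

    for word in words:
        found = False
        for i in range(len(uniques)):
            if uniques[i] == word:
                counts[i] += 1  # seen before: bump its count
                found = True
                break
        if not found:
            uniques.append(word)  # first sighting: start counting at 1
            counts.append(1)

    # Find the smallest count, then collect every word that has it.
    result = []
    if counts:
        smallest = counts[0]
        for c in counts:
            if c < smallest:
                smallest = c
        for i in range(len(uniques)):
            if counts[i] == smallest:
                result.append(uniques[i])
    return result


list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]
print(least_common_words(list1))  # ['cry']
print(least_common_words(list2))  # ['cry', 'no', 'me']
```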

def codedlist(words):
    counts = {}  # maps each word to the number of times it occurs
    for item in words:
        counts[item] = words.count(item)
    most_common_number = max(counts.values())
    most_common = []
    for k, v in counts.items():
        if most_common_number == v:
            most_common.append(k)
    return most_common

list1 = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"] 
list2 = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"] 

print(codedlist(list1))

This is probably the simplest and fastest way to get the least common item in a collection.

min(list1, key=list1.count)

In practice:

>>> data = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
>>> min(data, key=data.count)
'cry'

I tested its speed against the collections.Counter approach, and it was faster. See this REPL
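
Timing results depend on the size of the input, so it is worth measuring on your own data. The sketch below uses timeit to compare the two approaches on the short list and on a longer one. The helper names `with_count` and `with_counter`, the list sizes, and the repeat counts are arbitrary assumptions, and no particular result is claimed here. Keep in mind that min(data, key=data.count) calls data.count once per element, so its work grows with the square of the list length.

```python
import timeit
from collections import Counter


def with_count(data):
    # Calls data.count() once for every element of data.
    return min(data, key=data.count)


def with_counter(data):
    # Counts all elements in one pass, then picks the smallest count.
    counts = Counter(data)
    return min(counts, key=counts.get)


short = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]
longer = short * 200  # 1800 items, an arbitrary size for illustration

for name, data, repeats in (("short", short, 1000), ("longer", longer, 3)):
    t_count = timeit.timeit(lambda: with_count(data), number=repeats)
    t_counter = timeit.timeit(lambda: with_counter(data), number=repeats)
    print(name, "count:", t_count, "Counter:", t_counter)
```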

PS: max can be used the same way to find the most common item.
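
A short sketch of the max variant, which also shows what happens on ties. The list name `tied` is only illustrative; it holds the same values as list2 from the question.

```python
data = ["cry", "me", "me", "no", "me", "no", "no", "cry", "me"]

# max() with the same key returns the item whose count is highest.
print(max(data, key=data.count))  # me

# When several items share the highest (or lowest) count, max() and
# min() both return the first such item they encounter in the list.
tied = ["cry", "cry", "cry", "no", "no", "no", "me", "me", "me"]
print(max(tied, key=tied.count))  # cry
print(min(tied, key=tied.count))  # cry
```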

Edit

To get several least common items, you can extend this approach with a comprehension.

>>> lc = data.count(min(data, key=data.count))
>>> {i for i in data if data.count(i) == lc}
{'cry'}

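Basically, you want to walk through the list and, at each element, ask yourself:

"Have I seen this element before?"

If the answer is yes, add 1 to that element's count; if the answer is no, add it to the dictionary of seen values. Finally, we sort the dictionary by value and pick the first word, since it has the smallest count. Let's implement it: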

import operator

words = ['blah', 'blah', 'car']
seen_dictionary = {}
for w in words:
    if w in seen_dictionary:
        seen_dictionary[w] += 1  # seen before: add 1 to its count
    else:
        seen_dictionary[w] = 1  # first time: start its count at 1

# items() gives (word, count) tuples; sorting by the count (index 1)
# puts the least common word first, and [0][0] extracts that word.
final_word = sorted(seen_dictionary.items(), key=operator.itemgetter(1))[0][0]

