Finding the most common strings in a list while ignoring case

Question

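I have a list of Twitter hashtags named li. I want to build a new list, top_10, containing the most common hashtags in it. So far I have done the following (the tags are stored without the # sign):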

li = ['COVID19', 'Covid19', 'covid19', 'coronavirus', 'Coronavirus',...]
tag_counter = dict()
for tag in li:
    if tag in tag_counter:
        tag_counter[tag] += 1
    else:
        tag_counter[tag] = 1

popular_tags = sorted(tag_counter, key=tag_counter.get, reverse=True)

top_10 = popular_tags[:10]

print('\nList of the top 10 popular hashtags are :\n', top_10)

Since hashtags are not case-sensitive, I want the counting to ignore case when I build my tag_counter.

Answer 1

Use collections.Counter from the standard library:

from collections import Counter

list_of_words = ['hello', 'hello', 'world']
lowercase_words = [w.lower() for w in list_of_words]

Counter(lowercase_words).most_common(1)

This returns:

[('hello', 2)]
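
If the strings may contain non-ASCII letters, str.casefold() is a more aggressive alternative to lower() for caseless matching. The following is only a minimal sketch; the sample list is made up for illustration and is not from the question:

```python
from collections import Counter

# Hypothetical sample data: 'STRASSE' and 'straße' are the same word
# once case differences are folded away.
words = ['Hello', 'HELLO', 'hello', 'STRASSE', 'straße']

# casefold() maps 'ß' to 'ss', so both spellings share one key;
# lower() would leave 'straße' and 'strasse' as separate keys.
folded = [w.casefold() for w in words]
counts = Counter(folded)

print(counts.most_common(2))  # [('hello', 3), ('strasse', 2)]
```

For plain ASCII hashtags, lower() and casefold() give the same result, so either works.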

Answer 2

Normalize the data first, converting everything to either lowercase or uppercase.

li = ['COVID19', 'Covid19', 'covid19', 'coronavirus', 'Coronavirus']
li = [x.upper() for x in li]  # or: li = [x.lower() for x in li]
tag_counter = dict()
for tag in li:
    if tag in tag_counter:
        tag_counter[tag] += 1
    else:
        tag_counter[tag] = 1

popular_tags = sorted(tag_counter, key=tag_counter.get, reverse=True)

top_10 = popular_tags[:10]

print('\nList of the top 10 popular hashtags are :\n', top_10)
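
The same dict-based approach can be shortened a little with dict.get and wrapped in a function. This is just one possible sketch; the helper name top_tags and its parameter n are made up here, not part of the original answer:

```python
def top_tags(tags, n=10):
    """Return the n most frequent tags, ignoring case (hypothetical helper)."""
    tag_counter = {}
    for tag in tags:
        key = tag.lower()  # normalize before counting
        # get() returns 0 for a key that has not been seen yet
        tag_counter[key] = tag_counter.get(key, 0) + 1
    # sorted() is stable, so tags with equal counts keep first-seen order
    return sorted(tag_counter, key=tag_counter.get, reverse=True)[:n]


li = ['COVID19', 'Covid19', 'covid19', 'coronavirus', 'Coronavirus']
print(top_tags(li))  # ['covid19', 'coronavirus']
```

Normalizing inside the loop leaves the original list li untouched, which may matter if the original spellings are needed later.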

Answer 3

You can use Counter from the collections module:

from collections import Counter

li = ['COVID19', 'Covid19', 'covid19', 'coronavirus', 'Coronavirus']

print(Counter([i.lower() for i in li]).most_common(10))

Output:

[('covid19', 3), ('coronavirus', 2)]
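
most_common() returns (tag, count) pairs, while the question asks for a plain list named top_10. Assuming only the tag names are wanted, a small sketch of dropping the counts:

```python
from collections import Counter

li = ['COVID19', 'Covid19', 'covid19', 'coronavirus', 'Coronavirus']

# Count the lowercased tags, then keep only the tag from each (tag, count) pair.
top_10 = [tag for tag, _ in Counter(t.lower() for t in li).most_common(10)]

print(top_10)  # ['covid19', 'coronavirus']
```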

Answer 4

See below: lowercase every string first, then count with Counter.

from collections import Counter

lst = ['Ab','aa','ab','Aa','Cct','aA']
lower_lst = [x.lower() for x in lst ]
counter = Counter(lower_lst)
print(counter.most_common(1))
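
Lowercasing discards the original spelling. If one of the original spellings should be shown for the winning key, one option (an assumption about what might be wanted, not something the question asks for) is to track the spellings per key:

```python
from collections import Counter, defaultdict

lst = ['Ab', 'aa', 'ab', 'Aa', 'Cct', 'aA']

# Case-insensitive counts, as in the answer above.
counts = Counter(x.lower() for x in lst)

# For each lowercase key, count how often each original spelling occurs.
spellings = defaultdict(Counter)
for x in lst:
    spellings[x.lower()][x] += 1

key, total = counts.most_common(1)[0]
# Most frequent original spelling for that key; ties go to the first one seen.
display = spellings[key].most_common(1)[0][0]

print(key, total, display)  # aa 3 aa
```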

4 answers

Answer 1
2 2020-08-15 09:23:19

Answer 2
1 accepted 2020-08-15 09:23:37

Answer 3
1 2020-08-15 09:24:05

Answer 4
1 2020-08-15 09:25:58