
Finding the most frequent strings in a list, ignoring case sensitivity

I have a list of Twitter hashtags named li. I want to build a new list, top_10, containing the most frequent hashtags in it. This is what I have so far:

li = ['COVID19', 'Covid19', 'covid19', 'coronavirus', 'Coronavirus',...]
tag_counter = dict()
for tag in li:
    if tag in tag_counter:
        tag_counter[tag] += 1
    else:
        tag_counter[tag] = 1
 
popular_tags = sorted(tag_counter, key = tag_counter.get, reverse = True)

top_10 = popular_tags[:10]

print('\nList of the top 10 popular hashtags are :\n',top_10)

Since hashtags are case-insensitive, I want the counting in tag_counter to be case-insensitive as well.

Use collections.Counter from the standard library:

from collections import Counter

list_of_words = ['hello', 'hello', 'world']
lowercase_words = [w.lower() for w in list_of_words]

Counter(lowercase_words).most_common(1)

This returns:

[('hello', 2)]
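Applied to the hashtag list from the question, the same idea might look like the sketch below. The helper name top_tags and its n parameter are made up for illustration, and str.casefold() is swapped in for lower() on the assumption that some tags may contain non-ASCII letters; for plain ASCII tags the two behave the same.

```python
from collections import Counter

def top_tags(tags, n=10):
    """Return the n most common tags, compared case-insensitively."""
    # casefold() is a more aggressive lower() intended for caseless matching
    counts = Counter(tag.casefold() for tag in tags)
    # most_common() yields (tag, count) pairs; keep only the tag names
    return [tag for tag, _ in counts.most_common(n)]

sample = ['COVID19', 'Covid19', 'covid19', 'coronavirus', 'Coronavirus']
print(top_tags(sample))  # ['covid19', 'coronavirus']
```

Returning only the names matches the top_10 list the question asks for; drop the list comprehension if the counts are needed too.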

Normalize the data first, converting everything to either lowercase or uppercase:

li = ['COVID19', 'Covid19', 'covid19', 'coronavirus', 'Coronavirus']
li = [x.upper() for x in li] # OR, li = [x.lower() for x in li]
tag_counter = dict()
for tag in li:
    if tag in tag_counter:
        tag_counter[tag] += 1
    else:
        tag_counter[tag] = 1
 
popular_tags = sorted(tag_counter, key = tag_counter.get, reverse = True)

top_10 = popular_tags[:10]

print('\nList of the top 10 popular hashtags are :\n',top_10)
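One side effect of normalizing up front is that the original spelling of each tag is lost. If the first-seen spelling should be kept for display, a possible variation is sketched below. The function name count_case_insensitive and the display dict are illustrative assumptions, not part of the answer above.

```python
from collections import Counter

def count_case_insensitive(tags):
    """Count tags case-insensitively, reporting each under its first-seen spelling."""
    counts = Counter()
    display = {}  # lowercase key -> first spelling encountered
    for tag in tags:
        key = tag.lower()
        counts[key] += 1
        display.setdefault(key, tag)
    # most_common() sorts by count, highest first
    return [(display[key], n) for key, n in counts.most_common()]

li = ['COVID19', 'Covid19', 'covid19', 'coronavirus', 'Coronavirus']
result = count_case_insensitive(li)
print(result)  # [('COVID19', 3), ('coronavirus', 2)]
```

Slicing the result with [:10] gives the top ten, as in the code above.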

You can use Counter from the collections module:

from collections import Counter

li = ['COVID19', 'Covid19', 'covid19', 'coronavirus', 'Coronavirus']

print(Counter([i.lower() for i in li]).most_common(10))

Output:

[('covid19', 3), ('coronavirus', 2)]

See below:

from collections import Counter

lst = ['Ab','aa','ab','Aa','Cct','aA']
lower_lst = [x.lower() for x in lst ]
counter = Counter(lower_lst)
print(counter.most_common(1))

