Find the most frequent words that appear in the dataset
I wrote a function that takes a list as input and returns the most common item in the list.
##Write the function
def most_frequent(List):
    dict = {}
    count, itm = 0, ''
    for item in reversed(List):
        dict[item] = dict.get(item, 0) + 1
        if dict[item] >= count :
            count, itm = dict[item], item
    return(item)
    return num
# verify the code
list = [5,42,34,6,7,4,2,5]
print(most_frequent(list))
Then I downloaded two text files to get the most frequent words.
# Download the files restaurants.txt and restaurant-names.txt from Github
!curl https://raw.githubusercontent.com/ipeirotis/introduction-to-python/master/data/restaurant-names.txt -o restaurant-names.txt
!curl https://raw.githubusercontent.com/ipeirotis/introduction-to-python/master/data/restaurants.txt -o restaurants.txt
# create the list from the restaurants.txt
List = open("restaurants.txt").readlines()
# get the most frequent restaurant name
print("The most frequent restaurant names is ",most_frequent(List))
print(most_common(List))
But when I try to find the most frequent words that appear in the restaurant names, I get the same result. Could you help check whether this is correct? Thanks.
# create the list from the restaurants.txt
List = open("restaurants.txt").readlines()
# get the most frequent restaurant name
print("The most frequent restaurant names is ",most_frequent(List))
It should be return itm (the most common item) instead of return item (the last element visited in your reversed list).
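A minimal sketch of the function with that one-line fix applied. The variable names counts, best_count, and best_item are renamed here for illustration only (the original shadows the built-in dict), and returning None for an empty list is an assumption, not something the question specifies.

```python
def most_frequent(items):
    """Return the most common element of items, or None if items is empty."""
    counts = {}  # renamed from `dict` to avoid shadowing the built-in
    best_count, best_item = 0, None
    for item in reversed(items):
        counts[item] = counts.get(item, 0) + 1
        # Track the element whose running count is currently the highest.
        if counts[item] >= best_count:
            best_count, best_item = counts[item], item
    # Return the tracked winner, not the loop variable `item`.
    return best_item


print(most_frequent([5, 42, 34, 6, 7, 4, 2, 5]))  # 5
```

If a standard-library route is acceptable, collections.Counter(items).most_common(1) gives the same kind of answer without the manual bookkeeping.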
It seems as though you might be using the wrong filename for the restaurant names file. Judging from your curl command:
!curl https://raw.githubusercontent.com/ipeirotis/introduction-to-python/master/data/restaurant-names.txt -o restaurant-names.txt
The filename you should be using is restaurant-names.txt, so your code should be:
# create the list from restaurant-names.txt
List = open("restaurant-names.txt").readlines()
# get the most frequent restaurant name
print("The most frequent restaurant names is ",most_frequent(List))
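Note that readlines() yields one whole name per line, so the call above counts names rather than words. To count words, each line has to be split first. The following is only a sketch: the function name most_frequent_word is hypothetical, and it assumes that words are separated by whitespace and that case should be ignored. The sample list is made-up data standing in for the lines of restaurant-names.txt.

```python
from collections import Counter


def most_frequent_word(lines):
    """Return (word, count) for the most common whitespace-separated word."""
    word_counts = Counter()
    for line in lines:
        # split() with no arguments also discards the trailing newline.
        word_counts.update(line.lower().split())
    if not word_counts:
        return None
    return word_counts.most_common(1)[0]


# Illustrative in-memory sample, not the real file contents.
sample = ["Joe's Pizza\n", "Pizza Palace\n", "Golden Dragon\n"]
print(most_frequent_word(sample))  # ('pizza', 2)
```

An open file object can be passed in place of sample, since iterating over a file yields its lines.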
It might be the function that is wrong. What if you try the same test data in a different order, for example list = [42,5,34,6,5,7,4,2] instead of list = [5,42,34,6,7,4,2,5]? Is the output still 5?
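The suggested check can be sketched as follows. The two function names are hypothetical labels for the question's version (returning the loop variable item) and a version that returns itm. With the original test data the buggy version happens to print 5 because the list starts with 5, which hides the problem.

```python
def most_frequent_buggy(items):
    counts = {}
    count, itm = 0, ''
    for item in reversed(items):
        counts[item] = counts.get(item, 0) + 1
        if counts[item] >= count:
            count, itm = counts[item], item
    # Bug: returns the loop variable, which ends up as the first element of items.
    return item


def most_frequent_fixed(items):
    counts = {}
    count, itm = 0, ''
    for item in reversed(items):
        counts[item] = counts.get(item, 0) + 1
        if counts[item] >= count:
            count, itm = counts[item], item
    return itm


original = [5, 42, 34, 6, 7, 4, 2, 5]
reordered = [42, 5, 34, 6, 5, 7, 4, 2]
print(most_frequent_buggy(original))   # 5 (correct only by coincidence)
print(most_frequent_buggy(reordered))  # 42
print(most_frequent_fixed(reordered))  # 5
```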