
Find the most frequent words that appear in the dataset

I wrote a function that takes a list as input and returns the most common item in the list.

```python
## Write the function
def most_frequent(List):
    dict = {}
    count, itm = 0, ''
    for item in reversed(List):
        dict[item] = dict.get(item, 0) + 1
        if dict[item] >= count:
            count, itm = dict[item], item
    return(item)

# verify the code
list = [5,42,34,6,7,4,2,5]
print(most_frequent(list))
```

Then I downloaded two text files to find the most frequent words.

```python
# Jupyter/IPython cell: lines starting with ! run shell commands
# Download the files restaurants.txt and restaurant-names.txt from GitHub
!curl https://raw.githubusercontent.com/ipeirotis/introduction-to-python/master/data/restaurant-names.txt -o restaurant-names.txt
!curl https://raw.githubusercontent.com/ipeirotis/introduction-to-python/master/data/restaurants.txt -o restaurants.txt

# create the list from restaurants.txt
List = open("restaurants.txt").readlines()

# get the most frequent restaurant name
print("The most frequent restaurant name is", most_frequent(List))
```

But when I try to find the most frequent words that appear in the restaurant names, I get the same result. Could you help me check whether this is correct? Thanks.

```python
# create the list from restaurants.txt
List = open("restaurants.txt").readlines()

# get the most frequent restaurant name
print("The most frequent restaurant name is", most_frequent(List))
```

The function should `return itm` (the most common item it has tracked) instead of `return item`, which is just the last element of your reversed list, i.e. the first element of the original list.
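A minimal corrected sketch of the question's function, keeping its single-pass, reversed-iteration approach and changing only the return value. The variables are also renamed so they no longer shadow the built-ins `dict` and `list`, and an empty input returns `None` instead of `''` (that choice is an assumption, not part of the original code).

```python
def most_frequent(items):
    """Return the most common element of items (None for an empty list).

    Iterating in reverse and updating on >= means that, on a tie, the
    element whose first occurrence comes earliest in the original list wins.
    """
    counts = {}
    best_count, best_item = 0, None
    for item in reversed(items):
        counts[item] = counts.get(item, 0) + 1
        if counts[item] >= best_count:
            best_count, best_item = counts[item], item
    return best_item  # return the tracked winner, not the loop variable


print(most_frequent([5, 42, 34, 6, 7, 4, 2, 5]))  # 5
print(most_frequent([42, 5, 34, 6, 5, 7, 4, 2]))  # 5
```

With `return item`, the first call happens to print 5 only because 5 is also the first element of that list.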

It seems you might be using the wrong filename for the restaurant names file. Judging from your curl command:

```
!curl https://raw.githubusercontent.com/ipeirotis/introduction-to-python/master/data/restaurant-names.txt -o restaurant-names.txt
```

The filename you should be using is restaurant-names.txt, so your code should be:

```python
# create the list from restaurant-names.txt
List = open("restaurant-names.txt").readlines()

# get the most frequent restaurant name
print("The most frequent restaurant name is", most_frequent(List))
```
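Even with the right filename, `readlines()` makes each whole line a single item, so `most_frequent` reports the most frequent restaurant *name*, not the most frequent *word*. A minimal sketch that splits every line into words first might look like the following. It assumes one restaurant name per line, whitespace-separated words, and case-insensitive comparison (so "Pizza" and "pizza" count together); punctuation is not stripped. `most_frequent_words` is a hypothetical helper name.

```python
from collections import Counter


def most_frequent_words(lines, n=1):
    """Return the n most common words across all lines as (word, count) pairs.

    Assumes whitespace-separated words, compared case-insensitively.
    Punctuation is kept as part of the word (e.g. "joe's").
    """
    counts = Counter()
    for line in lines:
        # split() with no argument splits on any run of whitespace,
        # so the trailing "\n" from the file never ends up inside a word
        counts.update(word.lower() for word in line.split())
    return counts.most_common(n)


print(most_frequent_words(["Joe's Pizza\n", "Pizza Hut\n", "Shake Shack\n"]))
# [('pizza', 2)]

# Usage with the downloaded file (assumed to be UTF-8 text in the working directory):
# with open("restaurant-names.txt", encoding="utf-8") as f:
#     print(most_frequent_words(f, n=5))
```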

It might be the function that is wrong. What if you try the same test data in a different order, for example `list = [42,5,34,6,5,7,4,2]` instead of `list = [5,42,34,6,7,4,2,5]`? Is the output still 5?
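With the question's original version (`return item`), the reordered list prints 42, because `item` is simply the last element visited by the loop. One way to avoid both the bug and the order-dependent bookkeeping is the standard library's `collections.Counter`. This is a swapped-in technique, not the questioner's approach, shown as a minimal sketch; returning `None` for an empty list is an assumption. `Counter.most_common` orders elements with equal counts by the order in which they were first encountered.

```python
from collections import Counter


def most_frequent(items):
    """Return the most common element of items, or None if items is empty."""
    counts = Counter(items)
    if not counts:
        return None
    # most_common(1) returns a one-element list [(element, count)];
    # ties go to the element encountered first
    return counts.most_common(1)[0][0]


for data in ([5, 42, 34, 6, 7, 4, 2, 5], [42, 5, 34, 6, 5, 7, 4, 2]):
    print(most_frequent(data))  # prints 5 for both orderings
```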

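Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.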

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM