Find the most frequent words that appear in the dataset
I wrote a function that takes a list as input and returns the most frequent item in the list.
##Write the function
def most_frequent(List):
    dict = {}
    count, itm = 0, ''
    for item in reversed(List):
        dict[item] = dict.get(item, 0) + 1
        if dict[item] >= count :
            count, itm = dict[item], item
    return(item)

    return num

# verify the code
list = [5,42,34,6,7,4,2,5]
print(most_frequent(list))
Then I download two text files to find the most frequent words in them.
# Download the files restaurants.txt and restaurant-names.txt from Github
!curl https://raw.githubusercontent.com/ipeirotis/introduction-to-python/master/data/restaurant-names.txt -o restaurant-names.txt
!curl https://raw.githubusercontent.com/ipeirotis/introduction-to-python/master/data/restaurants.txt -o restaurants.txt
# create the list from the restaurants.txt
List = open("restaurants.txt").readlines()
# get the most frequent restaurant name
print("The most frequent restaurant names is ",most_frequent(List))
print(most_common(List))
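But when I tried to find the most frequent word that appears in the restaurant names, I got the same result. Could you help check whether this is correct? Thanks.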
# create the list from the restaurants.txt
List = open("restaurants.txt").readlines()
# get the most frequent restaurant name
print("The most frequent restaurant names is ",most_frequent(List))
It should be return itm (the most frequent item), not return item (the last element visited in your reversed list).
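The fix described above can be sketched as follows. This keeps the asker's logic and changes only the returned variable; the names items, counts, best_count and best_item are renamed here for illustration, since the original shadows the built-ins dict and list.

```python
def most_frequent(items):
    counts = {}                      # item -> number of occurrences seen so far
    best_count, best_item = 0, ''
    for item in reversed(items):
        counts[item] = counts.get(item, 0) + 1
        if counts[item] >= best_count:
            best_count, best_item = counts[item], item
    return best_item                 # the tracked winner, not the loop variable


print(most_frequent([5, 42, 34, 6, 7, 4, 2, 5]))   # 5
print(most_frequent([42, 5, 34, 6, 5, 7, 4, 2]))   # 5
```

Both orderings now report 5, the only value that occurs twice.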
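It looks like you may have used the wrong filename for the restaurant names file. Judging from your curl command:
!curl https://raw.githubusercontent.com/ipeirotis/introduction-to-python/master/data/restaurant-names.txt -o restaurant-names.txt
the filename you should use is restaurant-names.txt
So your code should be: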
# create the list from restaurant-names.txt
List = open("restaurant-names.txt").readlines()
# get the most frequent restaurant name
print("The most frequent restaurant names is ",most_frequent(List))
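Note that readlines() yields whole lines, so the call above counts full restaurant names rather than the words inside them. A minimal sketch of counting individual words, assuming one name per line and whitespace-separated words; the helper name most_frequent_word and the sample names are hypothetical, and collections.Counter is swapped in for the hand-written dictionary:

```python
from collections import Counter


def most_frequent_word(lines):
    """Return the most frequent whitespace-separated word across all lines."""
    counts = Counter()
    for line in lines:
        # split() with no arguments also discards the trailing newline
        counts.update(line.lower().split())
    if not counts:
        return None                  # no words at all
    return counts.most_common(1)[0][0]


# Hypothetical sample data standing in for the downloaded file
sample = ["Joe's Pizza\n", "Pizza Palace\n", "Golden Dragon\n"]
print(most_frequent_word(sample))    # pizza
```

With the real file, the same helper could be called on the open file object, for example inside `with open("restaurant-names.txt") as f:`, since iterating a file yields its lines.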
The function itself may be at fault. If you try the same test data in a different order, for example list = [42,5,34,6,5,7,4,2] instead of list = [5,42,34,6,7,4,2,5], is the output still 5?
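The reordering test can be run directly. The sketch below reproduces the function as posted, under the hypothetical name most_frequent_buggy, to show that it returns whatever the loop variable last held, which is the first element of the original list:

```python
def most_frequent_buggy(items):
    counts = {}
    best_count, best_item = 0, ''
    for item in reversed(items):
        counts[item] = counts.get(item, 0) + 1
        if counts[item] >= best_count:
            best_count, best_item = counts[item], item
    return item   # bug: the loop variable, i.e. the first element of items


print(most_frequent_buggy([5, 42, 34, 6, 7, 4, 2, 5]))   # 5, right only by coincidence
print(most_frequent_buggy([42, 5, 34, 6, 5, 7, 4, 2]))   # 42, although 5 occurs twice
```

The first list happens to start with its most frequent value, which is why the original check did not expose the problem.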