Calculating source frequency in Python
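I'm new to Python and I'm trying to calculate source frequency. I have a set of files (the sources are tagged), and for each word I want to count how many sources it appears in. For example, for the word "beautiful" the result would be that "beautiful" appears in 5 sources. I already have Python code that looks for a single word, but I need to do this for every word in the files. Any ideas on how I should change the code?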
from os import listdir
with open("C:/Users/elle/Desktop/Archivess/test/rez.txt", "w") as f:
    for filename in listdir("C:/Users/elle/Desktop/Archivess/test/sources/books/"):
        with open('C:/Users/elle/Desktop/Archivess/test/freqs/books/' + filename) as currentFile:
            text = currentFile.read()
            if ('beautiful' in text):
                f.write('The word excist in the file ' + filename[:-4] + '\n')
            else:
                f.write('The word doen't excist in the file' + filename[:-4] + '\n')
I would appreciate any help, thank you!
As already mentioned, you need to escape the ' character. To escape it, put a \ in front of it, as in doen\'t.
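As a small illustration of the escaping point, here is a minimal sketch showing two ways to write a string literal that contains an apostrophe. The variable names are made up for the example.

```python
# A backslash before the apostrophe keeps it from ending the single-quoted string.
escaped = 'The word doesn\'t exist in the file'

# Alternatively, delimit the string with double quotes; no escape is needed then.
double_quoted = "The word doesn't exist in the file"

# Both literals produce exactly the same string.
assert escaped == double_quoted
print(escaped)
```

Either form works; switching to double quotes is often easier to read when the text contains apostrophes.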
from os import listdir
with open("C:/Users/elle/Desktop/Archivess/test/rez.txt", "w") as f:
    for filename in listdir("C:/Users/elle/Desktop/Archivess/test/sources/books/"):
        with open('C:/Users/elle/Desktop/Archivess/test/freqs/books/' + filename) as currentFile:
            text = currentFile.read()
            text = text.strip().lower()
            text = text.replace(".", "").replace(",", "").replace("\"", "").replace("'", "")  # remove . , " and '
            words = text.split()  # split on any whitespace, including newlines
            unique_words = set(words)
            count_dict = {}  # word -> number of occurrences in this file
            for each_word in words:
                if each_word in count_dict:
                    count_dict[each_word] += 1
                else:
                    count_dict[each_word] = 1
            for k in count_dict:
                f.write('The word ' + k + ' exists in the file ' + filename[:-4] + ' ' + str(count_dict[k]) + ' times\n')
            # if ('beautiful' in text):
            #     f.write('The word excist in the file ' + filename[:-4] + '\n')
            # else:
            #     f.write('The word doen\'t excist in the file' + filename[:-4] + '\n')
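The code above counts how often each word occurs within each file. If what is wanted is the source frequency from the question (how many sources contain each word), one possible approach is to count each word at most once per source. The following is only a sketch under assumptions: the function name `source_frequency`, the `texts` mapping and the sample data are all hypothetical, and punctuation handling is simplified to stripping ASCII punctuation.

```python
import string
from collections import Counter


def source_frequency(texts):
    """Return a Counter mapping each word to the number of sources containing it.

    `texts` is a hypothetical mapping of source name -> full text of that source.
    """
    # Translation table that deletes all ASCII punctuation characters.
    strip_punct = str.maketrans("", "", string.punctuation)
    freq = Counter()
    for text in texts.values():
        words = text.lower().translate(strip_punct).split()
        # set() makes sure a source contributes at most 1 per word.
        freq.update(set(words))
    return freq


# Made-up sample data standing in for the contents of the files.
sample = {
    "book1": "A beautiful day. Beautiful!",
    "book2": "The day was long.",
    "book3": "Such a beautiful, long road.",
}
freq = source_frequency(sample)
print(freq["beautiful"])  # 2
print(freq["day"])        # 2
```

To apply this to real files, the `texts` mapping could be filled by reading each file returned by `listdir` before calling the function, and the resulting counts written to the output file in a single loop.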