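How to find words repeated 5 times in a text file in Python?
I'm trying to write a word counter program that picks words out of an arbitrary text file. The requirement is that any word repeated 5 or more times gets printed to the screen. I have tried many examples. I tried the count() function with an if condition like this:
if fullText.count(word) > 5:...
But it doesn't work.
Here is my code: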
from tkinter import *
from tkinter import filedialog
main = Tk()
main.title(".TXT File Word Counter")
main.resizable(height=FALSE, width=FALSE)
main.geometry('500x400')
main.configure(bg='#757575')
labelfont = ("Arial", 14, "bold")
result = dict()
def clear_text():
    textfield.delete(0, END)
    ShowCountedWords.delete(1.0, END)
def open_file():
    main.filename = filedialog.askopenfilename()
def count_word(file):
    fileOpen = open(str(file), 'r')
    fullText = fileOpen.readlines()
    fileOpen.close()
    for word in textfield.get().split(', '):
        for text in fullText:
            if word in result:
                result[word] = result[word] + text.count(word)
            else:
                result[word] = text.count(word)
    ShowCountedWords.delete(1.0, END)
    for key, value in result.items():
        ShowCountedWords.insert('1.0', '{0} : {1} \n'.format(key, value))
    result.clear()
heading = Label(main, text=".TXT File Word Counter")
heading.place(x=150, y=2)
heading.config(bg="#757575", font=labelfont, fg="#ffffff")
textfield = Entry()
textfield.place(x=3, y=30)
textfield.config(width=81, borderwidth=2)
btnSelectFile = Button(main, text="Select .txt File", command=lambda : open_file())
btnSelectFile.place(x=4, y=60)
btnSelectFile.config(width=20, bg="#66BB6A")
btnCount = Button(main, text="Count Words", command=lambda : count_word(main.filename))
btnCount.place(x=173, y=60)
btnCount.config(width=20, bg="#42A5F5")
btnClear = Button(main, text="Clear", command=lambda : clear_text())
btnClear.place(x=344, y=60)
btnClear.config(width=20, bg="#ef5350")
ShowCountedWords = Text(main, height=18, width=61)
ShowCountedWords.place(x=4, y=100)
ShowCountedWords.config(bg="#616161", fg="#ffffff")
main.mainloop()
What should I do? (tkinter is not important for this question.)
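# assumes fullText is a single string (e.g. from fileOpen.read()), not the list returned by readlines()
fullText = fullText.replace(",", "")  # str.replace returns a new string, so reassign it
fullText = fullText.replace(".", "")
for word in textfield.get().split(' '):
    pass  # your code
If you split the string on ",", most of the words will be skipped.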
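To illustrate the point about the separator, here is a minimal sketch comparing a split on ", " with stripping punctuation first and then splitting on whitespace. The helper name split_words and the sample sentence are made up for this example, and it only strips ASCII punctuation.

```python
import string

def split_words(text):
    """Remove ASCII punctuation, then split on any run of whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return text.translate(table).split()

sample = "the cat, the dog. the end"

# Splitting on ", " leaves whole phrases glued together.
print(sample.split(", "))   # ['the cat', 'the dog. the end']

# Stripping punctuation and splitting on whitespace yields single words.
print(split_words(sample))  # ['the', 'cat', 'the', 'dog', 'the', 'end']
```

Note that split() with no argument also swallows newlines and repeated spaces, which split(' ') does not.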
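In the "for text in fullText" loop you are iterating over the list of lines from the opened file, not over a list of the words you want to add to the dictionary. Your logic is sound: walk through a list, take a word, and if it already exists in the dictionary add one to its count, otherwise create a new entry with count = 1. However, you need to turn each line in the list of lines into a sub-list of words, and those words are what you should iterate over.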
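A rough sketch of that idea, assuming the lines come from readlines() and that splitting on whitespace is good enough to separate words. The function name count_words and the sample data are illustrative, not taken from the question.

```python
def count_words(lines):
    """Count every word across a list of text lines using a plain dict."""
    counts = {}
    for line in lines:
        # Turn the line into a sub-list of words before counting.
        for word in line.split():
            if word in counts:
                counts[word] += 1
            else:
                counts[word] = 1
    return counts

lines = ["a b a\n", "b a c\n"]
counts = count_words(lines)
print(counts)  # {'a': 3, 'b': 2, 'c': 1}

# Keep only the words that reach a threshold (2 here, 5 in the question).
repeated = {word: n for word, n in counts.items() if n >= 2}
print(repeated)  # {'a': 3, 'b': 2}
```

Unlike text.count(word), which counts substrings (so "the" would also match inside "other"), this counts whole whitespace-separated tokens.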
You can use the re module to find all the words in a string, and then use collections.Counter() to count how often each word occurs.
Here is an example:
import re
from collections import Counter
with open('sample.txt') as f:
    data = f.read()
# find all words
words = re.findall(r'[\w]+', data)
# count occurrences of each word
result = Counter(words)
# show all words with 5 or more occurrences
print([word for word, count in result.items() if count >= 5])
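One possible way to package this for reuse, for instance from a button handler, is a small function with the threshold as a parameter. This is only a sketch: the name frequent_words, the min_count parameter, and the choice to lowercase the text (so "The" and "the" count as one word) are assumptions added here, not part of the answer above.

```python
import re
from collections import Counter

def frequent_words(text, min_count=5):
    """Return {word: count} for words occurring at least min_count times."""
    # Lowercasing is an assumption: it merges "The", "THE" and "the".
    words = re.findall(r"\w+", text.lower())
    counts = Counter(words)
    return {word: n for word, n in counts.items() if n >= min_count}

text = "The the THE the the cat cat"
print(frequent_words(text))     # {'the': 5}
print(frequent_words(text, 2))  # {'the': 5, 'cat': 2}
```

The comparison is >= rather than the > 5 from the question, because "5 or more times" has to include words that occur exactly five times.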