How do I find words repeated 5 times in a text file in Python?

Question

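I am trying to write a word counter program that picks out words from an arbitrary text file. The requirement is that words repeated 5 or more times are printed to the screen. I have tried many examples. I tried the count() function with an if condition like this:

if fullText.count(word) > 5: ...

But it didn't work.

Here is my code: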

from tkinter import *
from tkinter import filedialog

main = Tk()
main.title(".TXT File Word Counter")
main.resizable(height=FALSE, width=FALSE)
main.geometry('500x400')
main.configure(bg='#757575')

labelfont = ("Arial", 14, "bold")

result = dict()

def clear_text():
    textfield.delete(0, END)
    ShowCountedWords.delete(1.0, END)

def open_file():
    main.filename = filedialog.askopenfilename()

def count_word(file):
    fileOpen = open(str(file), 'r')
    fullText = fileOpen.readlines()
    fileOpen.close()
    for word in textfield.get().split(', '):
        for text in fullText:
            if word in result:
                result[word] = result[word] + text.count(word)
            else:
                result[word] = text.count(word)
    ShowCountedWords.delete(1.0, END)
    for key, value in result.items():
        ShowCountedWords.insert('1.0', '{0} : {1} \n'.format(key, value))

    result.clear()


heading = Label(main, text=".TXT File Word Counter")
heading.place(x=150, y=2)
heading.config(bg="#757575", font=labelfont, fg="#ffffff")

textfield = Entry()
textfield.place(x=3, y=30)
textfield.config(width=81, borderwidth=2)

btnSelectFile = Button(main, text="Select .txt File", command=lambda : open_file())
btnSelectFile.place(x=4, y=60)
btnSelectFile.config(width=20, bg="#66BB6A")

btnCount = Button(main, text="Count Words", command=lambda : count_word(main.filename))
btnCount.place(x=173, y=60)
btnCount.config(width=20, bg="#42A5F5")

btnClear = Button(main, text="Clear", command=lambda : clear_text())
btnClear.place(x=344, y=60)
btnClear.config(width=20, bg="#ef5350")

ShowCountedWords = Text(main, height=18, width=61)
ShowCountedWords.place(x=4, y=100)
ShowCountedWords.config(bg="#616161", fg="#ffffff")


main.mainloop()

What should I do? (tkinter is not important to this question.)

Answer 1

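# Assumes fullText is a single string (e.g. from fileOpen.read()), not the list returned by readlines()
# str.replace returns a new string, so assign the result back
fullText = fullText.replace(",", "")
fullText = fullText.replace(".", "")
for word in textfield.get().split(' '):
    pass  # your code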

If you split the string on ", ", most words will be skipped.
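A minimal sketch of the same idea, assuming the text is already available as one string. The helper name `split_words` and the sample text are made up for illustration; `str.translate` is swapped in here as one way to drop all ASCII punctuation at once instead of chaining `replace` calls.

```python
import string

def split_words(text):
    """Strip ASCII punctuation and split on any whitespace."""
    # str.translate returns a new string; the original is left unchanged
    cleaned = text.translate(str.maketrans("", "", string.punctuation))
    # split() with no argument splits on runs of spaces, tabs and newlines
    return cleaned.split()

sample = "one, two. one two, three. one"  # illustrative input
words = split_words(sample)
print(words)
```

Calling `split()` without an argument also handles newlines, so the whole file can be processed in one pass rather than line by line.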

Answer 2

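In the "for text in fullText" loop, you are iterating over the list of lines from the opened file, not over the list of words you want to add to the dictionary. Your logic is right: go through the list, take a word, and if it is already in the dictionary add one to its count; if it is not, create a new entry with count = 1. However, you need to turn each line in the list of lines into a sub-list of words, and those words are what you want to iterate over.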
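The description above could be sketched roughly as follows. The function name `count_words`, the `min_count` parameter, and the sample lines are assumptions for illustration, not part of the original program; `lines` stands in for the list returned by `readlines()`.

```python
def count_words(lines, min_count=5):
    """Return {word: count} for words seen at least min_count times."""
    counts = {}
    for line in lines:
        # Turn each line into a sub-list of words before counting
        for word in line.split():
            if word in counts:
                counts[word] += 1
            else:
                counts[word] = 1
    return {word: n for word, n in counts.items() if n >= min_count}

lines = ["a b a\n", "a a b\n", "a c\n"]  # illustrative stand-in for readlines()
print(count_words(lines))
```

Note that the condition in the question, `> 5`, would skip words that occur exactly 5 times; `>= 5` matches the stated requirement of "5 or more".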

Answer 3

You can use the re module to find all the words in a string, and then use collections.Counter() to count how often each word occurs.

Here is an example:

import re
from collections import Counter

with open('sample.txt') as f:
    data = f.read()

# find all words
words = re.findall(r'[\w]+', data)
# count occurrence of words
result = Counter(words)
# show all words with occurrence >= 5
print([word for word, count in result.items() if count >= 5])
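The example above treats "The" and "the" as different words. If that is not wanted, one possible variant lowercases the text first. This is only a sketch: the function name `frequent_words`, the `min_count` parameter, and the sample text are assumptions for illustration.

```python
import re
from collections import Counter

def frequent_words(text, min_count=5):
    """Return (word, count) pairs for words occurring at least min_count times, ignoring case."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter(words)
    # most_common() yields (word, count) pairs ordered from highest count to lowest
    return [(word, n) for word, n in counts.most_common() if n >= min_count]

text = "The cat. the CAT, the dog; The cat the cat?"  # illustrative input
print(frequent_words(text, min_count=4))
```

Returning the counts alongside the words makes it easy to show them in a widget such as the Text box from the question.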
