Removing stopwords and other tasks in Python
So, I have been given a .txt file (name: newtext) containing a novel and a .txt file (name: stopwords) containing a list of stopwords. I have to work on these two files without importing any other processing tools (such as NLTK), and I need to perform these tasks:
I am really lost.
I'll give you some hints:
    # new_words is assumed to hold the novel's text after stop-word removal
    word_count = {}
    for word in new_words.split():
        if not word_count.get(word, False):
            word_count[word] = 1
        else:
            word_count[word] += 1
    for word in word_count.keys():
        print(f"Number of occurrences of {word} was {word_count[word]}.")
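To try the counting hint above on something small, it can be wrapped in a function. This is only a minimal sketch: the function name `count_words` and the sample sentence are made up for illustration and are not part of the original question.

```python
def count_words(text):
    """Return a dict mapping each whitespace-separated word to its count."""
    word_count = {}
    for word in text.split():
        # dict.get returns 0 for a word that has not been seen yet
        word_count[word] = word_count.get(word, 0) + 1
    return word_count


counts = count_words("the cat saw the other cat")
for word, count in counts.items():
    print(f"Number of occurrences of {word} was {count}.")
```

Using `word_count.get(word, 0) + 1` does the same job as the if/else in the hint, just in one line.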
Anyhow, I thought of adding an answer for this. It's bare logic, which I did not test. Hopefully it will come in handy!
    # read the novel and the stop-word list, closing each file afterwards
    with open("newtext.txt", "r") as f:
        novel = f.read()
    with open("stopwords.txt", "r") as f:
        s_words = {line.strip() for line in f}  # a set makes lookups fast
    # identify all words in the novel (split() also splits on newlines and tabs)
    all_words = novel.split()
    # remove stop words using the list of "stop words"
    no_stop_words = [x for x in all_words if x not in s_words]
    # determine frequencies of occurrences for each word after the stop-word removal
    frequencies = {}
    for word in no_stop_words:
        frequencies[word] = frequencies.get(word, 0) + 1
    # Print out the top ten most frequent ones, together with their frequency counts
    for word, frequency in sorted(frequencies.items(), key=lambda x: x[1], reverse=True)[:10]:
        print(word, frequency)
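One thing the code above does not handle is case and punctuation: "Whale," and "whale" would be counted as different words, and "The" would not match a stop word "the". Here is a rough sketch of one way to deal with that without importing anything. The name `top_words`, the `PUNCTUATION` string, and the sample text are all assumptions made for illustration, and the punctuation list is certainly not complete.

```python
# Characters stripped from both ends of each word (an illustrative, incomplete list)
PUNCTUATION = ".,;:!?\"'()[]{}-_"


def top_words(text, stop_words, n=10):
    """Return the n most frequent non-stop words as (word, count) pairs."""
    stop_set = {w.strip().lower() for w in stop_words if w.strip()}
    frequencies = {}
    for raw in text.split():
        # Lowercase and strip leading/trailing punctuation, e.g. 'Whale,' -> 'whale'
        word = raw.lower().strip(PUNCTUATION)
        if word and word not in stop_set:
            frequencies[word] = frequencies.get(word, 0) + 1
    # Sort by count, highest first; ties are broken alphabetically
    ranked = sorted(frequencies.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


sample = "The whale! The white whale, said the captain. The captain saw the whale."
result = top_words(sample, ["the", "said"], n=3)
for word, frequency in result:
    print(word, frequency)
```

For the real files you would pass the contents of newtext.txt as `text` and the lines of stopwords.txt as `stop_words`. Note that punctuation inside a word (like the apostrophe in "don't") is left alone, since only the ends of each word are stripped.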