Remove stop words from a text file without using NLTK
Hi everybody, I want to remove stop words from a text file without using NLTK. I have a separate text file that contains my stop-word list, and I want to use that list for the removal. Thank you.
Although the exact requirements are hard to pin down, I would do something like the following:
# stopwords.txt holds one stop word per line (here "and" and "or")
with open("stopwords.txt", encoding="utf-8") as f:
    stopwords = {line.strip().lower() for line in f if line.strip()}

text = "Foo and bar or foo"
tokens = text.split()  # Split into a list of words

# Build a new list instead of calling tokens.remove() inside the loop:
# removing items from a list while iterating over it skips the next word.
clean_tokens = [word for word in tokens if word.lower() not in stopwords]

clean_text = " ".join(clean_tokens)  # Join the words back into a string
print(clean_text)  # Outputs: "Foo bar foo"
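The snippet above filters a hard-coded string, while the question is about a whole text file. The following is a minimal sketch of applying the same idea to a file line by line, assuming UTF-8 input and a stop-word file with one word per line. The helper names `load_stopwords`, `remove_stopwords` and `clean_file` are hypothetical. As an extra assumption, each token has its surrounding punctuation stripped before the lookup, so a token like `and,` is also treated as a stop word.

```python
import string


def load_stopwords(path):
    """Read one stop word per line into a lowercase set, ignoring blank lines."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def remove_stopwords(text, stopwords):
    """Drop whitespace-separated tokens whose lowercase form, with surrounding
    punctuation stripped, is in the stop-word set. Kept tokens are unchanged,
    including their punctuation; runs of whitespace collapse to single spaces."""
    kept = [
        word for word in text.split()
        if word.strip(string.punctuation).lower() not in stopwords
    ]
    return " ".join(kept)


def clean_file(input_path, output_path, stopwords_path):
    """Filter input_path line by line and write the cleaned lines to output_path."""
    stopwords = load_stopwords(stopwords_path)
    with open(input_path, encoding="utf-8") as src, \
         open(output_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(remove_stopwords(line, stopwords) + "\n")
```

Usage, with placeholder file names: `clean_file("input.txt", "output.txt", "stopwords.txt")`. Storing the stop words in a `set` keeps each membership check fast even for a long list. Processing one line at a time also means the input file never has to fit in memory all at once.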