Remove stop words in a text file without using nltk

Hi everybody, I want to remove stop words from a text file without using nltk. I have a separate text file that contains the list of stop words, and I want to use that list for the filtering. Thank you.

Although the exact requirements are hard to understand, I would do something like the following:

with open("stopwords.txt") as f:
    # One stop word per line, e.g. "and" and "or"; a set gives fast lookups
    stopwords = set(f.read().splitlines())

text = "Foo and bar or foo"
tokens = text.split()  # Split into a list of words
# Build a new list rather than calling tokens.remove() inside the loop:
# removing items while iterating skips the word that follows each removal
tokens = [word for word in tokens if word.lower() not in stopwords]
clean_text = " ".join(tokens)  # Join the remaining words into a string
print(clean_text)  # Outputs: Foo bar foo
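
Since the question is about a text file rather than a hard-coded string, here is a rough sketch of the same idea applied file to file. The function names (`load_stopwords`, `remove_stopwords`, `clean_file`) and the UTF-8 encoding are my own assumptions, not something from the question. It also assumes the stop word file has one word per line, and it ignores punctuation attached to a word when comparing, so "And," is treated as "and".

```python
import string


def load_stopwords(path):
    """Read one stop word per line; blank lines are skipped, case is ignored."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def remove_stopwords(text, stopwords):
    """Return text without the tokens that match a stop word.

    A token is compared in lowercase with leading and trailing
    punctuation stripped; tokens that are kept are left unchanged.
    """
    kept = []
    for token in text.split():
        core = token.strip(string.punctuation).lower()
        if core not in stopwords:
            kept.append(token)
    return " ".join(kept)


def clean_file(in_path, out_path, stopwords_path):
    """Filter in_path line by line and write the result to out_path."""
    stopwords = load_stopwords(stopwords_path)
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(remove_stopwords(line, stopwords) + "\n")
```

You would call it as something like clean_file("input.txt", "output.txt", "stopwords.txt"), where the first two file names are placeholders. Note that splitting on whitespace collapses runs of spaces within a line, which may or may not matter for your data.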
