
Removing Stop Words from a Text in Python Without Using NLTK

I made a list of stop words in my native language in Python. How can I remove them from a text I type without using NLTK?

Try this (it only works if words in your language are separated by spaces, which the question doesn't say; thanks to Oso for pointing this out):

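import numpy as np

your_stop_words = ['something', 'sth_else', 'and ...']  # your own stop-word list
new_string = input()
words = np.array(new_string.split())           # split the input on whitespace
is_stop_word = np.isin(words, your_stop_words)  # True where a word is a stop word
filtered_words = words[~is_stop_word]           # keep only the non-stop words
clean_text = ' '.join(filtered_words)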
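The same whitespace-based filtering does not actually need NumPy. Below is a minimal sketch in plain Python, assuming stop words must match tokens exactly (case-sensitive, with no punctuation stripping); remove_stop_words is a hypothetical helper name, not part of any library.

```python
def remove_stop_words(text: str, stop_words: list[str]) -> str:
    """Drop whitespace-separated tokens that appear in stop_words (exact match)."""
    stop_set = set(stop_words)  # a set gives fast membership checks
    return " ".join(word for word in text.split() if word not in stop_set)


if __name__ == "__main__":
    stops = ["the", "a", "of"]
    print(remove_stop_words("the history of a language", stops))  # → history language
```

Converting the list to a set once keeps each lookup cheap even for a long stop-word list.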

If words in the language in question cannot be split on spaces, you can use this solution instead:

your_stop_words = ['something', 'sth_else', 'and ...']
new_string = input()
clean_text = new_string
# Delete every occurrence of each stop word, wherever it appears in the text
for stop_word in your_stop_words:
    clean_text = clean_text.replace(stop_word, "")

In this case, you need to make sure that a stop word cannot also be part of another word, because replace() removes every matching substring, including ones inside longer words. How you do that depends on your language. For example, you can put spaces around your stop words.
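For a language that does use spaces, the "spaces around your stop words" idea can be written with regular-expression lookarounds instead of literally padding each word, which also handles two stop words in a row and multi-word stop phrases. This is a sketch, assuming stop words are plain strings (escaped with re.escape) and that collapsing all whitespace, including newlines, in the output is acceptable; remove_stop_words_regex is a hypothetical name. For scripts without spaces, the boundary condition would have to come from a language-specific rule or tokenizer instead.

```python
import re


def remove_stop_words_regex(text: str, stop_words: list[str]) -> str:
    """Remove stop words only where they stand alone between whitespace."""
    if not stop_words:
        return text
    # Longest first, so "of course" is tried before "of".
    ordered = sorted(stop_words, key=len, reverse=True)
    alternation = "|".join(re.escape(word) for word in ordered)
    # (?<!\S) and (?!\S): the match must be preceded and followed by
    # whitespace or the start/end of the string, so "the" in "theory" is kept.
    pattern = re.compile(r"(?<!\S)(?:" + alternation + r")(?!\S)")
    cleaned = pattern.sub("", text)
    # Collapse the extra spaces left behind by the removed words.
    return " ".join(cleaned.split())


if __name__ == "__main__":
    print(remove_stop_words_regex("the theory of the the cat", ["the", "of"]))  # → theory cat
```

Sorting by length matters for overlapping stop words: with "of" tried first, "of course" would lose only "of" and leave "course" behind.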
