Removing stop words from text in Python without using NLTK

Question

I made a list of stop words for my native language in Python. How can I remove them from a text that I type in, without using NLTK?

Answer 1

Try this. Note that it only works if the language in question separates words with spaces, which the question does not clarify (thanks to Oso for pointing this out):

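import numpy as np

# Placeholder list; replace it with your own stop words.
your_stop_words = ['something', 'sth_else', 'and ...']
new_string = input()
# dtype=str keeps the array a string array even when the input is empty.
words = np.array(new_string.split(), dtype=str)
# Boolean mask: True where a word is one of the stop words.
is_stop_word = np.isin(words, your_stop_words)
# Keep only the words that are not stop words.
filtered_words = words[~is_stop_word]
clean_text = ' '.join(filtered_words)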
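
The same split-and-filter idea does not strictly need NumPy. Below is a minimal sketch in plain Python, under the same assumption that words are separated by whitespace. The function name `remove_stop_words_split` is made up for illustration and is not part of the original answer.

```python
def remove_stop_words_split(text, stop_words):
    """Drop every whitespace-separated token that is a stop word."""
    # A set gives constant-time membership checks.
    stop_set = set(stop_words)
    # split() with no argument splits on any run of whitespace.
    kept = [word for word in text.split() if word not in stop_set]
    return ' '.join(kept)


if __name__ == '__main__':
    print(remove_stop_words_split('this is a test', ['is', 'a']))
```

The comparison is exact and case-sensitive, so a token with attached punctuation (for example `is,`) is not treated as a stop word.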

If the language in question cannot be split on spaces, you can use this approach instead:

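# Placeholder list; replace it with your own stop words.
your_stop_words = ['something', 'sth_else', 'and ...']
new_string = input()
clean_text = new_string
# Delete every occurrence of each stop word, wherever it appears in the text.
for stop_word in your_stop_words:
    clean_text = clean_text.replace(stop_word, "")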

In this case, you need to make sure that a stop word is not removed when it appears as part of another word. How to do that depends on your language; for example, you can include the surrounding spaces in each stop word.
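
One possible way to guard against removing a stop word from inside a longer word is a regular expression that only matches when the stop word is not adjacent to another word character. This is a sketch, not the answer's own method: the function name `remove_whole_stop_words` is hypothetical, and the approach assumes that words in your language are delimited by spaces or punctuation, so it does not help for scripts written without word separators.

```python
import re


def remove_whole_stop_words(text, stop_words):
    """Remove stop words only where they stand alone as whole words."""
    if not stop_words:
        return text
    # Try longer stop words first so they win over shorter ones.
    ordered = sorted(stop_words, key=len, reverse=True)
    alternatives = '|'.join(re.escape(word) for word in ordered)
    # (?<!\w) and (?!\w): no word character directly before or after.
    pattern = r'(?<!\w)(?:' + alternatives + r')(?!\w)'
    without_stops = re.sub(pattern, '', text)
    # Collapse the extra whitespace left behind by the removals.
    return ' '.join(without_stops.split())


if __name__ == '__main__':
    print(remove_whole_stop_words('the cat and the candy', ['and', 'the']))
```

Here `and` is removed as a separate word but is left alone inside `candy`, which a plain `str.replace` would not guarantee.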

Solution 1
0, accepted 2021-01-22 20:15:50
