
Removing Stop Words From a Text in Python Without Using NLTK

I made a list of stop words for my native language in Python. How can I remove them from a text that I type in, without using NLTK?

Try this. It only works if the language in question can be split into words on spaces, which the question does not make clear (thanks to Oso for pointing this out):

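import numpy as np

your_stop_words = ['something', 'sth_else', 'and ...']  # your own stop-word list
new_string = input()
words = np.array(new_string.split())            # split the text on whitespace
is_stop_word = np.isin(words, your_stop_words)  # True where a word is a stop word
filtered_words = words[~is_stop_word]           # keep only the non-stop words
clean_text = ' '.join(filtered_words)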
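
If you would rather not depend on NumPy, the same idea can be sketched in plain Python. This is only a minimal sketch under the same assumption (words are separated by whitespace); the function name remove_stop_words and the sample stop words are made up for illustration and are not from the original answer.

```python
def remove_stop_words(text, stop_words):
    """Return text with every whitespace-separated stop word dropped."""
    stop_set = set(stop_words)  # set membership tests are fast
    kept = [word for word in text.split() if word not in stop_set]
    return " ".join(kept)


# Hypothetical sample data, for illustration only.
print(remove_stop_words("this is a test and a demo", ["is", "a", "and"]))
# prints: this test demo
```

The comparison is exact, so "And" and "and" count as different words unless you lowercase both the text and the list first.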

If the language in question cannot be split on spaces, you can use this solution instead:

your_stop_words = ['something', 'sth_else', 'and ...']
new_string = input()
clean_text = new_string
for stop_word in your_stop_words:
    # remove every occurrence of the stop word, wherever it appears
    clean_text = clean_text.replace(stop_word, "")

In this case, you need to make sure that a stop word is not removed when it is only part of another word. How to do that depends on your language; for example, you can match the stop words only when they have spaces around them.
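
One possible way to apply the "spaces around the stop word" idea is a regular expression that matches a stop word only when it has whitespace, or the start or end of the text, on both sides. This is a rough sketch and not part of the original answer: the function name strip_stop_words and the sample input are hypothetical, and it assumes a non-empty stop-word list and that whitespace is a usable delimiter in your text.

```python
import re


def strip_stop_words(text, stop_words):
    """Remove stop words that stand alone between whitespace or string edges."""
    # Try longer stop words first so a short one cannot pre-empt a longer one.
    ordered = sorted(stop_words, key=len, reverse=True)
    alternatives = "|".join(re.escape(word) for word in ordered)
    # (?<!\S) and (?!\S) mean "no non-space character directly before/after".
    pattern = re.compile(r"(?<!\S)(?:" + alternatives + r")(?!\S)")
    without_stops = pattern.sub("", text)
    # Collapse the leftover runs of whitespace into single spaces.
    return " ".join(without_stops.split())


# Hypothetical sample data: "other" contains "the" but is left intact.
print(strip_stop_words("the cat and the other and and dog", ["the", "and"]))
# prints: cat other dog
```

Unlike chained replace() calls with padded spaces, the lookarounds do not consume the surrounding whitespace, so two stop words in a row are both removed.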
