Add stop words in Gensim

Question

Thanks for stopping by! I have a quick question about adding stop words. A few words show up throughout my data set, and I was hoping I could add them to Gensim's stop word list. I've seen a lot of examples using NLTK, and I was hoping there would be a way to do the same in Gensim. My code is below:

    def preprocess(text):
        result = []
        for token in gensim.utils.simple_preprocess(text):
            if token not in gensim.parsing.preprocessing.STOPWORDS and len(token) > 3:
                nltk.bigrams(token)
                result.append(lemmatize_stemming(token))
        return result

Answer 1

While gensim.parsing.preprocessing.STOPWORDS is predefined for your convenience, it happens to be a frozenset, so you can't add to it directly. You can, however, easily build a larger set that includes both those words and your additions. For example:

    from gensim.parsing.preprocessing import STOPWORDS
    my_stop_words = STOPWORDS.union(set(['mystopword1', 'mystopword2']))

Then use the new, larger my_stop_words in your subsequent stop-word-removal code. (The simple_preprocess() function of gensim doesn't automatically remove stop-words.)
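To make the "subsequent stop-word-removal code" concrete, here is a minimal sketch of the same idea. It is an illustration, not Gensim's own API: BASE_STOP_WORDS is a small stand-in for gensim's STOPWORDS (so the snippet runs without gensim installed), and EXTRA_STOP_WORDS and remove_stop_words() are hypothetical names chosen for this example.

```python
# Stand-in for gensim.parsing.preprocessing.STOPWORDS (which is also a
# frozenset), used here so the sketch runs without gensim installed.
BASE_STOP_WORDS = frozenset(["the", "and", "with"])

# Hypothetical corpus-specific additions.
EXTRA_STOP_WORDS = ["mystopword1", "mystopword2"]

# A frozenset has no add() method, but union() accepts any iterable
# and returns a new, larger frozenset.
my_stop_words = BASE_STOP_WORDS.union(EXTRA_STOP_WORDS)


def remove_stop_words(tokens, stop_words=my_stop_words, min_len=4):
    """Keep tokens that are not stop words and have at least min_len characters."""
    return [t for t in tokens if t not in stop_words and len(t) >= min_len]


tokens = ["the", "model", "with", "mystopword1", "topics"]
filtered = remove_stop_words(tokens)
print(filtered)  # ['model', 'topics']
```

With gensim installed, you would presumably swap BASE_STOP_WORDS for the imported STOPWORDS and pass the output of gensim.utils.simple_preprocess(text) as tokens. The min_len=4 default mirrors the len(token) > 3 check in the question.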

Answer 2

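    def preprocess(text):
        result = []
        # Define the extra stop words once, not on every loop iteration.
        newStopWords = ['stopword1', 'stopword2']
        for token in gensim.utils.simple_preprocess(text):
            if (token not in gensim.parsing.preprocessing.STOPWORDS
                    and token not in newStopWords
                    and len(token) > 3):
                nltk.bigrams(token)  # return value is discarded, so this has no effect
                result.append(lemmatize_stemming(token))  # helper from the question's code
        return result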

2 answers

solution1
1 ACCEPTED 2019-03-19 20:54:04

solution2
0 2019-03-19 21:15:52
