
Add stop words in Gensim

Original post: 2019-03-19 19:08:35 3 2 python/ windows/ nlp/ gensim/ stop-words

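Thanks for stopping by! I have a quick question about adding stop words. A handful of words keep showing up in my data set, and I would like to add them to gensim's stop word list. I have seen plenty of examples that do this with nltk, and I am hoping there is a way to do the same in gensim. My code is below: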

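    import gensim
    import nltk

    # lemmatize_stemming is the asker's own helper; it is not shown in the question
    def preprocess(text):
        result = []
        for token in gensim.utils.simple_preprocess(text):
            if token not in gensim.parsing.preprocessing.STOPWORDS and len(token) > 3:
                nltk.bigrams(token)  # return value is discarded, so this has no effect on result
                result.append(lemmatize_stemming(token))
        return result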

2 answers

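gensim.parsing.preprocessing.STOPWORDS is predefined for your convenience. It happens to be a frozenset, so you cannot add to it directly, but you can easily build a larger set that contains both those words and your additions. For example: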

from gensim.parsing.preprocessing import STOPWORDS
my_stop_words = STOPWORDS.union(set(['mystopword1', 'mystopword2']))

Then use the new, larger my_stop_words in your subsequent stop-word removal code. (gensim's simple_preprocess() function does not remove stop words automatically.)
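To make the frozenset point concrete, here is a minimal sketch that runs without gensim installed. BASE_STOPWORDS is a tiny hand-picked stand-in for gensim's STOPWORDS, and remove_stop_words, the sample tokens, and the added words are hypothetical names chosen for illustration. With gensim available, you would presumably import STOPWORDS as shown above and feed in the tokens from simple_preprocess() instead.

```python
# Stand-in for gensim.parsing.preprocessing.STOPWORDS (assumption: a tiny
# hand-picked subset so this sketch runs without gensim installed).
BASE_STOPWORDS = frozenset({"the", "and", "with", "this", "that"})

# A frozenset is immutable: it has no .add() method.
try:
    BASE_STOPWORDS.add("patient")
except AttributeError as err:
    print("cannot modify in place:", err)

# union() (or the | operator) returns a new, larger frozenset instead.
my_stop_words = BASE_STOPWORDS.union({"patient", "hospital"})


def remove_stop_words(tokens, stop_words, min_len=4):
    """Keep tokens that are not stop words and have at least min_len characters."""
    return [t for t in tokens if t not in stop_words and len(t) >= min_len]


# Hypothetical tokens, standing in for the output of simple_preprocess().
tokens = ["the", "patient", "left", "hospital", "with", "clear", "instructions"]
filtered = remove_stop_words(tokens, my_stop_words)
print(filtered)
```

The min_len=4 default mirrors the len(token) > 3 check in the question. Building the combined set once, outside any per-token loop, keeps each membership test a constant-time set lookup.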

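    import gensim
    import nltk

    # lemmatize_stemming is the asker's own helper; it is not shown here
    def preprocess(text):
        result = []
        # extra stop words to filter out in addition to gensim's STOPWORDS
        newStopWords = ['stopword1', 'stopword2']
        for token in gensim.utils.simple_preprocess(text):
            if (token not in gensim.parsing.preprocessing.STOPWORDS
                    and token not in newStopWords
                    and len(token) > 3):
                nltk.bigrams(token)  # return value is discarded, so this has no effect on result
                result.append(lemmatize_stemming(token))
        return result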



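Notice: Technical posts on this site are published under the CC BY-SA 4.0 license. If you repost this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.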
