
Add stop words in Gensim

Original post: 2019-03-19 19:08:35 8 2 python/ windows/ nlp/ gensim/ stop-words

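Thanks for stopping by! I have a quick question about adding stop words. A few words keep showing up in my data set, and I would like to add them to gensim's stop word list. I have seen plenty of examples that use nltk, and I am hoping there is a way to do the same in gensim. My code is posted below: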

    def preprocess(text):
        result = []
        for token in gensim.utils.simple_preprocess(text):
            if token not in gensim.parsing.preprocessing.STOPWORDS and len(token) > 3:
                nltk.bigrams(token)
                result.append(lemmatize_stemming(token))
        return result

2 answers

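gensim.parsing.preprocessing.STOPWORDS is predefined for convenience, and it happens to be a frozenset, so you cannot add to it directly. You can, however, easily build a larger set that contains both those words and your additions. For example: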

from gensim.parsing.preprocessing import STOPWORDS
my_stop_words = STOPWORDS.union(set(['mystopword1', 'mystopword2']))

Then use the new, larger my_stop_words in your subsequent stop-word-removal code. (gensim's simple_preprocess() function does not remove stop words automatically.)
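To make the frozenset point concrete, here is a minimal sketch of how the extended set might be plugged into a filtering step. The helper name remove_stop_words and the sample sentence are hypothetical, not part of the question, and the try/except fallback is only an assumption so that the snippet still runs where gensim is not installed. With gensim available, the real STOPWORDS is used.

```python
try:
    from gensim.parsing.preprocessing import STOPWORDS
except ImportError:
    # Assumption: tiny stand-in so the sketch runs without gensim installed.
    # It is NOT gensim's real stop word list.
    STOPWORDS = frozenset({"the", "and", "with"})

# A frozenset is immutable, so it has no add() method.
assert not hasattr(STOPWORDS, "add")

# union() returns a new frozenset and leaves STOPWORDS itself untouched.
my_stop_words = STOPWORDS.union({"mystopword1", "mystopword2"})


def remove_stop_words(tokens, stop_words=my_stop_words):
    """Hypothetical helper: drop stop words and tokens of 3 characters or fewer."""
    return [t for t in tokens if t not in stop_words and len(t) > 3]


# In the question the tokens come from gensim.utils.simple_preprocess(text);
# a plain split() keeps this sketch dependency-free.
tokens = "the corpus with mystopword1 and vectors".split()
filtered = remove_stop_words(tokens)
print(filtered)
```

Because my_stop_words is still a set type, each membership check stays a constant-time lookup no matter how many custom words are added.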

    def preprocess(text):
        # Extra stop words to drop in addition to gensim's built-in STOPWORDS
        newStopWords = ['stopword1', 'stopword2']
        result = []
        for token in gensim.utils.simple_preprocess(text):
            if (token not in gensim.parsing.preprocessing.STOPWORDS
                    and token not in newStopWords
                    and len(token) > 3):
                # Kept from the question; the return value of this call is unused
                nltk.bigrams(token)
                result.append(lemmatize_stemming(token))
        return result

