How to filter out words in a corpus from a constrained vocabulary with gensim?
I am using gensim for topic modeling. I've created a corpus using:
from gensim import corpora
wordDict = corpora.Dictionary(trimmedTextTokens)
gsCorpus = [wordDict.doc2bow(text) for text in trimmedTextTokens]
where trimmedTextTokens is the list of tokenized documents left after removing stop words. Now I want to filter out of the corpus every term that is not in a restricted (custom-built) vocabulary list. Any ideas?
Thank you!
Assuming your restricted vocabulary list is in a variable named restrictedVocabularyList, you could do:
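from gensim import corpora
wordDict = corpora.Dictionary(trimmedTextTokens)
restrictedVocabulary = set(restrictedVocabularyList)  # set gives fast membership tests
gsCorpus = [wordDict.doc2bow([word for word in text if word in restrictedVocabulary]) for text in trimmedTextTokens]  # keep only restricted words in each document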