Topic Modeling Using Gensim in Python
I have a list of bags of words for two classes: say n items in class A and m items in class B. I want to use topic modeling with the gensim package (for LDA) in Python to train a model for class A vs. class B. I am new to both topic modeling and Python.
Does anyone know how I should do this? Should I merge all the bags for each class and then use gensim, or should I use the bag for each item separately?
Thanks!
If I understand you correctly, you want to compare documents from two sources.
One way to do this with Gensim is to train a single LDA model on the documents from both classes. You can then see the topic distribution for each document and determine how similar two documents are using Gensim's similarity methods.
For details, take a look at Gensim's tutorials. The only modification you'd need to make is to combine your documents from A and B into one larger collection (corpus) and save the indices somewhere so that you can compare them easily later.
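One possible way to do that bookkeeping is sketched below in plain Python. The variable names (`docs_a`, `docs_b`, `idx_a`, `idx_b`, `labels`) and the sample documents are hypothetical and chosen only for this example.

```python
# Hypothetical tokenized documents for each class (assumed data).
docs_a = [["cat", "dog", "pet"], ["dog", "animal", "food"]]   # n items in class A
docs_b = [["stock", "market"], ["bank", "trade"], ["price"]]  # m items in class B

# Combine both classes into one list; class A comes first, then class B.
all_docs = docs_a + docs_b

# Remember which positions in the combined list belong to which class.
idx_a = range(0, len(docs_a))
idx_b = range(len(docs_a), len(all_docs))

# A parallel list of labels is an alternative way to keep the same information.
labels = ["A"] * len(docs_a) + ["B"] * len(docs_b)

for i, doc in enumerate(all_docs):
    print(i, labels[i], doc)
```

The combined `all_docs` list can then be turned into a dictionary and bag-of-words corpus for training, and `idx_a` and `idx_b` tell you which rows of any later result (topic distributions, similarity scores) belong to each class.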
However, depending on your data and your goal, other forms of LDA (such as correlated topic models) may be more suitable.