Topic Modeling Using Gensim in Python

Original post 2014-12-05 03:10:43 4 1 python/ machine-learning/ nlp/ lda/ gensim

I have a list of bags of words for two classes: say n items in class A and m items in class B. I want to use topic modeling with the gensim package (LDA) in Python to train a model for class A vs. class B. I am new to both topic modeling and Python. Does anyone know how I should do this? That is, should I merge all the bags for each class and then use gensim, or should I use the bag for each item separately? Thanks!

1 answer

If I understand you correctly, you want to compare documents from two sources.

One way to do this with Gensim would be:

create a bag-of-words corpus from all documents, A and B together (~ convert the texts to a documents-by-terms matrix of word counts)
train an LDA model on that corpus (~ find the topics)
convert the corpus to LDA space (~ determine which topics are relevant for each document)

Now you can inspect the topic distribution of each document and measure how similar two documents are using Gensim's similarity methods.

For details, take a look at Gensim's tutorials. The only modification you'd need to make is to combine your documents from A and B into one bigger corpus and record which indices belong to which class, so that you can compare them easily later.

However, depending on your data and your goal, other topic models (such as the correlated topic model) may be more suitable than plain LDA.


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.


solution1 1 ACCEPTED 2014-12-05 17:06:21
