
Topic modeling on short texts in Python

I want to do topic modeling on short texts. I did some research on LDA and found that it doesn't work well on short texts. What methods would be better, and do they have Python implementations?

You can try Short Text Topic Modeling (STTM); see https://www.groundai.com/project/sttm-a-tool-for-short-text-topic-modeling/1 (code available at https://github.com/qiang2100/STTM ). It combines state-of-the-art short text algorithms with traditional topic models for long text, so all of them can conveniently be run on short text.

For a more specialised library, try lda2vec-tf, which combines word vectors with LDA topic vectors. It is a fork of the original lda2vec that improves on it and gives better results than the original library.

The only Python implementation of short text topic modeling that I am aware of is GSDMM. Unfortunately, most of the others are written in Java.
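For reference, GSDMM (Gibbs Sampling for the Dirichlet Multinomial Mixture model) assumes each short text belongs to exactly one topic and repeatedly reassigns documents to clusters. Below is a minimal pure-Python sketch of that sampling loop, not the code of any particular library. The function name `gsdmm`, its parameters, and their default values are assumptions chosen for illustration, and documents are assumed to be pre-tokenized lists of strings.

```python
import math
import random
from collections import Counter


def gsdmm(docs, n_clusters=8, alpha=0.1, beta=0.1, n_iters=30, seed=0):
    """Illustrative GSDMM sketch: returns one cluster label per document.

    docs is a list of token lists; all names and defaults are hypothetical.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    doc_counts = [0] * n_clusters                          # documents per cluster
    word_totals = [0] * n_clusters                         # tokens per cluster
    word_counts = [Counter() for _ in range(n_clusters)]   # per-word counts per cluster

    # Start from a random assignment of documents to clusters.
    labels = []
    for doc in docs:
        z = rng.randrange(n_clusters)
        labels.append(z)
        doc_counts[z] += 1
        word_totals[z] += len(doc)
        word_counts[z].update(doc)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            # Remove the document from its current cluster.
            z_old = labels[d]
            doc_counts[z_old] -= 1
            word_totals[z_old] -= len(doc)
            word_counts[z_old].subtract(doc)

            # Log-probability (up to a constant) of each cluster for this document.
            freqs = Counter(doc)
            log_p = []
            for z in range(n_clusters):
                lp = math.log(doc_counts[z] + alpha)
                for w, c in freqs.items():
                    for j in range(c):
                        lp += math.log(word_counts[z][w] + beta + j)
                for i in range(len(doc)):
                    lp -= math.log(word_totals[z] + vocab_size * beta + i)
                log_p.append(lp)

            # Sample a new cluster in proportion to those probabilities.
            max_lp = max(log_p)
            weights = [math.exp(lp - max_lp) for lp in log_p]
            z_new = rng.choices(range(n_clusters), weights=weights)[0]

            # Add the document to the sampled cluster.
            labels[d] = z_new
            doc_counts[z_new] += 1
            word_totals[z_new] += len(doc)
            word_counts[z_new].update(doc)

    return labels


if __name__ == "__main__":
    sample_docs = [
        ["cheap", "flight", "ticket"],
        ["flight", "delay", "airport"],
        ["python", "pandas", "dataframe"],
        ["python", "numpy", "array"],
    ]
    print(gsdmm(sample_docs, n_clusters=3, n_iters=20))
```

`n_clusters` acts as an upper bound here: clusters that end up empty are simply never used, so it is usually set higher than the number of topics you expect.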

Besides GSDMM, there is also biterm, a Python implementation of the biterm topic model for short texts.
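The biterm approach works on unordered word pairs that co-occur in the same short text, instead of on per-document word counts. The snippet below only sketches that first step, extracting and counting biterms; it does not fit a topic model and is not the API of the biterm package. The helper names `extract_biterms` and `biterm_counts` are hypothetical, and documents are assumed to be pre-tokenized lists of strings.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair in one short text (hypothetical helper)."""
    # Sorting each pair makes ("b", "a") and ("a", "b") the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def biterm_counts(docs):
    """Count biterms over the whole corpus; a biterm model is fitted on these."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


if __name__ == "__main__":
    sample_docs = [
        ["cheap", "flight", "ticket"],
        ["flight", "ticket", "refund"],
    ]
    for biterm, n in biterm_counts(sample_docs).most_common(3):
        print(biterm, n)
```

Pooling pairs across the corpus is what lets the model cope with texts that are individually too short to show stable word co-occurrence.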

Here's a very fast and easy to use implementation of GSDMM that can be used in Python: https://github.com/centre-for-humanities-computing/tweetopic


 