Extract Word Saliency from Gensim LDA or pyLDAvis

I see that pyLDAvis visualizes each word's saliency under each topic.

But is there a way to extract each word's saliency under each topic? Or, alternatively, how can I calculate each word's saliency directly with Gensim LDA?

Ultimately, I want a pandas DataFrame in which each row represents one word, each column represents one topic, and each value is the word's saliency under the corresponding topic.

Many thanks in advance.

Gensim's LDA implementation does not have out-of-the-box support for this particular 'saliency' calculation from Chuang et al. (2012).
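For reference, Chuang et al. (2012) define a word's distinctiveness as the Kullback-Leibler divergence between P(t | w) and the overall topic distribution P(t), and its saliency as that distinctiveness weighted by the word's overall probability P(w). Note that this gives one value per word, not one per word-topic pair; P(t | w) itself follows from P(w | t) and P(t) by Bayes' rule.

```latex
\mathrm{distinctiveness}(w) = \sum_{t} P(t \mid w)\,\log\frac{P(t \mid w)}{P(t)},
\qquad
\mathrm{saliency}(w) = P(w)\cdot\mathrm{distinctiveness}(w),
\qquad
P(t \mid w) = \frac{P(w \mid t)\,P(t)}{P(w)}
```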

Still, I suspect the model's .get_term_topics() and/or .get_topic_terms() methods provide the data needed to implement that calculation. In particular, one or the other of those methods might provide the P(w | t) term, but a closer read of the paper would be needed to know for sure. (I suspect the P(t) term might require a separate survey of the training data.)
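Assuming the Chuang et al. definition, saliency(w) = P(w) × Σ_t P(t | w) log[P(t | w) / P(t)], the pieces could be combined roughly as below. This is a minimal NumPy sketch, not a Gensim API: `topic_proportions` and `term_saliency` are hypothetical helper names. It assumes P(t) is estimated by weighting each document's topic mixture by its token count (one possible "survey of the training data"), and it takes P(w) as the model's marginal Σ_t P(w | t) P(t); using observed corpus term frequencies for P(w) instead would be an equally reasonable assumption.

```python
import numpy as np


def topic_proportions(doc_topics, doc_lengths, num_topics):
    """Estimate P(t) by weighting each document's topic mixture by its length.

    doc_topics: one list of (topic_id, prob) pairs per document.
    doc_lengths: number of tokens in each document.
    """
    totals = np.zeros(num_topics)
    for pairs, length in zip(doc_topics, doc_lengths):
        for topic_id, prob in pairs:
            totals[topic_id] += prob * length
    return totals / totals.sum()


def term_saliency(topic_term, topic_prop):
    """Per-word saliency following the Chuang et al. (2012) formula.

    topic_term: array of shape (num_topics, vocab_size); row t holds P(w | t).
    topic_prop: array of shape (num_topics,) holding P(t).
    Returns an array of shape (vocab_size,).
    """
    phi = np.asarray(topic_term, dtype=float)
    p_t = np.asarray(topic_prop, dtype=float)
    p_t = p_t / p_t.sum()
    joint = phi * p_t[:, None]            # P(w, t) = P(w | t) * P(t)
    p_w = joint.sum(axis=0)               # P(w) = sum over t of P(w, t)
    with np.errstate(divide="ignore", invalid="ignore"):
        p_t_given_w = joint / p_w         # Bayes' rule: P(t | w)
        # Entries with P(t | w) == 0 contribute 0 (limit of x*log(x) as x -> 0).
        kernel = np.where(
            p_t_given_w > 0,
            p_t_given_w * np.log(p_t_given_w / p_t[:, None]),
            0.0,
        )
    distinctiveness = kernel.sum(axis=0)  # KL(P(t | w) || P(t)) for each word
    return p_w * distinctiveness
```

With a trained LdaModel `lda` and its bag-of-words `corpus`, the inputs could plausibly be `lda.get_topics()` for `topic_term`, `[lda.get_document_topics(bow, minimum_probability=0.0) for bow in corpus]` for `doc_topics`, and `[sum(count for _, count in bow) for bow in corpus]` for `doc_lengths`. Results may differ somewhat from pyLDAvis, whose estimates of P(t) and P(w) may not match these assumptions.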

From the class docs:

https://radimrehurek.com/gensim/models/ldamodel.html#gensim.models.ldamodel.LdaModel.get_term_topics

Returns: The relevant topics represented as pairs of their ID and their assigned probability, sorted by relevance to the given word.

https://radimrehurek.com/gensim/models/ldamodel.html#gensim.models.ldamodel.LdaModel.get_topic_terms

Returns: Word ID - probability pairs for the most relevant words generated by the topic.
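For the layout the question asks for (one row per word, one column per topic), the (word ID, probability) pairs described above could be pivoted into a pandas DataFrame. A minimal sketch: `pairs_to_frame` is a hypothetical helper, and the cells hold the per-topic word probabilities, not saliency. As Chuang et al. define it, saliency is a single score per word, so it would fit as one extra column rather than filling the topic columns.

```python
import pandas as pd


def pairs_to_frame(topic_term_pairs, id2word, num_topics):
    """Build a words x topics DataFrame from (word_id, prob) pairs per topic.

    topic_term_pairs: dict mapping topic_id -> list of (word_id, prob) pairs.
    id2word: mapping from word_id to the word string.
    Words missing from a topic's list (e.g. cut off by topn) get 0.0.
    """
    rows = {}
    for topic_id, pairs in topic_term_pairs.items():
        for word_id, prob in pairs:
            rows.setdefault(id2word[word_id], {})[topic_id] = float(prob)
    # orient="index": outer keys (words) become rows, inner keys (topics) columns
    frame = pd.DataFrame.from_dict(rows, orient="index")
    frame = frame.reindex(columns=list(range(num_topics))).fillna(0.0)
    frame.index.name = "word"
    return frame.sort_index()
```

With a trained model `lda`, the input might be built as `{t: lda.get_topic_terms(t, topn=len(lda.id2word)) for t in range(lda.num_topics)}`, passing `lda.id2word` as the mapping.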

I hadn't come across this particular 'saliency' calculation before, but if it is popular among LDA users or of potential general use, and you figure out how to calculate it, it would likely be a welcome contribution to the Gensim project, especially if it can be a simple extra convenience method on LdaModel.
