How to get list of words for each topic for a specific relevance metric value (lambda) in pyLDAvis?

Question

I am using pyLDAvis together with gensim.models.LdaMulticore for topic modeling, with 10 topics in total. When I visualize the results with pyLDAvis, there is a slider called lambda with the explanation "Slide to adjust relevance metric". I would like to extract the list of words for each topic separately for lambda = 0.1, but I cannot find anything in the documentation about setting lambda when extracting keywords.

I am using these lines:

import pyLDAvis
import pyLDAvis.gensim_models

LDAvis_prepared = pyLDAvis.gensim_models.prepare(lda_model, corpus, id2word, lambda_step=0.1)
LDAvis_prepared.topic_info

And these are the results:

     Term    Freq         Total        Category  logprob  loglift
321  ra      2336.000000  2336.000000  Default   30.0000  30.0000
146  may     1741.000000  1741.000000  Default   29.0000  29.0000
66   doctor  1310.000000  1310.000000  Default   28.0000  28.0000

First, these results do not match what I see in the visualization with lambda = 0.1. Second, the results are not separated by topic.

Answer 1

You may want to read this GitHub page: https://nicharuc.github.io/topic_modeling/

According to this example, your code could go like this:

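import pandas as pd

lambd = 0.6  # a specific relevance metric value; set to 0.1 for the case in the question

all_topics = {}
num_topics = lda_model.num_topics
num_terms = 10  # number of top terms to keep per topic

# pyLDAvis labels topics 'Topic1' ... 'TopicN', so the range must include num_topics
for i in range(1, num_topics + 1):
    topic = LDAvis_prepared.topic_info[LDAvis_prepared.topic_info.Category == 'Topic' + str(i)].copy()
    # relevance = lambda * logprob + (1 - lambda) * loglift
    topic['relevance'] = topic['loglift'] * (1 - lambd) + topic['logprob'] * lambd
    all_topics['Topic ' + str(i)] = topic.sort_values(by='relevance', ascending=False).Term[:num_terms].values

pd.DataFrame(all_topics).T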
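
To try the ranking without a trained model, here is a minimal sketch that wraps the same idea in a function and runs it on a toy table. The function name top_terms_by_relevance and the toy values are made up for illustration; the only assumption about pyLDAvis is that topic_info has the Term, Category, logprob and loglift columns shown in the question. Rows in the 'Default' category are dropped because they describe overall term frequency rather than a topic.

```python
import pandas as pd


def top_terms_by_relevance(topic_info, lambd, num_terms=10):
    """Return {category: [terms]} ranked by lambd*logprob + (1-lambd)*loglift.

    Hypothetical helper, not part of pyLDAvis.
    """
    # Keep only per-topic rows; 'Default' holds the overall term ranking.
    per_topic = topic_info[topic_info["Category"] != "Default"].copy()
    per_topic["relevance"] = (
        lambd * per_topic["logprob"] + (1 - lambd) * per_topic["loglift"]
    )
    ranked = per_topic.sort_values("relevance", ascending=False)
    # groupby keeps the sorted row order inside each group.
    return {
        category: group["Term"].head(num_terms).tolist()
        for category, group in ranked.groupby("Category", sort=False, observed=True)
    }


# Toy stand-in for LDAvis_prepared.topic_info (all numbers are invented).
toy_topic_info = pd.DataFrame({
    "Term": ["doctor", "ra", "may", "pain"],
    "Category": ["Default", "Topic1", "Topic1", "Topic2"],
    "logprob": [30.0, -2.0, -1.0, -3.0],
    "loglift": [30.0, 1.5, 0.2, 0.9],
})

top_terms = top_terms_by_relevance(toy_topic_info, lambd=0.1, num_terms=2)
print(top_terms)
```

With a real model, passing LDAvis_prepared.topic_info in place of the toy table should give one list per topic. A low lambda such as 0.1 weights loglift more heavily, so terms that are distinctive for a topic move ahead of terms that are merely frequent.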

solution1
0 2021-11-24 10:43:51
