
How to get the list of words for each topic at a specific relevance metric value (lambda) in pyLDAvis?

I am using pyLDAvis together with gensim.models.LdaMulticore for topic modeling, with 10 topics in total. When I visualize the results with pyLDAvis, there is a slider called lambda with the explanation "Slide to adjust relevance metric". I would like to extract the list of words for each topic separately at lambda = 0.1, but I cannot find anything in the documentation about setting lambda when extracting keywords.

I am using these lines:

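import pyLDAvis
import pyLDAvis.gensim_models

# lambda_step only sets the spacing of the lambda grid behind the slider; it does not pick one lambda value
LDAvis_prepared = pyLDAvis.gensim_models.prepare(lda_model, corpus, id2word, lambda_step=0.1)
LDAvis_prepared.topic_info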

These are the results:

   Term     Freq        Total       Category logprob loglift
321 ra      2336.000000 2336.000000 Default 30.0000 30.0000
146 may     1741.000000 1741.000000 Default 29.0000 29.0000
66  doctor  1310.000000 1310.000000 Default 28.0000 28.0000

First, these results do not match what I see in the visualization with lambda = 0.1. Second, the results are not broken down by topic.

You may want to read this GitHub page: https://nicharuc.github.io/topic_modeling/

Following that example, your code could look like this:

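import pandas as pd

lambd = 0.1  # relevance metric value (the lambda slider); 0.1 as asked in the question

all_topics = {}
num_topics = lda_model.num_topics
num_terms = 10  # number of top terms to keep per topic

# pyLDAvis labels topics 'Topic1' ... 'TopicN', so the range must include num_topics.
# Note: with the default sort_topics=True, 'Topic1' is the largest topic, not gensim topic 0.
for i in range(1, num_topics + 1):
    topic = LDAvis_prepared.topic_info[LDAvis_prepared.topic_info.Category == 'Topic' + str(i)].copy()
    # relevance = lambda * log p(w|t) + (1 - lambda) * log lift
    topic['relevance'] = topic['loglift'] * (1 - lambd) + topic['logprob'] * lambd
    all_topics['Topic ' + str(i)] = topic.sort_values(by='relevance', ascending=False).Term.head(num_terms).values
pd.DataFrame(all_topics).T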
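If you would rather not depend on pyLDAvis's topic numbering (with its default `sort_topics=True`, pyLDAvis renumbers topics by size), the same ranking can be computed directly from the topic-term matrix. The function below is a minimal sketch, assuming the LDAvis relevance definition, relevance = λ·log p(w|t) + (1 − λ)·log(p(w|t) / p(w)), with p(w) taken as each term's share of the total corpus term count. `top_terms_by_relevance` is a hypothetical helper name, not part of pyLDAvis or gensim.

```python
import numpy as np


def top_terms_by_relevance(phi, term_freq, vocab, lambd=0.1, n=10):
    """Return the top-n terms per topic, ranked by LDAvis-style relevance.

    phi       : (K, V) array; row k holds p(w | topic k), assumed strictly positive
    term_freq : (V,) array of corpus term counts, used for the marginal p(w)
    vocab     : list of V term strings, aligned with the columns of phi
    """
    phi = np.asarray(phi, dtype=float)
    p_w = np.asarray(term_freq, dtype=float)
    p_w = p_w / p_w.sum()  # marginal term probability p(w)

    log_phi = np.log(phi)              # log p(w|t)
    log_lift = log_phi - np.log(p_w)   # log(p(w|t) / p(w)), broadcast across topics
    relevance = lambd * log_phi + (1 - lambd) * log_lift

    # Sort each topic's row by descending relevance and keep the first n term indices
    order = np.argsort(-relevance, axis=1)[:, :n]
    return [[vocab[j] for j in row] for row in order]
```

With gensim, `phi` could come from `lda_model.get_topics()` (shape: number of topics × vocabulary size), `term_freq` from summing the `(term_id, count)` pairs in `corpus`, and `vocab` from `[id2word[i] for i in range(len(id2word))]`; the topic indices then follow gensim's own 0-based order. The sketch assumes every term has a nonzero count and a nonzero probability in `phi`, since a zero would make the logarithms infinite.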
