How to get the topic-word probabilities of a given word in gensim LDA?

As I understand it, if I train an LDA model over a corpus where the dictionary size is, say, 1000 and the number of topics (K) is 10, then for each word in the dictionary I should have a vector of size 10, where each position in the vector is the probability that the word belongs to that particular topic, right?

So my question is: given a word, what is the probability that the word belongs to topic k, where k ranges from 1 to 10? How do I get this value from the gensim LDA model?

I was using the get_term_topics method, but it doesn't output the probabilities for all the topics. For example:

lda_model1.get_term_topics("fun")
[(12, 0.047421702085626238)],

but I want to see the probability that "fun" belongs to each of the other topics as well.

For anyone looking for the answer, I found it.

These values are in the model's expElogbeta NumPy array (xx.expElogbeta, where xx is the trained model). The number of rows in this matrix equals the number of topics, and the number of columns equals the size of your dictionary (words). So if you take the values in a particular column, you get that word's value under every topic. Note that expElogbeta holds exp(E[log beta]) from the variational posterior, so its rows are not exactly normalized probability distributions.

For example:

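>>> import numpy as np
>>> from gensim import corpora
>>> data = np.load("model.expElogbeta.npy")
>>> data.shape
(20, 6481)  # I trained with 20 topics == number of rows
>>> dictionary = corpora.Dictionary.load(dictf)  # avoid shadowing the built-in dict
>>> len(dictionary.keys())
6481  # the columns of the .npy array correspond to the words in my dictionary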
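
The column lookup described above can be sketched without a trained model. This is only a minimal, pure-Python illustration on a made-up 3x4 matrix; `topic_word`, `token2id`, and `word_topic_values` are hypothetical names chosen here, not gensim API. With a real model, the matrix would be the expElogbeta array and the column index would come from the dictionary's word-to-id mapping.

```python
# Toy topic-word matrix: 3 topics (rows) x 4 dictionary words (columns).
# All numbers and names are invented for illustration only.
topic_word = [
    [0.70, 0.10, 0.10, 0.10],
    [0.05, 0.80, 0.05, 0.10],
    [0.25, 0.25, 0.25, 0.25],
]

# Hypothetical word -> column id mapping (stands in for the dictionary).
token2id = {"fun": 0, "work": 1, "game": 2, "time": 3}


def word_topic_values(matrix, token2id, word):
    """Return [(topic_id, value), ...] for every topic, i.e. one full column."""
    col = token2id[word]
    return [(topic_id, row[col]) for topic_id, row in enumerate(matrix)]


fun_values = word_topic_values(topic_word, token2id, "fun")
print(fun_values)  # one entry per topic, none filtered out
```

Assuming the array and dictionary loaded above, the equivalent NumPy lookup would presumably be `data[:, dictionary.token2id["fun"]]`, which returns one value per topic for that word.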

Source: https://groups.google.com/forum/?fromgroups=#!searchin/gensim/lda $20topic-word$20matrix/gensim/Qoj7Agkx3qE/r9lyfihC4b4J
