Gensim LDA giving output of Topic IDs but probabilities are not adding up to 1
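I have trained an LDA model with 100 topics, and as far as I understand, the output for a document should assign each topic a probability, with all of them summing to 1.
However, when I run the code below, I get only 2 topics back.
Please help.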
text = "A blood cell, also called a hematocyte, is a cell produced by hematopoiesis and normally found in blood."
# transform text into the bag-of-words space
bow_vector = dictionary.doc2bow(tokenize(text))
lda_vector = lda_model[bow_vector]
print("LDA Output: ", lda_vector)
print("\nTop Keywords from highest prob Topic: ",lda_model.print_topic(max(lda_vector, key=lambda item: item[1])[0]))
print("\n\nAddition of all the probabilities from LDA output:", sum(prob for _, prob in lda_vector))
LDA Output:  [(64, 0.6952628), (69, 0.18223721)]
Top Keywords from highest prob Topic:  0.042*"health" + 0.032*"medical" + 0.017*"patients" + 0.016*"cancer" + 0.015*"hospital" + 0.015*"said" + 0.015*"treatment" + 0.012*"doctor" + 0.012*"care" + 0.012*"drug"
Addition of all the probabilities from LDA output: 0.8775
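If you set the minimum_probability parameter of LdaModel to 0, the probabilities will sum to 1 (or close to 1, because of approximation error). This parameter controls which topics are filtered out of the result returned for a document: any topic whose probability falls below it is dropped. The default is 0.01, which is why only 2 of the 100 topics appear in the output above and their sum is less than 1.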
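To make the filtering effect concrete, here is a minimal pure-Python sketch. It does not call gensim; filter_topics is a hypothetical helper that only mimics the thresholding, and the 100-topic distribution is made up so that its two largest entries match the output in the question. In gensim itself, passing minimum_probability=0 either to the LdaModel constructor or to lda_model.get_document_topics(bow_vector, minimum_probability=0) should have the corresponding effect.

```python
def filter_topics(distribution, minimum_probability=0.01):
    """Hypothetical helper: keep (topic_id, prob) pairs at or above the threshold."""
    return [(topic_id, prob)
            for topic_id, prob in enumerate(distribution)
            if prob >= minimum_probability]


# Made-up 100-topic distribution (an assumption for illustration only):
# two dominant topics, with the remaining mass spread evenly over the other 98.
num_topics = 100
dominant = {64: 0.6952628, 69: 0.18223721}
remainder = 1.0 - sum(dominant.values())
distribution = [
    dominant.get(topic_id, remainder / (num_topics - len(dominant)))
    for topic_id in range(num_topics)
]

# With a threshold of 0.01, only the two dominant topics survive,
# so the reported probabilities no longer sum to 1.
filtered = filter_topics(distribution, minimum_probability=0.01)
print("Filtered topics:", filtered)
print("Sum after filtering:", sum(prob for _, prob in filtered))

# With the threshold at 0, every topic is returned and the sum is (close to) 1.
full = filter_topics(distribution, minimum_probability=0.0)
print("Topics returned with threshold 0:", len(full))
print("Sum with threshold 0:", sum(prob for _, prob in full))
```

Each of the 98 minor topics in this sketch carries about 0.00125 of the probability mass, well below 0.01, so all of them vanish from the filtered output even though together they account for the missing 0.1225.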