Classify Text with Gensim LDA Model
For reference, I already looked at the following questions:
I would like my LDA model, trained with Gensim, to classify a sentence under one of the topics that the model creates. Something along the lines of:
lda = models.LdaModel(corpus=corpus, id2word=id2word, num_topics=7, passes=20)
lda.print_topics()
for line in document:  # where each line in the document is its own sentence for simplicity
    print('Sentence: ', line)
    topic = lda.parse(line)  # where the classification would occur
    print('Topic: ', topic)
I know gensim does not have a parse function, but how would one go about accomplishing this? Here is the documentation that I've been following, but I haven't gotten anywhere with it:
https://radimrehurek.com/gensim/auto_examples/core/run_topics_and_transformations.html#sphx-glr-auto-examples-core-run-topics-and-transformations-py
Thanks in advance.
edit: More documentation - https://radimrehurek.com/gensim/models/ldamodel.html
Let me get your problem right: you want to train an LDA model on some documents and retrieve 7 topics. Then you want to classify new documents into one (or more?) of these topics, meaning you want to infer topic distributions on new, unseen documents.
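If so, the gensim documentation provides answers: convert each new sentence to a bag-of-words vector with the same dictionary used for training, then index the trained model with that vector to get its topic distribution.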
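from gensim import models

# corpus, id2word and document are assumed to be defined as in the question
lda = models.LdaModel(corpus=corpus, id2word=id2word, num_topics=7, passes=20)
lda.print_topics()
for count, line in enumerate(document, start=1):  # each line is its own sentence
    print('\nSentence: ', line)
    tokens = line.split()  # tokenize the same way as the training corpus
    line_bow = id2word.doc2bow(tokens)  # bag-of-words vector for the new sentence
    doc_lda = lda[line_bow]  # list of (topic_id, probability) pairs
    # pick the pair with the highest probability, not the highest topic id
    topic_id, prob = max(doc_lda, key=lambda pair: pair[1])
    print('\nLine ' + str(count) + ' assigned to Topic ' + str(topic_id) + ' with ' + str(round(prob * 100, 2)) + '% probability!')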
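Because `lda[line_bow]` returns a list of `(topic_id, probability)` pairs, choosing the most likely topic means comparing on the second element of each pair. A plain `max()` over the list compares tuples element by element and so returns the pair with the largest topic id. The sketch below illustrates the difference without gensim; the helper name `dominant_topic` and the sample numbers are made up for illustration and are not part of gensim.

```python
def dominant_topic(doc_topics):
    """Return the (topic_id, probability) pair with the highest probability.

    doc_topics is assumed to be a list of (topic_id, probability) tuples,
    the shape a trained LdaModel returns for a bag-of-words vector.
    Returns None for an empty list.
    """
    if not doc_topics:
        return None
    return max(doc_topics, key=lambda pair: pair[1])


# Hypothetical topic distribution for one sentence (illustrative values only)
sample = [(0, 0.05), (2, 0.71), (5, 0.24)]

print(dominant_topic(sample))  # compares on probability
print(max(sample))             # compares on topic id first
```

If a sentence should be allowed to belong to more than one topic, one option is to keep every pair whose probability exceeds a threshold of your choosing instead of taking only the maximum.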