How to get document vectors for a given topic in gensim

Question

I have about 9000 documents and I am using Gensim's doc2vec to embed them. My code is as follows:

import json
from collections import namedtuple

from gensim.models import doc2vec

# input_file is the path to the JSON dataset (defined elsewhere)
with open(input_file) as f:
    dataset = json.load(f)

docs = []
analyzedDocument = namedtuple('AnalyzedDocument', 'words tags')

# Each entry is expected to hold the document tag first and its words second
for description in dataset:
    tags = [description[0]]
    words = description[1]
    docs.append(analyzedDocument(words, tags))

model = doc2vec.Doc2Vec(docs, vector_size=100, window=10, min_count=1, workers=4, epochs=20)

I would like to get all the documents related to the topic "deep learning", i.e. the documents whose content is mainly about deep learning. Is it possible to do this with the doc2vec model in gensim?

I am happy to provide more details if needed.

Answer 1

If there was a document in your training set that was a great example of "deep learning" (say, docs[17]), then after successful training you could ask for documents similar to that example document, and that could be roughly what you'd need. For example:

# In gensim 4.0 and later this attribute is named model.dv (docvecs is deprecated)
sims = model.docvecs.most_similar(docs[17].tags[0])

You'd then have in sims a ranked list of (tag, similarity score) pairs for the 10 documents most similar to the target document's tag; 10 is the default value of the topn parameter.
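To make the idea of a ranked, scored list concrete, here is a minimal sketch of cosine-similarity ranking in plain Python. It only illustrates the concept and is not gensim's implementation; the function name rank_by_cosine, the toy tags, and the three-dimensional vectors are all hypothetical stand-ins for trained document vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_cosine(target_tag, vectors, topn=10):
    """Return up to topn (tag, score) pairs, most similar first.

    The target document itself is excluded from the results.
    """
    target = vectors[target_tag]
    scored = [
        (tag, cosine(target, vec))
        for tag, vec in vectors.items()
        if tag != target_tag
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Hypothetical toy vectors standing in for trained document vectors
toy_vectors = {
    "doc_dl_1": [0.9, 0.1, 0.0],
    "doc_dl_2": [0.8, 0.2, 0.1],
    "doc_cooking": [0.0, 0.1, 0.9],
}

sims = rank_by_cosine("doc_dl_1", toy_vectors, topn=2)
for tag, score in sims:
    print(tag, round(score, 3))
```

Because the result is only a ranking, deciding which documents count as "related to deep learning" still requires picking a cutoff, either a larger topn or a minimum score, by inspecting the results on your own data.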
