
Find similarity with doc2vec like word2vec

Is there a way to find similar docs like we do in word2vec?

Like:

  model2.most_similar(positive=['good','nice','best'],
    negative=['bad','poor'],
    topn=10)

I know we can use infer_vector and feed the inferred vectors in to find similar ones, but I want to supply many positive and negative examples, as we do in word2vec.

Is there any way we can do that? Thanks!

The doc-vectors part of a Doc2Vec model works just like word-vectors with respect to a most_similar() call. You can supply multiple doc-tags or full vectors inside both the positive and negative parameters.

So you could call...

sims = d2v_model.docvecs.most_similar(positive=['doc001', 'doc009'], negative=['doc102'])

...and it should work. The elements of the positive or negative lists can be doc-tags that were present during training, or raw vectors (like those returned by infer_vector(), or your own averages of multiple such vectors).

I don't believe there is a pre-written function for this.

One approach would be to write a function that iterates through each word in the positive list to get the top n words for that particular word.

So for the positive words in your question example, you would end up with 3 lists of 10 words.

You could then identify words that are common across the 3 lists as the top n similar to your positive list. Since not all words will be common across the 3 lists, you probably need to get the top 20 similar words when iterating, so that you still end up with the top 10 words you want in your example.

Then do the same for the negative words.
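The steps above could be sketched as the following helper; the function name, the per-word pool size, and the ranking heuristic (count of lists a candidate appears in, then summed similarity) are my own choices, not a gensim API. It only assumes the model exposes a most_similar(positive=[word], topn=n) call returning (word, similarity) pairs, as gensim's KeyedVectors does.

```python
from collections import defaultdict

def common_most_similar(model, positive, topn=10, per_word=20):
    """Hypothetical helper: query most_similar() for each positive word
    separately, then rank candidates that recur across the per-word lists."""
    counts = defaultdict(int)    # number of per-word lists containing a candidate
    scores = defaultdict(float)  # summed similarity across those lists
    for word in positive:
        for candidate, sim in model.most_similar(positive=[word], topn=per_word):
            counts[candidate] += 1
            scores[candidate] += sim
    # Rank by how many lists share the candidate, breaking ties by similarity.
    ranked = sorted(counts, key=lambda w: (counts[w], scores[w]), reverse=True)
    return [(w, scores[w] / counts[w]) for w in ranked[:topn]]
```

The same helper run over the negative words gives a list of candidates to filter out or down-weight.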
