Find similarity with doc2vec like word2vec
Is there a way to find similar docs, like we do in word2vec?
Like:
model2.most_similar(positive=['good','nice','best'],
negative=['bad','poor'],
topn=10)
I know we can use infer_vector and feed the result in to get similar docs, but I want to feed many positive and negative examples as we do in word2vec.
Is there any way we can do that? Thanks!
The doc-vectors part of a Doc2Vec model works just like word-vectors with respect to a most_similar() call. You can supply multiple doc-tags or full vectors inside both the positive and negative parameters.
So you could call...
sims = d2v_model.docvecs.most_similar(positive=['doc001', 'doc009'], negative=['doc102'])
...and it should work. The elements of the positive or negative lists can be doc-tags that were present during training, or raw vectors (like those returned by infer_vector(), or your own averages of multiple such vectors).
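Under the hood, most_similar() combines the positive vectors with weight +1 and the negative vectors with weight -1 into a single mean vector, then ranks the other doc-vectors by cosine similarity to it. Here is a minimal pure-Python sketch of that arithmetic; the doc-tags and toy 2-d vectors are made up for illustration, and in practice you would just call d2v_model.docvecs.most_similar() on a trained model:

```python
from math import sqrt

def unit(v):
    """Scale a vector to unit length."""
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def most_similar(docvecs, positive, negative=(), topn=10):
    """Sketch of most_similar(): average +1-weighted positive and
    -1-weighted negative unit vectors, then rank the remaining
    doc-vectors by cosine similarity to that mean."""
    weighted = [(t, 1.0) for t in positive] + [(t, -1.0) for t in negative]
    dim = len(next(iter(docvecs.values())))
    mean = [0.0] * dim
    for tag, weight in weighted:
        for i, x in enumerate(unit(docvecs[tag])):
            mean[i] += weight * x
    mean = unit([x / len(weighted) for x in mean])
    skip = {tag for tag, _ in weighted}  # never return the query docs
    sims = [(tag, sum(a * b for a, b in zip(mean, unit(vec))))
            for tag, vec in docvecs.items() if tag not in skip]
    return sorted(sims, key=lambda s: -s[1])[:topn]

# Toy doc-vectors (made up for illustration).
docvecs = {
    'doc001': [1.0, 0.1], 'doc009': [0.9, 0.2], 'doc102': [-1.0, 0.0],
    'doc050': [0.8, 0.15], 'doc077': [-0.9, 0.1],
}
print(most_similar(docvecs, positive=['doc001', 'doc009'],
                   negative=['doc102'], topn=2))
```

Here doc050 ranks first because it points the same way as the positive docs and away from the negative one, mirroring how gensim's real call behaves.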
I don't believe there is a pre-written function for this.
One approach would be to write a function that iterates through each word in the positive list and gets the top n similar words for it.
So for the positive words in your question's example, you would end up with 3 lists of 10 words.
You could then identify the words that are common across the 3 lists as the top n similar to your positive list. Since not all words will be common across the 3 lists, you probably need to fetch the top 20 similar words per query word so that you still end up with a top 10 overall, as in your example.
Then do the same for the negative words.
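The per-word iteration above can be sketched like this. The TOY_NEIGHBORS table is a hypothetical stand-in for calling model.wv.most_similar(word, topn=...) on a trained model; candidates are ranked by how many of the per-word neighbor lists they appear in:

```python
from collections import Counter

# Hypothetical stand-in for model.wv.most_similar(word, topn=...).
# In practice each list would come from the trained word2vec model.
TOY_NEIGHBORS = {
    'good': ['great', 'nice', 'fine', 'decent'],
    'nice': ['great', 'fine', 'lovely', 'decent'],
    'best': ['great', 'finest', 'fine', 'top'],
}

def similar_to_list(words, topn=3):
    """For each query word, fetch its neighbor list, then rank
    candidates by how many of the per-word lists contain them.
    Fetching more neighbors per word than topn compensates for
    candidates that do not appear in every list."""
    counts = Counter()
    for word in words:
        for neighbor in TOY_NEIGHBORS[word]:
            counts[neighbor] += 1
    return [w for w, _ in counts.most_common(topn)]

print(similar_to_list(['good', 'nice', 'best'], topn=3))
```

With the toy data, 'great' and 'fine' appear in all three lists, so they surface first; the same Counter pass works unchanged for the negative-word lists.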