Soft Cosine Similarity between two sentences

I am trying to find a simple way to calculate soft cosine similarity between two sentences.

Here is my attempt so far:

from gensim.matutils import softcossim

sent_1 = 'Dravid is a cricket player and a opening batsman'.split()
sent_2 = 'Leo is a cricket player too He is a batsman,baller and keeper'.split()

print(softcossim(sent_1, sent_2, similarity_matrix))

I don't understand what similarity_matrix should be. Please help me work that out, and from there how to compute the soft cosine similarity in Python.

As of the current version of Gensim, 3.8.3, some of the method calls used in the question and in previous answers have been deprecated. Those deprecated functions have been removed in the 4.0.0 beta. I can't seem to provide code in a reply to @EliadL, so I am posting this separately.

The current way to solve this problem in Gensim 3.8.3 and 4.0.0 is as follows:

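import gensim.downloader as api
from gensim import corpora
from gensim.similarities import SparseTermSimilarityMatrix
try:
    # Gensim 4.x exposes the index class here
    from gensim.similarities import WordEmbeddingSimilarityIndex
except ImportError:
    # Gensim 3.8.x exposes it from gensim.models instead
    from gensim.models import WordEmbeddingSimilarityIndex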

sent_1 = 'Dravid is a cricket player and a opening batsman'.split()
sent_2 = 'Leo is a cricket player too He is a batsman,baller and keeper'.split()

# Download the FastText model
fasttext_model300 = api.load('fasttext-wiki-news-subwords-300')

# Prepare a dictionary and a corpus.
documents = [sent_1, sent_2]
dictionary = corpora.Dictionary(documents)

# Prepare the similarity matrix
similarity_index = WordEmbeddingSimilarityIndex(fasttext_model300)
similarity_matrix = SparseTermSimilarityMatrix(similarity_index, dictionary)

# Convert the sentences into bag-of-words vectors.
sent_1 = dictionary.doc2bow(sent_1)
sent_2 = dictionary.doc2bow(sent_2)

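# Compute soft cosine similarity
# Note: later Gensim 4.x releases document `normalized` as a pair,
# e.g. (True, True); adjust the argument if this call raises an error.
print(similarity_matrix.inner_product(sent_1, sent_2, normalized=True))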
#> 0.68463486
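
To make the role of the similarity matrix more concrete, here is a minimal pure-Python sketch of the soft cosine formula, s(a, b) = aᵀSb / sqrt(aᵀSa · bᵀSb), where S[i][j] is the similarity between term i and term j. This is not Gensim's implementation; the three-term vocabulary, the 0.5 similarity value, and the names soft_cosine, identity and related are assumptions made up for illustration.

```python
import math

def soft_cosine(a, b, s):
    """Soft cosine of dense bag-of-words vectors a and b under term-similarity matrix s."""
    def inner(x, y):
        # Generalised inner product x^T S y
        return sum(x[i] * s[i][j] * y[j]
                   for i in range(len(x))
                   for j in range(len(y)))
    denom = math.sqrt(inner(a, a)) * math.sqrt(inner(b, b))
    return inner(a, b) / denom if denom else 0.0

# Hypothetical vocabulary: index 0 = "batsman", 1 = "bowler", 2 = "keeper"
doc_a = [1, 0, 0]  # mentions only "batsman"
doc_b = [0, 1, 0]  # mentions only "bowler"

# Identity matrix: terms are similar only to themselves (ordinary cosine).
identity = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

# Assumed similarity of 0.5 between "batsman" and "bowler".
related = [[1.0, 0.5, 0.0],
           [0.5, 1.0, 0.0],
           [0.0, 0.0, 1.0]]

plain = soft_cosine(doc_a, doc_b, identity)
soft = soft_cosine(doc_a, doc_b, related)
print(plain, soft)
```

With the identity matrix the two documents share no terms and score zero; once the matrix says the two terms are related, the score becomes positive. In the Gensim code above, the word embeddings supply those off-diagonal values.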

For users of Gensim v. 3.8.3, I've also found this Notebook helpful for understanding soft cosine similarity and how to apply it with Gensim.

For users of the Gensim 4.0.0 beta, this Notebook is currently the one to look at.

Going by this tutorial:

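# Note: softcossim and similarity_matrix() are deprecated in Gensim 3.8.3
# and removed in 4.0.0, so this version only runs on older releases.
import gensim.downloader as api
from gensim import corpora
from gensim.matutils import softcossim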

sent_1 = 'Dravid is a cricket player and a opening batsman'.split()
sent_2 = 'Leo is a cricket player too He is a batsman,baller and keeper'.split()

# Download the FastText model
fasttext_model300 = api.load('fasttext-wiki-news-subwords-300')

# Prepare a dictionary and a corpus.
documents = [sent_1, sent_2]
dictionary = corpora.Dictionary(documents)

# Prepare the similarity matrix
similarity_matrix = fasttext_model300.similarity_matrix(dictionary)

# Convert the sentences into bag-of-words vectors.
sent_1 = dictionary.doc2bow(sent_1)
sent_2 = dictionary.doc2bow(sent_2)

# Compute soft cosine similarity
print(softcossim(sent_1, sent_2, similarity_matrix))
#> 0.7909639717134869
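
The answers here convert the token lists with dictionary.doc2bow before computing the similarity, whereas the question passed raw token lists. As a rough picture of what that conversion produces, here is a pure-Python sketch that maps tokens to (term id, count) pairs. The helpers build_vocab and to_bow are hypothetical, and the ids they assign are an assumption; Gensim's Dictionary may number terms differently.

```python
from collections import Counter

def build_vocab(documents):
    """Assign an integer id to each distinct token, in order of first appearance."""
    vocab = {}
    for doc in documents:
        for token in doc:
            vocab.setdefault(token, len(vocab))
    return vocab

def to_bow(tokens, vocab):
    """Return sorted (term id, count) pairs for the tokens found in vocab."""
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    return sorted(counts.items())

sent_1 = 'Dravid is a cricket player and a opening batsman'.split()
sent_2 = 'Leo is a cricket player too He is a batsman,baller and keeper'.split()

vocab = build_vocab([sent_1, sent_2])
bow_1 = to_bow(sent_1, vocab)
bow_2 = to_bow(sent_2, vocab)
print(bow_1)
print(bow_2)
```

Because the sentences are split on whitespace only, 'batsman,baller' stays a single token and does not match 'batsman' exactly. Splitting on punctuation before building the dictionary would probably give the similarity matrix more useful terms to work with.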

In Gensim 4.0.0 and later, you can use the SoftCosineSimilarity class from gensim.similarities:

from gensim.similarities import SoftCosineSimilarity

# Calculate soft cosine similarity between the query and each document.
# `dictionary` and `similarity_matrix` are expected to exist already,
# built as in the Gensim 3.8.3 / 4.0.0 answer above.
def find_similarity(query, documents):
    query = dictionary.doc2bow(query)
    index = SoftCosineSimilarity(
        [dictionary.doc2bow(document) for document in documents],
        similarity_matrix)
    # One similarity score per document
    similarities = index[query]
    return similarities
