
Evaluating Doc2Vec - cosine similarity matrix

Original 2021-02-09 19:06:20 3 1 python/ nlp/ gensim/ doc2vec

I'm training my Doc2Vec model on 106k documents (100-600 words per document). The goal is to retrieve similar documents for a target document.

Since Doc2Vec is an unsupervised model, there is no real way to evaluate it other than testing how it performs on the downstream task. So I created a small test dataset containing about 200 target documents, each with 5 similar documents.

My idea is to calculate the cosine similarity of every document against all other documents in my test dataset and take the top 5 most similar documents per target document.

Is there an efficient way to create a cosine similarity matrix with Doc2Vec? The most_similar function is impractical here because it searches across every document used for training, not just the documents in my test dataset.

1 answer

You could use sklearn's cosine_similarity function for this. Once you have the list of 200 vectors, convert it to a NumPy array and pass it to this function; it returns the pairwise similarity matrix. You can then use argsort() to get the indices of the closest documents. For top-k matching on a single row arr of that matrix, you could use arr.argsort()[-k:][::-1]. Note that every document has cosine similarity 1 with itself, so the target document will normally rank first unless you exclude it.
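As a rough sketch of the approach described above: the snippet below builds the pairwise matrix with sklearn's cosine_similarity and takes the top k per row with argsort. The helper name top_k_similar and the random stand-in vectors are assumptions made for illustration; in practice the array would hold the Doc2Vec vectors of the test documents (for example, ones obtained from the model's infer_vector). Masking the diagonal is one way to keep a document from being returned as its own nearest match.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


def top_k_similar(vectors, k=5):
    """Hypothetical helper: return the pairwise cosine matrix and, for each
    row, the indices of the k most similar *other* documents, best first."""
    X = np.asarray(vectors, dtype=float)
    sim = cosine_similarity(X)              # shape (n_docs, n_docs)
    ranked = sim.copy()
    np.fill_diagonal(ranked, -np.inf)       # exclude each document's self-match
    # argsort is ascending, so take the last k columns and reverse them
    top_idx = np.argsort(ranked, axis=1)[:, -k:][:, ::-1]
    return sim, top_idx


# Stand-in data (assumption): random vectors in place of real Doc2Vec output.
rng = np.random.default_rng(0)
doc_vectors = rng.normal(size=(200, 100))

sim, top_idx = top_k_similar(doc_vectors, k=5)
print(sim.shape, top_idx.shape)
```

Row i of top_idx can then be compared against the 5 documents labelled as similar to target i in the test set. For a few hundred documents the full matrix is small, so computing all pairs at once should be cheap.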

doc2vec inaccurate cosine similarity

Similarity with Doc2Vec

Can I obtain Word2Vec and Doc2Vec matrices to calculate a cosine similarity?

Doc2vec matrix representation

Doc2Vec similarity small corpus test

Doc2Vec - Finding document similarity in test data




solution1 0 2021-02-10 13:25:46
