
Different approaches for document similarity (LDA, LSA, cosine)

I have a set of short documents (one or two paragraphs each). I have used three different approaches for document similarity:

- simple cosine similarity on the TF-IDF matrix
- applying LDA on the whole corpus, then using the LDA model to create a vector for each document, then applying cosine similarity
- applying LSA on the whole corpus, then using the LSA model to create a vector for each document, then applying cosine similarity
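For concreteness, here is a minimal sketch of the three pipelines using scikit-learn. The toy corpus, the number of components, and the variable names are illustrative assumptions, not my actual setup:

```python
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation, TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-in corpus of short documents
docs = [
    "first short document about topic modeling",
    "second short document about cosine similarity",
    "an unrelated paragraph about something different entirely",
]

# Approach 1: cosine similarity directly on the TF-IDF matrix
tfidf = TfidfVectorizer(stop_words="english")
X_tfidf = tfidf.fit_transform(docs)
sim_tfidf = cosine_similarity(X_tfidf)

# Approach 2: LDA topic vectors, then cosine similarity
# (LDA models raw word counts, so it is fit on a count matrix, not TF-IDF)
counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
X_lda = lda.fit_transform(counts)  # per-document topic distributions
sim_lda = cosine_similarity(X_lda)

# Approach 3: LSA (truncated SVD of the TF-IDF matrix), then cosine similarity
lsa = TruncatedSVD(n_components=2, random_state=0)
X_lsa = lsa.fit_transform(X_tfidf)
sim_lsa = cosine_similarity(X_lsa)

print(sim_tfidf.round(2))
print(sim_lda.round(2))
print(sim_lsa.round(2))
```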

Based on my experiments, I am getting better results with simple cosine similarity on the TF-IDF matrix, without any LDA or LSA. Based on what I have read, LDA or LSA should improve the results, but in my case they do not! Any idea why LDA and LSA give worse results? Both LDA and LSA, when trained for more than 1000 rounds, find similarity with probability higher than 90% between some documents that are totally unrelated!
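To make that last observation concrete: with only a paragraph or two of text, LDA tends to concentrate nearly all probability mass on a single topic, so two unrelated documents that happen to share the same dominant topic come out almost identical under cosine similarity. A hypothetical sketch (the topic distributions below are made-up numbers):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical topic distributions for two unrelated short documents:
# with few words per document, LDA often puts most mass on one topic.
doc_a = np.array([[0.95, 0.03, 0.02]])
doc_b = np.array([[0.90, 0.05, 0.05]])

print(cosine_similarity(doc_a, doc_b))  # ~0.999 despite unrelated content
```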

Is there any justification for that?

Thanks

I have used the LDA4j implementation and got better results than TF-IDF, and similarly for LSI I have used the semantic-vectors implementation. If you have your own implementation, share the model sketch. One more thing: you should normalize the corpus for better results.
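As a rough illustration of what corpus normalization might involve, here is a sketch using NLTK in Python (the original answer does not specify a normalization scheme, so the lowercasing, stop-word removal, and Porter stemming below are assumptions):

```python
import re
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords  # requires nltk.download("stopwords") once

stemmer = PorterStemmer()
stop = set(stopwords.words("english"))

def normalize(text):
    # Lowercase, keep only alphabetic tokens, drop stop words, stem the rest
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stemmer.stem(t) for t in tokens if t not in stop]

print(normalize("The Cats are running quickly through the gardens!"))
# ['cat', 'run', 'quickli', 'garden']
```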
