简体   繁体   中英

Latent Semantic Analysis and Stemming

Assume a very large corpus of any inflective language. Does the following make sense? By applying LSA on such corpus, words with similar concepts converge together in vector space, thus inflected word forms reffering to the same concept should ideally be identical with their lemma in the space. With such assumption, any lemmatization or stemming of queries or corpus is not necessary. Or am i totally wrong?

According to the founders of LSA, stemming is not necessary . Though, I think there is general disagreement in the literature about this. I have read a few papers where stemming was found to improve results for a given information retrieval task.

Generally, there is recent research that shows stemming does not help in topic modeling and may even hurt topic coherence.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM