Lucene stemmer strategy (does it keep both stemmed and non-stemmed words, or just the stemmed ones?)

Original 2013-06-20 17:23:25 9 1 lucene/ stemming

I have a question regarding the Lucene stemmer. Does Lucene keep both the stemmed and the non-stemmed words, or does it just replace the non-stemmed words with their stemmed forms?

For example, if a record contains "everyone loves cats", is it going to be indexed as "everyone loves love cats cat" or as "everyone love cat"?

Does it use the same strategy for both queries and records?

1 answer

Generally, only the stemmed version is kept. That is, in your example, the end result will be "everyone love cat" rather than "everyone loves love cats cat" or some similar combination.

You are expected to use the same stemmer both when indexing and when querying. There may be some stemming filters that, like SynonymFilter, allow you to keep the original term, but doing this and running unstemmed queries will tend to cause PhraseQueries not to work correctly (see the note in the SynonymFilter docs on this very topic). I don't believe the most common stemming filters (e.g. PorterStemFilter) provide that functionality.
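To make the "same stemmer on both sides" point concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: `toyStem` is a hypothetical stand-in that only strips a trailing "s" (it is not the Porter algorithm), and `analyze` is an assumed stand-in for an analyzer chain. Real stemmers may produce different stems for the same words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class StemOnlyIndexSketch {

    // Hypothetical toy stemmer: strips one trailing "s" from longer tokens.
    // This is NOT the Porter algorithm, only an illustration.
    public static String toyStem(String token) {
        if (token.length() > 3 && token.endsWith("s")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    // Stand-in for an analyzer: lowercase, split on whitespace, stem each token.
    // Only the stemmed form is emitted; the original token is discarded.
    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(toyStem(token));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Index time: only stemmed terms are stored.
        List<String> indexed = analyze("everyone loves cats");
        System.out.println(indexed);

        // Query time with the same analysis: "Cats" becomes "cat" and matches.
        System.out.println(indexed.containsAll(analyze("Cats")));

        // Query time without stemming: the raw term "cats" was never indexed.
        System.out.println(indexed.contains("cats"));
    }
}
```

Because the index holds only stems, a query that skips the stemming step looks for a term that is simply not there, which is why the two sides need to agree.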

If you need to be able to search unstemmed data for some reason, I would recommend storing a second field that is entirely unstemmed for that purpose.
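The second-field idea can be sketched with a tiny in-memory inverted index, again in plain Java rather than Lucene. The field names `body` and `body_exact`, the class, and the toy stemmer are all hypothetical; in Lucene the rough equivalent would be indexing the same text into two fields with different analyzers (for instance through `PerFieldAnalyzerWrapper`), which this sketch does not attempt to reproduce.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class TwoFieldIndexSketch {

    // field name -> term -> ids of the documents containing that term
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

    // Hypothetical toy stemmer (strips one trailing "s"); not the Porter algorithm.
    public static String toyStem(String token) {
        if (token.length() > 3 && token.endsWith("s")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    // Index the same text twice: stemmed into "body", verbatim into "body_exact".
    public void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            put("body", toyStem(token), docId);
            put("body_exact", token, docId);
        }
    }

    private void put(String field, String term, int docId) {
        postings.computeIfAbsent(field, f -> new HashMap<>())
                .computeIfAbsent(term, t -> new HashSet<>())
                .add(docId);
    }

    // Apply the field's own analysis to the query word, then look it up.
    public Set<Integer> search(String field, String word) {
        String term = word.toLowerCase(Locale.ROOT);
        if (field.equals("body")) {
            term = toyStem(term);
        }
        Map<String, Set<Integer>> byTerm = postings.get(field);
        if (byTerm == null) {
            return Collections.<Integer>emptySet();
        }
        Set<Integer> hits = byTerm.get(term);
        return hits == null ? Collections.<Integer>emptySet() : hits;
    }

    public static void main(String[] args) {
        TwoFieldIndexSketch index = new TwoFieldIndexSketch();
        index.add(1, "everyone loves cats");
        index.add(2, "my cat sleeps");

        System.out.println(index.search("body", "cats"));       // stemmed field: both documents
        System.out.println(index.search("body_exact", "cats")); // exact field: document 1 only
    }
}
```

The cost of this design is a larger index, since every token is stored twice, but each field stays internally consistent, so phrase-style matching within a single field is not disturbed by mixed stemmed and unstemmed terms.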

Related questions

Get stemmed word in Lucene

Transitioning from Stemmed to unstemmed field in solr

In Solr, why is 'built' not being stemmed to 'build' but 'building' is?

Solr how can I have the original term first than the stemmed version?

Lucene Porter Stemmer not public

lucene query syntax for having both the specified words

Lucene Stemmer packages download

Lucene 3.6.0 - SnowballAnalyzer Stemmer Deprecated

Create a lucene romanian stemmer in java netbeans

Lucene Porter Stemmer - get original unstemmed word

solution1 0 2013-06-20 20:24:47
