
Lucene Analyzer for Indexing and Searching

I have a field that I am indexing with Lucene like so:

@Field(name="hungerState", index=Index.TOKENIZED, store=Store.YES)
public HungerState getHungerState() {

The possible values of this field are HUNGRY, SLIGHTLY_HUNGRY, and NOT_HUNGRY

When these values are indexed using the StandardAnalyzer, the terms end up as hungry and slightly, since it tokenizes on punctuation (here, the underscore), lowercases each token, and drops "not" as a stop word.

If I change the index to index=Index.UN_TOKENIZED , the indexed terms are HUNGRY, SLIGHTLY_HUNGRY, and NOT_HUNGRY , as expected.

My search API has a single "search" method that constructs the Query like so:

MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_30, getSearchFields(), new StandardAnalyzer(Version.LUCENE_30));
parser.setDefaultOperator(QueryParser.AND_OPERATOR);
Query query = parser.parse(searchTerms);

This handles searches where searchTerms = "foo", which searches all fields returned by getSearchFields() for "foo", and also searches where searchTerms names specific fields and values (e.g. "hungerState:HUNGRY").

My problem is with the latter scenario. Since the query parser is using a StandardAnalyzer, searches for hungerState:SLIGHTLY_HUNGRY get parsed into hungerState:"slightly hungry" and searches for hungerState:NOT_HUNGRY get parsed into hungerState:hungry.

When the field is indexed using the StandardAnalyzer, I get unexpected results (searches for HUNGRY and NOT_HUNGRY return results for all 3 values). When the field is indexed as UN_TOKENIZED, I don't get any results since the query parser tokenizes the search string and makes it lowercase.
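
To make the mismatch concrete, here is a rough model in plain Java. It does not use Lucene: standardLike is a hypothetical stand-in that only mimics the behaviour described above (split on non-alphanumeric characters, lowercase, drop "not"), and keywordLike stands in for the UN_TOKENIZED case.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class AnalyzerMismatchSketch {

    // Simplified stand-in for the tokenized case described in the question.
    // NOT Lucene code: it ignores the rest of the real stop-word list.
    static List<String> standardLike(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("[^A-Za-z0-9]+")) {
            String term = token.toLowerCase(Locale.ROOT);
            if (!term.isEmpty() && !term.equals("not")) {
                terms.add(term);
            }
        }
        return terms;
    }

    // Stand-in for the untokenized case: the whole value is a single term.
    static List<String> keywordLike(String text) {
        return Collections.singletonList(text);
    }

    public static void main(String[] args) {
        for (String value : new String[] {"HUNGRY", "SLIGHTLY_HUNGRY", "NOT_HUNGRY"}) {
            System.out.println(value + " -> tokenized " + standardLike(value)
                    + ", untokenized " + keywordLike(value));
        }
    }
}
```

In this model every value produces the term hungry when tokenized, so a query term of hungry matches all three. The untokenized terms keep their case and underscore, so a lowercased, split query term never equals them.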

I've even tried specifying an Analyzer for indexing like KeywordAnalyzer , but it pretty much has no effect since the entire search string is analyzed with StandardAnalyzer every time.

Any advice would be appreciated. Thanks!

You're using a StandardAnalyzer for your query parser, so yes, your query will be analyzed with a StandardAnalyzer. Just switch to using a KeywordAnalyzer (its constructor takes no Version argument in Lucene 3.0):

MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_30, getSearchFields(),
          new KeywordAnalyzer());

You may want to use a PerFieldAnalyzerWrapper if your other fields aren't keywords.
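
The idea behind a per-field wrapper can be sketched in plain Java, without Lucene on the classpath. The class below is a hypothetical stand-in, not the Lucene API: it keeps a default analysis function plus per-field overrides, much as PerFieldAnalyzerWrapper takes a default Analyzer and lets you register others by field name through addAnalyzer. The "description" field and both analysis functions are assumptions made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class PerFieldAnalysisSketch {

    // Hypothetical stand-in for keyword analysis: the input is one term, unchanged.
    static final Function<String, List<String>> KEYWORD = Collections::singletonList;

    // Hypothetical stand-in for a tokenizing analyzer: lowercase, split on non-alphanumerics.
    static final Function<String, List<String>> LOWERCASE_WORDS =
            text -> Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"))
                    .filter(term -> !term.isEmpty())
                    .collect(Collectors.toList());

    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    PerFieldAnalysisSketch(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Register an override for one field; all other fields use the default.
    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    // Tokenize everything except hungerState, which is kept as a single term.
    static PerFieldAnalysisSketch example() {
        PerFieldAnalysisSketch wrapper = new PerFieldAnalysisSketch(LOWERCASE_WORDS);
        wrapper.addAnalyzer("hungerState", KEYWORD);
        return wrapper;
    }

    public static void main(String[] args) {
        PerFieldAnalysisSketch wrapper = example();
        System.out.println(wrapper.analyze("hungerState", "SLIGHTLY_HUNGRY"));
        System.out.println(wrapper.analyze("description", "Very SLIGHTLY_HUNGRY cat"));
    }
}
```

Whatever mapping you choose, it only helps if indexing and query parsing agree on it, so hungerState should be indexed untokenized (or with a KeywordAnalyzer) when the parser treats it as a keyword.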
