Lucene Analyzer for Indexing and Searching

I have a field that I am indexing with Lucene like so:

@Field(name="hungerState", index=Index.TOKENIZED, store=Store.YES)
public HungerState getHungerState() {

The possible values of this field are HUNGRY, SLIGHTLY_HUNGRY, and NOT_HUNGRY.

When these values are indexed using the StandardAnalyzer, the only terms that end up in the index are "hungry" and "slightly", since the analyzer lowercases, tokenizes on punctuation, and drops the "not".

If I change the index setting to index=Index.UN_TOKENIZED, the indexed terms are HUNGRY, SLIGHTLY_HUNGRY, and NOT_HUNGRY, as expected.

My search API has a single "search" method that constructs the Query like so:

MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_30, getSearchFields(), new StandardAnalyzer(Version.LUCENE_30));
parser.setDefaultOperator(QueryParser.AND_OPERATOR);
Query query = parser.parse(searchTerms);

This handles searches where searchTerms = "foo", which searches all fields returned by getSearchFields() for "foo", and also searches where searchTerms specifies the fields and values to search (e.g. "hungerState:HUNGRY").

My problem is with the latter scenario. Since the query parser is using a StandardAnalyzer, searches for hungerState:SLIGHTLY_HUNGRY get parsed into hungerState:"slightly hungry" and searches for hungerState:NOT_HUNGRY get parsed into hungerState:hungry.

When the field is indexed using the StandardAnalyzer, I get unexpected results (searches for HUNGRY and NOT_HUNGRY return results for all 3 values). When the field is indexed as UN_TOKENIZED, I don't get any results, since the query parser tokenizes the search string and makes it lowercase.
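To make the mismatch concrete, here is a toy model in plain Java. It is only a sketch: toyStandardAnalyze is a hypothetical, crude stand-in for the behaviour described above (lowercase, split on punctuation, drop "not"), not Lucene's actual StandardAnalyzer, and matching is reduced to "every query term is among the indexed terms".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: mimics the behaviour described in the question,
// it does not call Lucene.
class AnalysisMismatchDemo {

    // Assumption: the question only tells us that "not" gets dropped.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("not"));

    // Lowercase, split on anything that is not a letter or digit, drop stop words.
    static List<String> toyStandardAnalyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    // UN_TOKENIZED indexing keeps the whole value as one unmodified term.
    static List<String> untokenized(String text) {
        return Arrays.asList(text);
    }

    // Simplified matching: every query term must be among the indexed terms.
    static boolean matches(List<String> indexedTerms, List<String> queryTerms) {
        return indexedTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        String[] values = {"HUNGRY", "SLIGHTLY_HUNGRY", "NOT_HUNGRY"};
        // The query string goes through the same toy analysis as the parser would apply.
        List<String> query = toyStandardAnalyze("NOT_HUNGRY");
        for (String value : values) {
            System.out.println(value
                + " tokenized match=" + matches(toyStandardAnalyze(value), query)
                + " untokenized match=" + matches(untokenized(value), query));
        }
    }
}
```

In this model the analyzed query for NOT_HUNGRY is just the term "hungry", which is contained in the tokenized form of all three values and equals none of the untokenized, upper-case terms. That is the same pair of symptoms described above.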

I've even tried specifying an Analyzer for indexing, such as KeywordAnalyzer, but it has essentially no effect, since the entire search string is analyzed with StandardAnalyzer every time.

Any advice would be appreciated. Thanks!

You're using a standard analyzer for your query parser, so yes, your query will be analyzed with a standard analyzer. Just switch to using a keyword analyzer:

MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_30, getSearchFields(), 
          new KeywordAnalyzer());

You may want to use a PerFieldAnalyzerWrapper if your other fields aren't keywords.
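The idea behind a per-field wrapper can be sketched in plain Java. This is a toy model, not Lucene code: it only mirrors the general shape of PerFieldAnalyzerWrapper (a default analyzer passed to the constructor, plus per-field overrides registered with addAnalyzer). The class name, the "description" field, and the string-to-string "analyzers" are all hypothetical and exist only for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Toy model of per-field analyzer dispatch; it does not call Lucene.
class PerFieldAnalysisSketch {

    // Field-specific analyzers; any field not listed here uses the default.
    private final Map<String, Function<String, String>> perField = new HashMap<>();
    private final Function<String, String> defaultAnalyzer;

    PerFieldAnalysisSketch(Function<String, String> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    void addAnalyzer(String field, Function<String, String> analyzer) {
        perField.put(field, analyzer);
    }

    String analyze(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    // Hypothetical setup: tokenizing-style default, keyword-style hungerState.
    static PerFieldAnalysisSketch example() {
        // Crude stand-in for a tokenizing analyzer: lowercase, punctuation to spaces.
        PerFieldAnalysisSketch analysis = new PerFieldAnalysisSketch(
            text -> text.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", " ").trim());
        // Keyword-style analysis: the value passes through unchanged as one term.
        analysis.addAnalyzer("hungerState", Function.identity());
        return analysis;
    }

    public static void main(String[] args) {
        PerFieldAnalysisSketch analysis = example();
        System.out.println(analysis.analyze("hungerState", "NOT_HUNGRY"));
        System.out.println(analysis.analyze("description", "Very_Hungry cat"));
    }
}
```

With a setup like this, hungerState:NOT_HUNGRY would reach the index lookup untouched, while free-text fields still get tokenized. For the query to match, the field also has to be indexed without tokenization (for example UN_TOKENIZED, as in the question), so that both sides produce the same term.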
