Lucene Analyzer for indexing and searching

Question

I have a field that I am indexing with Lucene like so:

@Field(name="hungerState", index=Index.TOKENIZED, store=Store.YES)
public HungerState getHungerState() {

The possible values of this field are HUNGRY, SLIGHTLY_HUNGRY, and NOT_HUNGRY.

When these values are indexed using the StandardAnalyzer, the terms end up as hungry and slightly, since it tokenizes on punctuation and drops "not" as a stop word.

If I change the index to index=Index.UN_TOKENIZED, the indexed terms are HUNGRY, SLIGHTLY_HUNGRY, and NOT_HUNGRY, as expected.

My search API has one "search" method that constructs the Query like so:

MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_30, getSearchFields(), new StandardAnalyzer(Version.LUCENE_30));
parser.setDefaultOperator(QueryParser.AND_OPERATOR);
Query query = parser.parse(searchTerms);

This handles searches where searchTerms = "foo", which searches all fields returned by getSearchFields() for "foo", and also searches where searchTerms specifies fields and values (i.e. "hungerState:HUNGRY").

My problem is with the latter scenario. Since the query parser is using a StandardAnalyzer, searches for hungerState:SLIGHTLY_HUNGRY get parsed into hungerState:"slightly hungry" and searches for hungerState:NOT_HUNGRY get parsed into hungerState:hungry.

When the field is indexed using the StandardAnalyzer, I get unexpected results (searches for HUNGRY and NOT_HUNGRY return results for all 3 values). When the field is indexed as UN_TOKENIZED, I don't get any results, since the query parser tokenizes the search string and lowercases it.

I've even tried specifying an Analyzer for indexing, like KeywordAnalyzer, but it has pretty much no effect since the entire search string is analyzed with StandardAnalyzer every time.

Any advice would be appreciated. Thanks!

Answer 1

You're using a standard analyzer for your query parser, so yes, your query will be analyzed with a standard analyzer. Just switch to using a keyword analyzer:

// KeywordAnalyzer takes no Version argument; it emits the whole input as one term
MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_30, getSearchFields(),
          new KeywordAnalyzer());

You may want to use a PerFieldAnalyzerWrapper if your other fields aren't keywords.
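
To see why a per-field choice of analyzer fixes the mismatch described in the question, here is a minimal plain-Java sketch that does not use Lucene at all. The methods standardLike and keywordLike are hypothetical stand-ins that only approximate what the question reports for StandardAnalyzer and KeywordAnalyzer, the stop-word list is deliberately tiny, and the "description" field is invented for illustration. The point is only that the same per-field analysis has to run on both the indexed value and the query text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class PerFieldAnalysisSketch {
    // Tiny, assumed stop-word list; the real analyzer's list is longer.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("not", "and", "the", "a"));

    // Rough approximation of the behaviour the question describes for StandardAnalyzer:
    // lowercase, split on anything that is not a letter or digit, drop stop words.
    static List<String> standardLike(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Rough approximation of KeywordAnalyzer: the whole input is one unchanged term.
    static List<String> keywordLike(String text) {
        return Collections.singletonList(text);
    }

    // Per-field dispatch, the idea behind PerFieldAnalyzerWrapper:
    // hungerState is treated as a keyword, every other field gets the default.
    static List<String> analyze(String field, String text) {
        return "hungerState".equals(field) ? keywordLike(text) : standardLike(text);
    }

    // AND semantics: every query term must be present among the indexed terms.
    static boolean matches(String field, String indexedValue, String queryValue) {
        return analyze(field, indexedValue).containsAll(analyze(field, queryValue));
    }

    public static void main(String[] args) {
        // With standard-like analysis, NOT_HUNGRY collapses to the single term "hungry".
        System.out.println(standardLike("NOT_HUNGRY"));
        System.out.println(standardLike("SLIGHTLY_HUNGRY"));

        // Keyword field: only an exact value matches.
        System.out.println(matches("hungerState", "SLIGHTLY_HUNGRY", "NOT_HUNGRY"));
        System.out.println(matches("hungerState", "NOT_HUNGRY", "NOT_HUNGRY"));

        // Any other field still gets tokenized, so the false positive reappears there.
        System.out.println(matches("description", "SLIGHTLY_HUNGRY", "NOT_HUNGRY"));
    }
}
```

In Lucene itself the same dispatch would be set up by wrapping a default analyzer in a PerFieldAnalyzerWrapper, registering a KeywordAnalyzer for hungerState, and passing the wrapper to the query parser. The field also has to be indexed in a way that keeps the whole value as one term (for example UN_TOKENIZED, as in the question) so that both sides agree.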

Answer 1: 2, accepted, 2011-10-12 18:28:36
