Lucene Analyzer for Indexing and Searching
I have a field that I am indexing with Lucene like so:
@Field(name="hungerState", index=Index.TOKENIZED, store=Store.YES)
public HungerState getHungerState() {
The possible values of this field are HUNGRY, SLIGHTLY_HUNGRY, and NOT_HUNGRY.
When these values are indexed using the StandardAnalyzer, the terms end up as hungry and slightly, since it tokenizes on punctuation and drops "not" as a stop word.
If I change the index to index=Index.UN_TOKENIZED, the indexed terms are HUNGRY, SLIGHTLY_HUNGRY, and NOT_HUNGRY, as expected.
My search API has a single "search" method that constructs the Query like so:
MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_30, getSearchFields(), new StandardAnalyzer(Version.LUCENE_30));
parser.setDefaultOperator(QueryParser.AND_OPERATOR);
Query query = parser.parse(searchTerms);
This handles searches where searchTerms = "foo", which searches all fields returned by getSearchFields() for "foo", and also searches where searchTerms specifies the fields and values to search (e.g. "hungerState:HUNGRY").
My problem is with the latter scenario. Since the query parser is using a StandardAnalyzer, searches for hungerState:SLIGHTLY_HUNGRY get parsed into hungerState:"slightly hungry" and searches for hungerState:NOT_HUNGRY get parsed into hungerState:hungry.
When the field is indexed using the StandardAnalyzer, I get unexpected results (searches for HUNGRY and NOT_HUNGRY return results for all 3 values). When the field is indexed as UN_TOKENIZED, I don't get any results, since the query parser tokenizes the search string and lowercases it.
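To make the mismatch concrete, here is a rough plain-Java model of the two analysis styles. It is not Lucene code: standardLike and keywordLike are hypothetical helpers, the stop-word list is a tiny illustrative subset, and the splitting rule only approximates the tokenization described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalyzerMismatchSketch {
    // Illustrative stop words only; Lucene's real default set is larger.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("not", "and", "the"));

    // Simplified stand-in for StandardAnalyzer: split on non-alphanumerics,
    // lowercase each token, and drop stop words.
    public static List<String> standardLike(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("[^A-Za-z0-9]+")) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!lower.isEmpty() && !STOP_WORDS.contains(lower)) {
                terms.add(lower);
            }
        }
        return terms;
    }

    // Simplified stand-in for KeywordAnalyzer / UN_TOKENIZED:
    // the whole value is kept as one unmodified term.
    public static List<String> keywordLike(String text) {
        return Collections.singletonList(text);
    }

    public static void main(String[] args) {
        for (String value : Arrays.asList("HUNGRY", "SLIGHTLY_HUNGRY", "NOT_HUNGRY")) {
            System.out.println(value
                    + " -> standard-like " + standardLike(value)
                    + ", keyword-like " + keywordLike(value));
        }
    }
}
```

Under this model HUNGRY and NOT_HUNGRY both reduce to the single term hungry, and SLIGHTLY_HUNGRY also contains hungry, which is consistent with all 3 values matching. A query analyzed the standard-like way can never equal a term stored the keyword-like way, which is consistent with the empty results against the UN_TOKENIZED field.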
I've even tried specifying an Analyzer for indexing, such as KeywordAnalyzer, but it has essentially no effect, since the entire search string is analyzed with StandardAnalyzer every time.
Any advice would be appreciated. Thanks!
You're using a standard analyzer for your query parser, so yes, your query will be analyzed with a standard analyzer. Just switch to using a keyword analyzer:
MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_30, getSearchFields(),
    new KeywordAnalyzer());
You may want to use a PerFieldAnalyzerWrapper if your other fields aren't keywords.
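A minimal sketch of the wrapper approach against the Lucene 3.0 API might look like the following. The class name HungerStateQueryExample and the parse helper are made up for illustration, and it assumes hungerState is indexed untokenized (or with a KeywordAnalyzer) so that the stored terms match what the parser produces.

```java
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

// Hypothetical helper class; the name is not part of the original code.
public class HungerStateQueryExample {

    public static Query parse(String searchTerms, String[] searchFields)
            throws ParseException {
        // StandardAnalyzer stays the default for ordinary text fields.
        PerFieldAnalyzerWrapper analyzer =
                new PerFieldAnalyzerWrapper(new StandardAnalyzer(Version.LUCENE_30));

        // hungerState values are treated as single, unmodified terms.
        analyzer.addAnalyzer("hungerState", new KeywordAnalyzer());

        MultiFieldQueryParser parser =
                new MultiFieldQueryParser(Version.LUCENE_30, searchFields, analyzer);
        parser.setDefaultOperator(QueryParser.AND_OPERATOR);
        return parser.parse(searchTerms);
    }
}
```

With this setup a query such as hungerState:NOT_HUNGRY should keep its value intact, while terms aimed at the other fields still go through the standard analyzer. The same wrapper would need to be used at index time if the field is analyzed rather than stored untokenized.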