By default, "Word" and "word" are not treated as the same term. How can I make Lucene case-insensitive?
The easiest approach is to lowercase all searchable content, as well as the queries. See the LowerCaseFilter documentation. You could also use wildcard queries for case-insensitive search, since they bypass the Analyzer.
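To illustrate "lowercase both sides" without a library, the sketch below uses plain Java rather than Lucene's API. It runs one normalization step over document text at index time and over the query term at search time. `ToyLowercaseIndex`, `analyze`, `add` and `search` are hypothetical names. In Lucene itself this job belongs to an Analyzer whose chain includes a LowerCaseFilter. To keep the example short, tokenization is assumed to be a plain whitespace split.

```java
import java.util.*;

/** Hypothetical toy inverted index; illustrates the idea, not Lucene's API. */
class ToyLowercaseIndex {
    // term -> ids of documents containing it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // The same normalization runs at index time and at query time.
    // Locale.ROOT avoids locale-specific surprises (e.g. the Turkish dotless i).
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Term lookup; the query text goes through analyze() too.
    Set<Integer> search(String queryTerm) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : analyze(queryTerm)) {
            hits.addAll(postings.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }
}
```

Suppose you call `add(1, "Hello Word")` and `add(2, "a word here")`. A subsequent `search("WORD")` returns both document ids, because "Word", "word" and "WORD" all normalize to `word`. Using `Locale.ROOT` keeps the result independent of the JVM's default locale.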
If preferred, you can also store content in different fields to capture different case configurations.
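Here is one way to read the multi-field suggestion, sketched in plain Java rather than Lucene. The class and the field names `body_exact` and `body_lower` are hypothetical. Each token is indexed twice, once with its original case and once lowercased, and the caller chooses the field at query time.

```java
import java.util.*;

/**
 * Hypothetical sketch: each document's text is indexed twice, once as-is
 * ("body_exact") and once lowercased ("body_lower"). The field names are
 * illustrative only.
 */
class TwoFieldIndex {
    // field name -> term -> document ids
    private final Map<String, Map<String, Set<Integer>>> fields = new HashMap<>();

    private void index(String field, int docId, String term) {
        fields.computeIfAbsent(field, f -> new HashMap<>())
              .computeIfAbsent(term, t -> new TreeSet<>())
              .add(docId);
    }

    void add(int docId, String text) {
        for (String token : text.split("\\s+")) {
            if (token.isEmpty()) continue;
            index("body_exact", docId, token);                          // case preserved
            index("body_lower", docId, token.toLowerCase(Locale.ROOT)); // case folded
        }
    }

    // The caller picks the case configuration at query time.
    Set<Integer> search(String term, boolean caseSensitive) {
        String field = caseSensitive ? "body_exact" : "body_lower";
        String key = caseSensitive ? term : term.toLowerCase(Locale.ROOT);
        return fields.getOrDefault(field, Collections.emptyMap())
                     .getOrDefault(key, Collections.emptySet());
    }
}
```

Take two documents, "Word count" and "word games". A case-sensitive search for `Word` finds only the first, while a case-insensitive search finds both. The cost of this design is a larger index, since every token is stored in two forms.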
The StandardAnalyzer applies a LowerCaseFilter, which makes "Word" and "word" the same. You can simply pass it to your IndexWriter and your QueryParser. For example, in a few lines (Lucene 3.0 API):
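// Pass the same analyzer to the IndexWriter and the QueryParser
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
// dir is a Directory; mlf is an IndexWriter.MaxFieldLength (e.g. IndexWriter.MaxFieldLength.UNLIMITED)
IndexWriter writer = new IndexWriter(dir, analyzer, true, mlf);
QueryParser parser = new QueryParser(Version.LUCENE_30, field, analyzer);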
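The snippet hands the same `analyzer` to both IndexWriter and QueryParser for a reason. A query term only matches if it was normalized the same way as the indexed terms. The plain-Java sketch below models two toy analyzers. It shows a hit when both sides lowercase and a miss when only the index side does. The class and the lambdas are hypothetical, not Lucene classes.

```java
import java.util.*;
import java.util.function.Function;

/** Hypothetical illustration of why index-time and query-time analysis must agree. */
class AnalyzerAgreement {
    // Two toy "analyzers": one lowercases (like LowerCaseFilter), one does not.
    static final Function<String, List<String>> LOWERCASING =
        text -> Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    static final Function<String, List<String>> VERBATIM =
        text -> Arrays.asList(text.trim().split("\\s+"));

    // True if every query term appears among the document's indexed terms.
    static boolean matches(String doc, Function<String, List<String>> indexAnalyzer,
                           String query, Function<String, List<String>> queryAnalyzer) {
        Set<String> indexed = new HashSet<>(indexAnalyzer.apply(doc));
        return indexed.containsAll(queryAnalyzer.apply(query));
    }
}
```

`matches("Hello Word", LOWERCASING, "WORD", LOWERCASING)` is `true`. Swapping the query analyzer for `VERBATIM` gives `false`, because the index holds `word` but the query looks for `WORD`.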
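In addition to using the StandardAnalyzer (which includes a LowerCaseFilter and a filter for common English words such as "the"), you should also make sure to build your documents with TextField rather than StringField, which is meant for exact-match search.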
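The plain-Java model below shows why the field type matters. The helper names are hypothetical, not Lucene code. A StringField indexes its entire value as one un-analyzed term, while a TextField passes the value through the analyzer. As a simplification, whitespace splitting plus lowercasing stands in for the StandardAnalyzer.

```java
import java.util.*;

/**
 * Hypothetical plain-Java model of the two Lucene field kinds; not Lucene code.
 * StringField: whole value indexed as one term, no analysis.
 * TextField: value run through an analyzer (here: whitespace split + lowercase).
 */
class FieldKinds {
    static Set<String> stringFieldTerms(String value) {
        return Collections.singleton(value); // exact, case-sensitive, untokenized
    }

    static Set<String> textFieldTerms(String value) {
        Set<String> terms = new HashSet<>();
        for (String token : value.trim().split("\\s+")) {
            if (!token.isEmpty()) terms.add(token.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    // No analysis on either side: the query must equal the stored value exactly.
    static boolean stringFieldMatches(String value, String query) {
        return stringFieldTerms(value).contains(query);
    }

    // The query term is lowercased the same way as the indexed tokens.
    static boolean textFieldMatches(String value, String query) {
        return textFieldTerms(value).contains(query.toLowerCase(Locale.ROOT));
    }
}
```

Consider the value "Hello Word". The StringField model matches only the exact string `Hello Word`. The TextField model matches `word`, `Word` or `WORD`.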
In Solr, add LowerCaseFilterFactory to the fieldType for that field in schema.xml. For example:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1" />
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
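To get a feel for what this chain emits, here is a heavily simplified plain-Java model. The class is hypothetical, not Solr code. It covers only four steps:

- the whitespace tokenizer;
- splitting on punctuation and on lowercase-to-uppercase transitions (`splitOnCaseChange="1"`);
- `preserveOriginal="1"` on the index side;
- the final lowercasing.

Catenation, numeric splitting and token positions are deliberately left out, so real Solr output will differ in detail.

```java
import java.util.*;

/**
 * Hypothetical, heavily simplified model of the schema.xml chain:
 * WhitespaceTokenizer, then (part of) WordDelimiterFilter, then LowerCaseFilter.
 * Only word-part splitting on punctuation and on lower-to-upper case changes,
 * plus preserveOriginal, are modelled; catenation and numeric splitting are not.
 */
class SolrChainSketch {
    static List<String> wordParts(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (!Character.isLetterOrDigit(c)) {   // delimiter such as '-' or '_'
                flush(current, parts);
                continue;
            }
            boolean caseChange = current.length() > 0
                && Character.isLowerCase(current.charAt(current.length() - 1))
                && Character.isUpperCase(c);
            if (caseChange) flush(current, parts); // splitOnCaseChange="1"
            current.append(c);
        }
        flush(current, parts);
        return parts;
    }

    private static void flush(StringBuilder current, List<String> parts) {
        if (current.length() > 0) {
            parts.add(current.toString());
            current.setLength(0);
        }
    }

    static List<String> analyze(String text, boolean preserveOriginal) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {   // WhitespaceTokenizer
            if (token.isEmpty()) continue;
            List<String> parts = wordParts(token);
            if (preserveOriginal && !parts.equals(Collections.singletonList(token))) {
                out.add(token.toLowerCase(Locale.ROOT));       // preserveOriginal="1"
            }
            for (String part : parts) {
                out.add(part.toLowerCase(Locale.ROOT));        // LowerCaseFilter
            }
        }
        return out;
    }
}
```

`analyze("PowerShot Wi-Fi", true)` models the index side and yields `powershot, power, shot, wi-fi, wi, fi`. The query side (`preserveOriginal` off) yields `power, shot, wi, fi`. LowerCaseFilterFactory runs last on both sides, so a query for `WORD` and an indexed `Word` both end up as `word`.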