How to make Lucene case-insensitive

By default, "Word" and "word" are not treated as the same term. How can I make Lucene case-insensitive?

The easiest approach is to lowercase all searchable content as well as the queries. See the LowerCaseFilter documentation. Wildcard queries need extra care because they bypass the Analyzer: the pattern is matched against the indexed terms as written, so if the index is lowercased, the wildcard pattern has to be lowercased too.
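To illustrate why lowercasing both sides works, here is a minimal sketch in plain Java. It is a toy inverted index, not Lucene code: the class name `LowercasingIndex` and its methods are made up for this example, and `normalize` only stands in for what a LowerCaseFilter does to each token.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index (hypothetical, not Lucene): terms are lowercased
// at index time and again at query time.
class LowercasingIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Stand-in for the lowercasing step applied to every token.
    static String normalize(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    // Split on whitespace and index each lowercased token under docId.
    void add(int docId, String text) {
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(normalize(token), k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Lowercase the query term the same way before the lookup.
    Set<Integer> search(String term) {
        Set<Integer> hits = postings.get(normalize(term));
        return hits == null ? Collections.<Integer>emptySet() : hits;
    }

    public static void main(String[] args) {
        LowercasingIndex index = new LowercasingIndex();
        index.add(1, "Word games");
        index.add(2, "a word of advice");
        System.out.println(index.search("WORD")); // prints [1, 2]
    }
}
```

If only one side were normalized, the lookup key and the stored key would differ in case and the search would miss, which is the same reason the index-time and query-time analysis need to agree in Lucene.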

If you need both behaviors, you can store the content in separate fields, for example one that preserves the original case and one that is lowercased, and query whichever one fits.
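The separate-fields idea can be sketched in plain Java as well. Again this is a toy model rather than Lucene code: `TwoFieldIndex`, `searchExact` and `searchIgnoreCase` are hypothetical names, and the two maps play the role of a case-preserving field and a lowercased field.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy two-field index (hypothetical, not Lucene): every token is stored
// once with its original case and once lowercased.
class TwoFieldIndex {
    private final Map<String, Set<Integer>> exact = new HashMap<>();
    private final Map<String, Set<Integer>> folded = new HashMap<>();

    void add(int docId, String text) {
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            exact.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            folded.computeIfAbsent(token.toLowerCase(Locale.ROOT), k -> new TreeSet<>()).add(docId);
        }
    }

    // Case-sensitive lookup against the case-preserving field.
    Set<Integer> searchExact(String term) {
        Set<Integer> hits = exact.get(term);
        return hits == null ? Collections.<Integer>emptySet() : hits;
    }

    // Case-insensitive lookup against the lowercased field.
    Set<Integer> searchIgnoreCase(String term) {
        Set<Integer> hits = folded.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? Collections.<Integer>emptySet() : hits;
    }
}
```

With document 1 containing "Word" and document 2 containing "word", `searchExact("Word")` finds only document 1, while `searchIgnoreCase("Word")` finds both. The cost is that each token is stored twice.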

The StandardAnalyzer applies a LowerCaseFilter, which makes "Word" and "word" the same term. Simply pass it to both your IndexWriter and your QueryParser. For example:

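// Use the same analyzer for indexing and for query parsing so both sides are lowercased.
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
// mlf is an IndexWriter.MaxFieldLength value, e.g. IndexWriter.MaxFieldLength.UNLIMITED.
IndexWriter writer = new IndexWriter(dir, analyzer, true, mlf);
QueryParser parser = new QueryParser(Version.LUCENE_30, field, analyzer);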

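In addition to using StandardAnalyzer (which includes a LowerCaseFilter and a filter for common English words such as "the"), make sure you build your documents with TextField rather than StringField, which is meant for exact-match searches.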
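The difference between the two field types can be pictured with a small plain-Java model. This is not Lucene code: `FieldTerms` and its methods are made up for illustration, and the TextField-like path is simplified to whitespace splitting plus lowercasing, without stop-word removal.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model (hypothetical, not Lucene) of how each field type turns a value into index terms.
class FieldTerms {
    // StringField-like: the whole value becomes a single term, case preserved.
    static List<String> stringFieldTerms(String value) {
        return Collections.singletonList(value);
    }

    // TextField-like: the value is split into tokens and each one is lowercased.
    // Simplification: whitespace splitting only, no stop-word removal.
    static List<String> textFieldTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }
}
```

For the value "Hello World", the StringField-like path yields the single term "Hello World", so a search for "world" finds nothing, while the TextField-like path yields "hello" and "world", so the same search matches.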

If you are using Solr, add LowerCaseFilterFactory to the fieldType for that field in schema.xml. For example:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>

    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>


 