Proper Analyzer to use in Lucene for 'case-insensitive, contains' matching

I am using Lucene to build an index of search items in a Java servlet.

The user enters text on a webpage, and an AJAX request is sent to the servlet to get any strings that match the query string. The results populate an autocomplete menu on the webpage.

Currently, the Lucene code only returns matches if the user enters a whole word. I want it to return results even if only one letter matches an item in the index. In other words, how do I get Lucene to match the input string anywhere inside an indexed item, no matter how short the input is? Do I need to change the Analyzer? I am using the StandardAnalyzer:

StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);

Matching on single letters in general defeats the purpose of an inverted-index text engine, and none of the standard analyzers will do that. If you insist, you can use the NGramTokenizer (http://lucene.apache.org/core/4_8_0/analyzers-common/org/apache/lucene/analysis/ngram/NGramTokenizer.html) with minGram and maxGram both set to 1. You will need to build your own Analyzer around it, but that's a good idea anyway.
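
For illustration only, here is a JDK-only sketch (not Lucene code) of roughly what a 1-gram tokenizer followed by lowercasing emits: one lowercased token per character. The class UnigramSketch and its method are hypothetical names. In Lucene 4.7 itself you would get this behavior by subclassing Analyzer and combining NGramTokenizer (minGram = maxGram = 1) with LowerCaseFilter in createComponents.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a JDK-only imitation of a 1-gram tokenizer plus lowercasing.
// It does not use Lucene; it only shows the shape of the resulting token stream.
public class UnigramSketch {

    // Emits one token per code point, lowercased per code point,
    // so spaces and punctuation also become tokens.
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints()
            .map(cp -> Character.toLowerCase(cp))
            .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(unigrams("Lucene")); // [l, u, c, e, n, e]
    }
}
```

In this sketch a one-letter query shares a term with nearly every item that contains that letter, which illustrates why single-letter matching gives up the selectivity an inverted index is built for.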

Based on a clarification in the comments that the OP wants to match across whitespace boundaries:

This is not a job for an inverted index. An inverted index works by indexing all the strings that can match. Matching an input against every arbitrary-length substring would require a gigantic index and would be too slow. You need something else entirely.
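
To put rough numbers on "gigantic": a string of length n has n(n+1)/2 substring positions, and storing each of them as its own term takes n(n+1)(n+2)/6 characters in total. The sketch below is plain Java arithmetic (SubstringCost is a hypothetical name); it counts positions rather than distinct strings and ignores any per-term index overhead.

```java
// Hypothetical helper: back-of-the-envelope cost of indexing every substring of one string.
public class SubstringCost {

    // Number of (start, end) substring positions in a string of length n: n(n+1)/2.
    static long substringCount(long n) {
        return n * (n + 1) / 2;
    }

    // Total characters across all those substrings:
    // sum over lengths k of k * (n - k + 1) = n(n+1)(n+2)/6.
    static long substringChars(long n) {
        return n * (n + 1) * (n + 2) / 6;
    }

    public static void main(String[] args) {
        for (long n : new long[] {20, 100}) {
            System.out.println("n=" + n + ": " + substringCount(n)
                + " substrings, " + substringChars(n) + " chars");
        }
    }
}
```

Running it prints 210 substrings and 1540 characters for n = 20, and 5050 substrings and 171700 characters for n = 100, for a single indexed string.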

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.