I need a Lucene Tokenizer that can do the following. Given the string "wines bottle caps", the following queries should succeed
Here is what I have so far. How might I modify it to make this work? No query shorter than three characters should match.
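import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

public class PorterAnalyzer extends Analyzer {
    private final Version version;

    public PorterAnalyzer(Version version) {
        this.version = version;
    }

    @Override
    @SuppressWarnings("resource")
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // Split the input into word tokens.
        final StandardTokenizer src = new StandardTokenizer(reader);
        TokenStream tok = new StandardFilter(src);
        tok = new LowerCaseFilter(tok);                               // normalize case
        tok = new StopFilter(tok, StandardAnalyzer.STOP_WORDS_SET);   // drop English stop words
        tok = new PorterStemFilter(tok);                              // reduce words to their stems
        return new TokenStreamComponents(src, tok);
    }
}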
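I think you are looking for NGramTokenFilter, which emits the character n-grams of each token so that partial-word queries can match.
Try, for example, wrapping your existing filter chain with it. A minimum gram size of 3 fits your requirement that no query shorter than three characters should match:
tok = new NGramTokenFilter(tok, 3, 5);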
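To get a feel for what an n-gram filter adds to the index, here is a minimal plain-Java sketch that lists the substrings of each token between a minimum and maximum length. It does not use Lucene at all: the class name NGramSketch and the helper ngrams are hypothetical, the emission order is not claimed to match Lucene's, and it works on the raw words, ignoring the lowercasing, stop-word removal and stemming that run earlier in the real chain.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {
    // Hypothetical helper: returns every substring of `token` whose length
    // lies between minGram and maxGram (inclusive), scanning left to right.
    static List<String> ngrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < token.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= token.length(); len++) {
                grams.add(token.substring(start, start + len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // Print the grams for each word of the example string from the question.
        for (String token : "wines bottle caps".split(" ")) {
            System.out.println(token + " -> " + ngrams(token, 3, 5));
        }
    }
}
```

With a minimum of 3 and a maximum of 5, "caps" yields cap, caps and aps, so a query such as "cap" or "aps" has a term to match, while a two-character query such as "ca" has none.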