
How to search for a name containing the keyword "With" using Lucene/Hibernate Search?

The name of the person to search for is "Suleman Kumar With", where "With" is the last name. The search works fine for all other names, but not for this English keyword.

This is how I am creating the Lucene indexes:

@Fields({
    @Field(index = Index.YES, store = Store.NO),
    @Field(name = "LastName_Sort", index = Index.YES, analyzer = @Analyzer(definition = "sortAnalyzer"))
})
@Column(name = "LASTNAME", length = 50)
public String getLastName() {
    return lastName;
}

The sortAnalyzer has the following configuration:

@AnalyzerDef(name = "sortAnalyzer",
    tokenizer = @TokenizerDef(factory = KeywordTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
            @Parameter(name = "pattern", value = "('-&\\.,\\(\\))"),
            @Parameter(name = "replacement", value = " "),
            @Parameter(name = "replace", value = "all")
        }),
        @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
            @Parameter(name = "pattern", value = "([^0-9\\p{L} ])"),
            @Parameter(name = "replacement", value = ""),
            @Parameter(name = "replace", value = "all")
        })
    }
)
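
To see what this filter chain does to a value, the sketch below imitates it with plain java.util.regex instead of Lucene. The class and method names are hypothetical, and Matcher.replaceAll stands in for PatternReplaceFilter with replace = "all". As written, the first pattern matches only the literal character sequence '-&.,() and not any single one of those characters; a character class may have been intended, but that is an assumption.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical JDK-only imitation of the sortAnalyzer chain; this is not Lucene code.
class SortAnalyzerSketch {
    // Same regular expressions as in the @AnalyzerDef of the question.
    private static final Pattern PUNCT_SEQUENCE = Pattern.compile("('-&\\.,\\(\\))");
    private static final Pattern NOT_ALNUM_OR_SPACE = Pattern.compile("([^0-9\\p{L} ])");

    // KeywordTokenizer keeps the whole value as one token, so one string is enough here.
    static String normalize(String value) {
        String token = value.toLowerCase(Locale.ROOT);              // LowerCaseFilter
        token = PUNCT_SEQUENCE.matcher(token).replaceAll(" ");      // first PatternReplaceFilter
        return NOT_ALNUM_OR_SPACE.matcher(token).replaceAll("");    // second PatternReplaceFilter
    }

    public static void main(String[] args) {
        System.out.println(normalize("Suleman Kumar With")); // suleman kumar with
        System.out.println(normalize("O'Brien-Smith"));      // obriensmith
    }
}
```

This chain contains no stop filter, so "with" survives in LastName_Sort. Only a field analyzed with a stop-word list can lose it.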

The search runs on the last name as well as on the primary key (ID), and I am getting a "Tokens not matched" error.

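I solved it with my own custom analyzer. The word "with" is in Lucene's default English stop-word set, so an analyzer that applies that set drops it from the token stream. The analyzer below is built with an empty stop-word set instead: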

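import java.io.Reader;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.ReusableAnalyzerBase;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.StopwordAnalyzerBase;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

public class IgnoreStopWordsAnalyzer extends StopwordAnalyzerBase {

    public IgnoreStopWordsAnalyzer() {
        // A null stop-word set is treated as empty, so no words are removed.
        super(Version.LUCENE_36, null);
    }

    @Override
    protected ReusableAnalyzerBase.TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
        final StandardTokenizer src = new StandardTokenizer(Version.LUCENE_36, reader);
        TokenStream tok = new StandardFilter(Version.LUCENE_36, src);
        tok = new LowerCaseFilter(Version.LUCENE_36, tok);
        // this.stopwords is empty here, so the StopFilter lets every token through.
        tok = new StopFilter(Version.LUCENE_36, tok, this.stopwords);
        return new ReusableAnalyzerBase.TokenStreamComponents(src, tok);
    }
}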

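Reference this analyzer from the field mapping, for example @Field(analyzer = @Analyzer(impl = IgnoreStopWordsAnalyzer.class)), and stop words such as "with" will no longer be removed.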
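
The effect of the stop-word set can be illustrated without Lucene. The sketch below is a simplified, JDK-only stand-in for a tokenizer, a lowercase filter and a stop filter. The class and method names are hypothetical, and splitting on non-alphanumeric characters only approximates StandardTokenizer. The word list copies Lucene's default English stop-word set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical JDK-only illustration of stop-word filtering; this is not Lucene code.
class StopWordSketch {
    // Copy of Lucene's default English stop-word set; note the final entry.
    static final Set<String> ENGLISH_STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
        "in", "into", "is", "it", "no", "not", "of", "on", "or", "such",
        "that", "the", "their", "then", "there", "these", "they", "this",
        "to", "was", "will", "with"));

    // Split into words, lowercase each one, and drop those found in stopWords.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String name = "Suleman Kumar With";
        // With the default English set, the last name disappears.
        System.out.println(analyze(name, ENGLISH_STOP_WORDS));              // [suleman, kumar]
        // With an empty set, as in IgnoreStopWordsAnalyzer, it is kept.
        System.out.println(analyze(name, Collections.<String>emptySet())); // [suleman, kumar, with]
    }
}
```

A query for "With" against a field analyzed with the default set therefore has no token left to match, while the empty set keeps the last name searchable.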

For Hibernate Search 5, you can use a custom analyzer like this:

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.util.StopwordAnalyzerBase;

public class IgnoreStopWordsAnalyzer extends StopwordAnalyzerBase {

    public IgnoreStopWordsAnalyzer() {
        // A null stop-word set is treated as empty, so no words are removed.
        super(null);
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        final Tokenizer source = new StandardTokenizer();
        TokenStream tokenStream = new StandardFilter(source);
        tokenStream = new LowerCaseFilter(tokenStream);
        tokenStream = new StopFilter(tokenStream, this.stopwords);
        return new TokenStreamComponents(source, tokenStream);
    }

}
