
Remove Space Character from Lucene Standard Analyzer

StandardAnalyzer treats the space character as a token separator, and I want StandardAnalyzer not to split tokens on the space character. How can I override the tokenizer of StandardAnalyzer? If that is not possible, please suggest another Analyzer, with an example, that does not split on the space character.

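This code can help you:

// Lucene 3.0: the second argument is the stop-word set; an empty set disables stop-word removal
Analyzer ana = new StandardAnalyzer(Version.LUCENE_30, Collections.emptySet());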

Note that the answer is version-dependent. For Lucene 4.0, use:

Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_40, CharArraySet.EMPTY_SET);
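In both versions the second constructor argument is the stop-word set, so passing an empty set turns off stop-word removal. The toy model below is plain Java, not Lucene: StopSetDemo and analyze are hypothetical names, the regex split is only a rough stand-in for StandardTokenizer (which is far more elaborate), and the four-word stop list is an illustrative subset, not Lucene's actual list. It shows that the stop set decides which tokens are dropped, not where the text is split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model (plain Java, not Lucene) of what the stop-word argument controls.
// Assumption: splitting on runs of non-letter/non-digit characters stands in
// for StandardTokenizer.
class StopSetDemo {

    // Split, lowercase, then drop any token found in the stop set.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (raw.isEmpty()) {
                continue; // a leading delimiter yields an empty first element
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> english = Set.of("a", "an", "the", "of"); // illustrative subset only
        System.out.println(analyze("The City of New York", english));  // [city, new, york]
        System.out.println(analyze("The City of New York", Set.of())); // [the, city, of, new, york]
    }
}
```

With either stop set, "New York" still ends up as two tokens.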

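Edit:

StandardAnalyzer's tokenStream method constructs a StandardTokenizer filtered by a StandardFilter, an org.apache.lucene.analysis.LowerCaseFilter, and an org.apache.lucene.analysis.StopFilter. Splitting on spaces is done by StandardTokenizer, which this method creates directly, so the stop-word set has no effect on it: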

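// Excerpt from StandardAnalyzer; matchVersion, maxTokenLength,
// enableStopPositionIncrements and stopSet are fields of that class.
@Override
public TokenStream tokenStream(String fieldName, Reader reader) {
    // The tokenizer is the stage that splits text on spaces and punctuation.
    StandardTokenizer tokenStream = new StandardTokenizer(matchVersion, reader);
    tokenStream.setMaxTokenLength(maxTokenLength);
    TokenStream result = new StandardFilter(tokenStream);
    result = new LowerCaseFilter(result);
    // Removes tokens found in stopSet; an empty set makes this a no-op.
    result = new StopFilter(enableStopPositionIncrements, result, stopSet);
    return result;
}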

// Holds the tokenizer and filter chain so reusableTokenStream can reuse them
private static final class SavedStreams {
    StandardTokenizer tokenStream;
    TokenStream filteredTokenStream;
}
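Because tokenStream creates StandardTokenizer directly, changing how spaces are handled means replacing that first stage. The sketch below models this in plain Java, not Lucene: ToyAnalyzer, WHITESPACE and KEYWORD are hypothetical names, StandardFilter is omitted, and the whitespace split only approximates StandardTokenizer. Swapping in a tokenizer that emits the whole input as a single token (the idea behind Lucene's KeywordTokenizer) is what keeps "New York City" together.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

// Toy model (plain Java, not Lucene) of an analyzer as "tokenizer, then filters".
// All names here are hypothetical; they only mimic the structure of tokenStream.
class ToyAnalyzer {
    // Tokenizer stage that splits on whitespace.
    static final Function<String, List<String>> WHITESPACE = text -> {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    };

    // Tokenizer stage that emits the whole input as one token.
    static final Function<String, List<String>> KEYWORD = text -> List.of(text);

    private final Function<String, List<String>> tokenizer;
    private final Set<String> stopWords;

    ToyAnalyzer(Function<String, List<String>> tokenizer, Set<String> stopWords) {
        this.tokenizer = tokenizer;
        this.stopWords = stopWords;
    }

    // Same order as the excerpt: tokenize, lowercase, drop stop words.
    List<String> analyze(String text) {
        List<String> result = new ArrayList<>();
        for (String token : tokenizer.apply(text)) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(lower)) {
                result.add(lower);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(new ToyAnalyzer(WHITESPACE, Set.of()).analyze("New York City")); // [new, york, city]
        System.out.println(new ToyAnalyzer(KEYWORD, Set.of()).analyze("New York City"));    // [new york city]
    }
}
```

The filters run unchanged in both cases; only the tokenizer decides where one token ends and the next begins.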

Well, I replaced StandardAnalyzer with KeywordAnalyzer, so it is used for both indexing and searching. Then, in the search method, I added these lines:

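// Require every clause of the query to match
parser.setDefaultOperator(Operator.AND);
if (searchWord.contains(" ")) {
    // '?' is Lucene's single-character wildcard, so "new york" becomes "new?york"
    searchWord = searchWord.replace(" ", "?");
}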
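The classic QueryParser splits query text on whitespace before the analyzer runs, so a multi-word query would not reach KeywordAnalyzer as one term. Replacing each space with ?, Lucene's single-character wildcard, keeps the query as a single wildcard term. The plain-Java sketch below is not Lucene (WildcardDemo, toRegex and search are hypothetical names); it assumes each indexed value is stored as one untouched term, as KeywordAnalyzer does, and it matches case-sensitively.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Toy model (plain Java, not Lucene) of the space-to-'?' search trick.
// Assumption: the "index" is a list of whole field values, one term each.
class WildcardDemo {

    // Translate a wildcard pattern to a regex: '?' matches exactly one char,
    // '*' matches any run (possibly empty); everything else is literal.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                regex.append('.');
            } else if (c == '*') {
                regex.append(".*");
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // Mirrors the rewrite above: spaces become single-character wildcards.
    static List<String> search(List<String> indexedTerms, String searchWord) {
        if (searchWord.contains(" ")) {
            searchWord = searchWord.replace(" ", "?");
        }
        Pattern pattern = toRegex(searchWord);
        List<String> hits = new ArrayList<>();
        for (String term : indexedTerms) {
            if (pattern.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> index = List.of("New York", "New-York", "Newark");
        System.out.println(search(index, "New York")); // [New York, New-York]
    }
}
```

Because ? matches any single character, the rewritten query also matches "New-York"; that looseness is the price of keeping the query as one term.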
