StandardAnalyzer treats the space character as a token separator. I want StandardAnalyzer not to split tokens on spaces. How can I override the tokenizer of StandardAnalyzer? If that is not possible, please suggest another Analyzer, with an example, that does not split on the space character.
This code can help you:
Analyzer ana = new StandardAnalyzer(Version.LUCENE_30, Collections.emptySet()); // empty stop-word set
Note that the answer is version-dependent. For Lucene 4.0, use:
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_40, CharArraySet.EMPTY_SET);
Edit:
StandardAnalyzer's tokenStream() is documented as "Constructs a StandardTokenizer filtered by a StandardFilter, a org.apache.lucene.analysis.LowerCaseFilter and a org.apache.lucene.analysis.StopFilter." and is implemented as:
@Override
public TokenStream tokenStream(String fieldName, Reader reader) {
StandardTokenizer tokenStream = new StandardTokenizer(matchVersion, reader);
tokenStream.setMaxTokenLength(maxTokenLength);
TokenStream result = new StandardFilter(tokenStream);
result = new LowerCaseFilter(result);
result = new StopFilter(enableStopPositionIncrements, result, stopSet);
return result;
}
private static final class SavedStreams {
StandardTokenizer tokenStream;
TokenStream filteredTokenStream;
}
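To make the filter chain above concrete, here is a plain-Java model of it with no Lucene dependency. It is a simplified sketch, not Lucene's code: the tokenizer is approximated by splitting on runs of characters that are not letters or digits, StandardFilter is left out, and the class and method names are hypothetical. It shows that passing an empty stop set only keeps words such as "the" in the output; spaces still separate tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StandardChainModel {

    // Hypothetical stand-in for StandardTokenizer: splits on any run of
    // characters that are not letters or digits, so a space always ends a token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) { // a leading separator yields an empty first piece
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stand-in for LowerCaseFilter.
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Stand-in for StopFilter: drops every token contained in the stop set.
    static List<String> stop(List<String> tokens, Set<String> stopSet) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!stopSet.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    // tokenizer -> lowercase -> stop filter, in the order the Javadoc lists them.
    static List<String> analyze(String text, Set<String> stopSet) {
        return stop(lowerCase(tokenize(text)), stopSet);
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Lord of the Rings", Set.of("the", "of", "a"))); // [lord, rings]
        System.out.println(analyze("The Lord of the Rings", Set.of()));                 // [the, lord, of, the, rings]
    }
}
```

With either stop set, "Lord" and "Rings" still come out as separate tokens, which is why the thread ends up switching analyzers rather than only changing the stop words.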
Well, I replaced StandardAnalyzer with KeywordAnalyzer and use it for both indexing and searching. Then, in the search method, I added these lines:
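parser.setDefaultOperator(Operator.AND);
if (searchWord.contains(" ")) {
    // '?' is the single-character wildcard, so "new york" becomes the single
    // term "new?york", which can match the untokenized keyword in the index
    searchWord = searchWord.replace(" ", "?");
}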
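The search-side trick can also be modelled in plain Java. This is a sketch under stated assumptions: toWildcardTerm and wildcardMatches are hypothetical helpers, and the regex-based matcher only approximates how a wildcard term is compared against the single untokenized term that KeywordAnalyzer produces ('?' matches exactly one character, '*' any run of characters). It also shows a side effect worth knowing: '?' matches any character, so "new?york" matches "new-york" as well as "new york".

```java
import java.util.regex.Pattern;

class KeywordWildcardModel {

    // Same transformation as the search-method snippet: each space becomes
    // the single-character wildcard '?', so the query stays one term.
    static String toWildcardTerm(String searchWord) {
        return searchWord.replace(" ", "?");
    }

    // Simplified model (not Lucene's implementation) of matching a wildcard
    // pattern against ONE indexed keyword: '?' -> any single character,
    // '*' -> any run of characters, everything else matched literally.
    static boolean wildcardMatches(String pattern, String indexedTerm) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '?') {
                regex.append('.');
            } else if (c == '*') {
                regex.append(".*");
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.matches(regex.toString(), indexedTerm);
    }

    public static void main(String[] args) {
        String term = toWildcardTerm("new york");
        System.out.println(term);                              // new?york
        System.out.println(wildcardMatches(term, "new york")); // true
        System.out.println(wildcardMatches(term, "new-york")); // true
        System.out.println(wildcardMatches(term, "new"));      // false
    }
}
```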