
Lucene custom analyzer

I am building a job search site using Lucene and have run into a problem. I need to search for terms such as C# and .net, so I need to use WhitespaceAnalyzer, but with it the search is case-sensitive.

How can I make this case-insensitive? The only solution I can see is to write my own Analyzer. But I am new to Lucene; could you please help me with some sample code? I wrote something that I think should work, but it does not. Here it is:

public sealed class NewWhitespaceAnalyzer : Analyzer
{
    public override TokenStream TokenStream(System.String fieldName, System.IO.TextReader reader)
    {
        return new LowerCaseFilter(new WhitespaceTokenizer(reader));
    }

    public override TokenStream ReusableTokenStream(System.String fieldName, System.IO.TextReader reader)
    {
        Tokenizer tokenizer = (Tokenizer)GetPreviousTokenStream();
        if (tokenizer == null)
        {
            tokenizer = new WhitespaceTokenizer(reader);
            SetPreviousTokenStream(tokenizer);
        }
        else
        {
            tokenizer.Reset(reader);
        }
        return tokenizer;
    }
}

If you see a mistake here, please correct me.

Any other suggestions are welcome.

Thanks for any help, Dima.

Try this:

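public sealed class NewWhitespaceAnalyzer : Analyzer
{
    // Holds the tokenizer together with the filter wrapped around it,
    // so that both can be cached and reused between calls.
    private sealed class SavedStreams
    {
        internal Tokenizer tokenStream;
        internal TokenStream filteredTokenStream;
    }

    public override TokenStream TokenStream(System.String fieldName, System.IO.TextReader reader)
    {
        return new LowerCaseFilter(new WhitespaceTokenizer(reader));
    }

    public override TokenStream ReusableTokenStream(System.String fieldName, System.IO.TextReader reader)
    {
        SavedStreams streams = (SavedStreams) GetPreviousTokenStream();
        if (streams == null)
        {
            // First call: build the tokenizer -> filter chain and cache it.
            streams = new SavedStreams();
            SetPreviousTokenStream(streams);
            streams.tokenStream = new WhitespaceTokenizer(reader);
            streams.filteredTokenStream = new LowerCaseFilter(streams.tokenStream);
        }
        else
        {
            // Later calls: point the cached tokenizer at the new reader.
            streams.tokenStream.Reset(reader);
        }
        // Always return the filtered stream so tokens are lower-cased.
        return streams.filteredTokenStream;
    }
}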
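To see why this chain suits terms like C# and .net, here is a dependency-free sketch that only approximates what WhitespaceTokenizer followed by LowerCaseFilter emits. The class TokenChainSketch and its Analyze method are hypothetical names made up for illustration; they are not part of Lucene.Net.

```csharp
using System;
using System.Collections.Generic;

public static class TokenChainSketch
{
    // Hypothetical helper, not a Lucene.Net API: splits on whitespace only,
    // then lower-cases each token, so punctuation such as '#' and '.' survives.
    public static List<string> Analyze(string text)
    {
        var tokens = new List<string>();
        // A null separator array makes Split break on white-space characters.
        foreach (string raw in text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
        {
            tokens.Add(raw.ToLowerInvariant());
        }
        return tokens;
    }

    public static void Main()
    {
        // Prints: senior | c# | .net | developer
        Console.WriteLine(string.Join(" | ", Analyze("Senior C# .NET Developer").ToArray()));
    }
}
```

Because nothing but whitespace splits a token, "C#" and ".NET" stay intact and differ from the query text only by case, which the lower-casing step removes.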

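There are two points:

  • Apply LowerCaseFilter in the ReusableTokenStream method as well. The original version returned the bare tokenizer there, so tokens from the reused stream were never lower-cased.

  • Use this custom Analyzer for both document indexing and query parsing; otherwise the indexed terms and the query terms will not match.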
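The second point can be sketched roughly as follows. This assumes the Lucene.Net 2.9.x API (the generation that the GetPreviousTokenStream call suggests) and the NewWhitespaceAnalyzer class shown above. The class SameAnalyzerSketch, the method IndexAndSearch, and the field name "skills" are hypothetical, and member names may differ in other Lucene.Net versions.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

public static class SameAnalyzerSketch
{
    // Hypothetical helper: indexes one document and searches it,
    // passing the same analyzer instance to both the writer and the parser.
    public static TopDocs IndexAndSearch(string queryText)
    {
        Analyzer analyzer = new NewWhitespaceAnalyzer();
        Directory directory = new RAMDirectory(); // in-memory index, for illustration only

        // Indexing side: the analyzer lower-cases "C#" and ".NET" before they are stored.
        IndexWriter writer = new IndexWriter(directory, analyzer, true,
                                             IndexWriter.MaxFieldLength.UNLIMITED);
        Document doc = new Document();
        doc.Add(new Field("skills", "C# .NET Lucene", Field.Store.YES, Field.Index.ANALYZED));
        writer.AddDocument(doc);
        writer.Close();

        // Query side: the same analyzer lower-cases the user's input.
        QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "skills", analyzer);
        Query query = parser.Parse(queryText);

        IndexSearcher searcher = new IndexSearcher(directory, true); // read-only searcher
        TopDocs hits = searcher.Search(query, 10);
        searcher.Close();
        return hits;
    }
}
```

With this setup, IndexAndSearch("c#") and IndexAndSearch("C#") should both find the document, since both the indexed text and the query text pass through the same lower-casing chain.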

Enjoy.


 