
How to search for special characters in a Lucene search? (C#)

I am using the Standard analyzer from Lucene in my search engine to search for German words. This is my code:

private IList<Document> GetFromLucene(string terme, string FieldName)
    {
        TopDocs hits;
        CustomAnalyzer standardAnalyzer = new CustomAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
        List<Document> matches = new List<Document>();
        IndexSearcher indexSearcher = new IndexSearcher(FSDirectory.Open(new System.IO.DirectoryInfo(MainDoc + DocIndex)), true);

        if (terme.Contains(" "))
        {
            BooleanQuery finalQuery = new BooleanQuery();
            string[] terms = terme.Split(' ');

            #region AND
            QueryParser queryParser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, FieldName, standardAnalyzer)
            {
                DefaultOperator = QueryParser.Operator.AND
            };
            #endregion

            #region Contains
            Query querys = queryParser.Parse("" + terme + "*");
            finalQuery.Add(querys, Occur.MUST);
            #endregion

            hits = indexSearcher.Search(finalQuery, int.MaxValue);
        }
        else
        {
            WildcardQuery query;
            query = new WildcardQuery(new Term(FieldName, "*" + terme + "*"));
            hits = indexSearcher.Search(query, int.MaxValue);
        }


        matches = hits.ScoreDocs.Select(scoreDoc => indexSearcher.Doc(scoreDoc.Doc)).ToList();

        return matches;
    }

It doesn't appear to find words containing "ü" and "ä". How can I achieve this?

Lucene uses so-called analyzer classes to break text into terms and generate a token stream. To implement an accent-insensitive search, replace the default analyzer used by Lucene with one that maps accented characters to their unaccented counterparts. Use the same analyzer for indexing and for querying, and rebuild the index after switching. Sitefinity CMS has an example: https://www.progress.com/documentation/sitefinity-cms/for-developers-search-with-accented-characters
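
To make the folding step concrete, here is a minimal sketch of what "replacing accented characters with unaccented ones" can look like in plain C#, independent of Lucene. The class name AccentFolding and the method Fold are hypothetical names chosen for illustration; they are not part of Lucene.Net. Lucene.Net also ships an ASCIIFoldingFilter token filter that performs a similar mapping inside an analyzer chain, assuming it is available in the version you are using.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper for illustration only; not part of Lucene.Net.
public static class AccentFolding
{
    // Decomposes each character (for example "ü" becomes "u" plus a
    // combining diaeresis) and then drops the combining marks.
    public static string Fold(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        string decomposed = input.Normalize(NormalizationForm.FormD);
        var folded = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            // Keep everything except non-spacing (combining) marks.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                folded.Append(c);
            }
        }

        // Recompose whatever is left into the usual composed form.
        return folded.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

With this sketch, AccentFolding.Fold("Müller") yields "Muller" and AccentFolding.Fold("Käse") yields "Kase"; characters without a canonical decomposition, such as "ß", pass through unchanged. Note that a WildcardQuery built directly from a Term is not run through the analyzer, so the search string would need the same folding (and lowercasing) applied before it is wrapped in "*".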

The default analyzer skips special characters. Use an exact-match query (PhraseQuery) instead, which takes the special characters in your search term into account: https://lucenenet.apache.org/docs/3.0.3/d5/d58/class_lucene_1_1_net_1_1_search_1_1_phrase_query.html
