How to add special characters in a Lucene search? (C#)

I am using the standard analyzer from Lucene in my search engine to search for German words. This is my code:

    private IList<Document> GetFromLucene(string terme, string FieldName)
    {
        TopDocs hits;
        CustomAnalyzer standardAnalyzer = new CustomAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
        IndexSearcher indexSearcher = new IndexSearcher(FSDirectory.Open(new System.IO.DirectoryInfo(MainDoc + DocIndex)), true);

        if (terme.Contains(" "))
        {
            // Several words: parse them with AND semantics and a trailing wildcard.
            BooleanQuery finalQuery = new BooleanQuery();

            QueryParser queryParser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, FieldName, standardAnalyzer)
            {
                DefaultOperator = QueryParser.Operator.AND
            };

            Query querys = queryParser.Parse(terme + "*");
            finalQuery.Add(querys, Occur.MUST);

            hits = indexSearcher.Search(finalQuery, int.MaxValue);
        }
        else
        {
            // Single word: "contains" search using leading and trailing wildcards.
            WildcardQuery query = new WildcardQuery(new Term(FieldName, "*" + terme + "*"));
            hits = indexSearcher.Search(query, int.MaxValue);
        }

        // Load the stored document for every hit.
        List<Document> matches = hits.ScoreDocs.Select(scoreDoc => indexSearcher.Doc(scoreDoc.Doc)).ToList();

        return matches;
    }

It doesn't appear to find words containing "ü" and "ä". How can I achieve this?

Lucene uses analyzer classes to examine text and turn it into a stream of tokens, which become the indexed terms. To implement an accent-insensitive search, replace the default analyzer used by Lucene with one that replaces accented characters with the corresponding unaccented ones. Sitefinity CMS has an example: https://www.progress.com/documentation/sitefinity-cms/for-developers-search-with-accented-characters
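As a rough sketch of that idea (not the Sitefinity code), the analyzer below chains Lucene.Net's `ASCIIFoldingFilter` after the standard tokenizer, so "ü" is indexed as "u" and "ä" as "a". The names `AccentFoldingAnalyzer` and `TermFolding` are made up for this illustration, and the code assumes Lucene.Net 2.9.x or 3.0.3. The `Fold` helper is included because wildcard terms are not passed through the analyzer, so the search text would have to be folded by hand before it goes into a `WildcardQuery`.

```csharp
using System.Globalization;
using System.IO;
using System.Text;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Version = Lucene.Net.Util.Version;

// Hypothetical analyzer: standard tokenization, lowercasing, then accent folding.
public sealed class AccentFoldingAnalyzer : Analyzer
{
    private readonly Version matchVersion;

    public AccentFoldingAnalyzer(Version matchVersion)
    {
        this.matchVersion = matchVersion;
    }

    public override TokenStream TokenStream(string fieldName, TextReader reader)
    {
        TokenStream stream = new StandardTokenizer(matchVersion, reader);
        stream = new StandardFilter(stream);
        stream = new LowerCaseFilter(stream);
        // Maps accented Latin characters to ASCII equivalents (ü -> u, ä -> a).
        stream = new ASCIIFoldingFilter(stream);
        return stream;
    }
}

// Hypothetical helper for terms that bypass the analyzer (e.g. wildcard terms).
public static class TermFolding
{
    public static string Fold(string term)
    {
        // Decompose "ü" into "u" + combining diaeresis, then drop the combining marks.
        string decomposed = term.Normalize(NormalizationForm.FormD);
        var builder = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                builder.Append(c);
            }
        }
        return builder.ToString().Normalize(NormalizationForm.FormC).ToLowerInvariant();
    }
}
```

The same analyzer would have to be used when building the index and when parsing queries, and an existing index would need to be rebuilt. In the question's code, the single-word branch could then use `"*" + TermFolding.Fold(terme) + "*"`. One caveat: the helper only strips combining marks, so it leaves "ß" untouched, whereas `ASCIIFoldingFilter` turns it into "ss".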

The default analyzer skips special characters. Use an exact-match query (PhraseQuery) instead, which takes the special characters you use into account: https://lucenenet.apache.org/docs/3.0.3/d5/d58/class_lucene_1_1_net_1_1_search_1_1_phrase_query.html
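A minimal sketch of such an exact-match query, assuming Lucene.Net 2.9.x or 3.0.3. The `ExactMatch` class and its `Build` method are invented for this example. Because the terms are added directly, no analyzer touches them, so this only finds documents whose indexed terms still contain "ü" or "ä" in the same (lowercased) form.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper that builds a phrase query from raw, unanalyzed words.
public static class ExactMatch
{
    public static Query Build(string fieldName, params string[] words)
    {
        var phrase = new PhraseQuery();
        foreach (string word in words)
        {
            // Lowercase by hand, assuming the index was built with a lowercasing analyzer.
            phrase.Add(new Term(fieldName, word.ToLowerInvariant()));
        }
        return phrase;
    }
}
```

For example, `ExactMatch.Build(FieldName, "grüne", "äpfel")` could be passed to `indexSearcher.Search` in place of the parsed query.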
