Searching records using string with special characters not working in Lucene.Net

I am new to Lucene and am facing a serious issue with Lucene search. Searching records with a plain string, or a string containing numbers, works fine. But searching with a string that contains special characters returns no results.

ex: example - brings results
    'examples' - no results
    %example% - no results
    example2 - brings results
    @example - no results

Code:

Indexing:

_document.Add(new Field(dc.ColumnName, dr[dc.ColumnName].ToString(), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES));

Search query:

Lucene.Net.Store.Directory _dir = Lucene.Net.Store.FSDirectory.Open(Config.Get(directoryPath));
Lucene.Net.Analysis.Analyzer analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

Query querySearch = queryParser.Parse("*" + searchParams.SearchForText + "*");
booleanQuery.Add(querySearch, Occur.MUST);

Can anyone help me fix this?

It appears there's work to be done. I recommend getting a good starter book on Lucene, such as Lucene in Action, Second Edition (as you're using version 3). Although it targets Java, the examples are easily adapted to C#, and the concepts are what matter most.

First, this:

"*" + searchParams.SearchForText + "*"

Don't do that. Leading wildcard searches are wildly inefficient and will consume an enormous amount of resources on a sizable index, doubly so for a combined leading and trailing wildcard search. What would happen if the query text were *e*?
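One possible alternative is sketched below, assuming Lucene.Net 3.0.x and the same analyzer that was used at index time. The classic QueryParser does not pass wildcard terms through the analyzer, so characters such as % stay inside a wildcard pattern even though they never reach the index. The sketch analyzes the raw text first and then requires each resulting token as a trailing-only prefix. The class and method names (PrefixSearchSketch, BuildPrefixQuery) are hypothetical and are not part of Lucene.Net or the original post.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Tokenattributes;
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class PrefixSearchSketch
{
    // Hypothetical helper: analyze the user's text with the index-time
    // analyzer, then build one trailing-wildcard (prefix) clause per token.
    public static Query BuildPrefixQuery(string fieldName, string rawText, Analyzer analyzer)
    {
        var query = new BooleanQuery();

        using (TokenStream stream = analyzer.TokenStream(fieldName, new StringReader(rawText)))
        {
            ITermAttribute termAttr = stream.AddAttribute<ITermAttribute>();
            while (stream.IncrementToken())
            {
                // Every analyzed token must match as a prefix; no leading wildcard.
                var prefix = new PrefixQuery(new Term(fieldName, termAttr.Term));
                query.Add(prefix, Occur.MUST);
            }
        }

        // If the analyzer produced no tokens, the query has no clauses
        // and will not match any document.
        return query;
    }
}
```

The result could then be added to the existing booleanQuery in place of the parsed "*...*" query. This gives prefix matching only, so it is not a drop-in replacement if substring ("contains") matching is truly required.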

There also seems to be more going on than the posted code shows, as there is no reason not to get hits for these inputs. The snippet below produces the following in the console:

Index terms:
example
example2

raw text %example% as query text:example got 1 hits
raw text 'example' as query text:example got 1 hits
raw text example as query text:example got 1 hits
raw text @example as query text:example got 1 hits
raw text example2 as query text:example2 got 1 hits
Wildcard raw text example* as query text:example* got 2 hit(s)

See the index terms listing? No 'special characters' land in the index because StandardAnalyzer removes them at index time (assuming StandardAnalyzer is what indexes the field).
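To see what an analyzer emits without building an index, a small helper along the following lines may be useful. It is a sketch assuming Lucene.Net 3.0.x, where tokens are read through ITermAttribute; the names AnalyzerDebugSketch and Tokenize are made up for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Tokenattributes;

public static class AnalyzerDebugSketch
{
    // Hypothetical helper: returns the tokens the given analyzer produces for a string.
    public static List<string> Tokenize(Analyzer analyzer, string text)
    {
        var tokens = new List<string>();

        // The field name only matters for per-field analyzers; "text" is arbitrary here.
        using (TokenStream stream = analyzer.TokenStream("text", new StringReader(text)))
        {
            ITermAttribute termAttr = stream.AddAttribute<ITermAttribute>();
            while (stream.IncrementToken())
            {
                tokens.Add(termAttr.Term);
            }
        }

        return tokens;
    }
}
```

Calling Tokenize(new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30), "%example% @example") and printing the result shows the terms that would actually be indexed, which can be compared against the index terms listing above.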

I recommend running the snippet below in the debugger and observing what happens.

public static void Example()
{
    var field_name = "text";
    var field_value = "%example% 'example' example @example example";
    var field_value2 = "example2";
    var luceneVer = Lucene.Net.Util.Version.LUCENE_30;

    using (var writer = new IndexWriter(new RAMDirectory(),
            new StandardAnalyzer(luceneVer), IndexWriter.MaxFieldLength.UNLIMITED)
            )
    {
        var doc = new Document();
        var field = new Field(
            field_name,
            field_value,
            Field.Store.YES,
            Field.Index.ANALYZED,
            Field.TermVector.YES
            );

        doc.Add(field);
        writer.AddDocument(doc);

        doc = new Document();
        field = new Field(
            field_name,
            field_value2,
            Field.Store.YES,
            Field.Index.ANALYZED,
            Field.TermVector.YES
            );

        doc.Add(field);
        writer.AddDocument(doc);
        writer.Commit();

        Console.WriteLine();
        // Show ALL terms in the index.
        using (var reader = writer.GetReader())
        {
            TermEnum terms = reader.Terms();
            Console.WriteLine("Index terms:");
            while (terms.Next())
            {
                Console.WriteLine("\t{0}", terms.Term.Text);
            }
        }

        // Search for each word in the original content @field_value
        using (var searcher = new IndexSearcher(writer.GetReader()))
        {
            string query_text;
            QueryParser parser;
            Query query;
            TopDocs topDocs;
            List<string> field_queries = new List<string>(field_value.Split(' '));
            field_queries.Add(field_value2);

            var analyzer = new StandardAnalyzer(luceneVer);
            while (field_queries.Count > 0)
            {
                query_text = field_queries[0];
                parser = new QueryParser(luceneVer, field_name, analyzer);
                query = parser.Parse(query_text);
                topDocs = searcher.Search(query, null, 100);
                Console.WriteLine();
                Console.WriteLine("raw text {0} as query {1} got {2} hit(s)",
                    query_text,
                    query,
                    topDocs.TotalHits
                    );
                field_queries.RemoveAt(0);
            }

            // Now do a wildcard query "example*"
            query_text = "example*";
            parser = new QueryParser(luceneVer, field_name, analyzer);
            query = parser.Parse(query_text);
            topDocs = searcher.Search(query, null, 100);
            Console.WriteLine();
            Console.WriteLine("Wildcard raw text {0} as query {1} got {2} hit(s)",
                query_text,
                query,
                topDocs.TotalHits
                );
        }
    }
}

If you need to perform exact matching and index certain characters such as %, then you'll need to use something other than StandardAnalyzer, perhaps a custom analyzer.
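As one illustration of "something other than StandardAnalyzer", the sketch below indexes with WhitespaceAnalyzer, which splits on whitespace only and keeps characters such as % and @ in the terms. It assumes Lucene.Net 3.0.x; the class and method names (ExactMatchSketch, CountExactHits) and the field name "text" are hypothetical. WhitespaceAnalyzer does not lowercase, so matching becomes case sensitive, which may or may not be acceptable.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

public static class ExactMatchSketch
{
    // Hypothetical helper: indexes one value with WhitespaceAnalyzer and
    // counts the documents containing exactToken as an exact term.
    public static int CountExactHits(string fieldValue, string exactToken)
    {
        const string fieldName = "text"; // illustrative field name

        using (var dir = new RAMDirectory())
        using (var writer = new IndexWriter(dir, new WhitespaceAnalyzer(),
                   IndexWriter.MaxFieldLength.UNLIMITED))
        {
            var doc = new Document();
            doc.Add(new Field(fieldName, fieldValue, Field.Store.YES, Field.Index.ANALYZED));
            writer.AddDocument(doc);
            writer.Commit();

            using (var reader = writer.GetReader())
            using (var searcher = new IndexSearcher(reader))
            {
                // TermQuery bypasses the query parser and the analyzer,
                // so the token is matched exactly as given.
                var query = new TermQuery(new Term(fieldName, exactToken));
                return searcher.Search(query, null, 10).TotalHits;
            }
        }
    }
}
```

With the value "%example% 'example' @example", a lookup for "%example%" would be expected to find the document, while a lookup for plain "example" would not, because no bare "example" term is indexed. Whether that trade-off suits the application depends on how users are expected to search.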
