Lucene index searching

I am using Lucene indexing for the first time. I have some documents in Hindi and English, and I create an index on the content of each document. When I search the index, I get results from all the documents: even if my query is an English word, Hindi documents are returned as well. I have added the code below. Please tell me where I am going wrong.

        // Fragment from the calling method: open the index and
        // build a parser that targets the "Content" field.
        IndexSearcher searcher = new IndexSearcher(directory);
        QueryParser parser = new QueryParser("Content", analyzer);

        while (condition)
        {
            Search(text, searcher, parser);
        }

        searcher.Close();

    private static void Search(string text, IndexSearcher searcher, QueryParser parser)
    {
        Query query = parser.Parse(text);

        Hits hits = searcher.Search(query);
        int results = hits.Length();

        for (int i = 0; i < results; i++)
        {
            Lucene.Net.Documents.Document doc = hits.Doc(i);
            string show = doc.ToString();
            float score = hits.Score(i);

            /* insert doc id in database table */
        }
    }

Thanks all

First, I would use Luke to check whether my query syntax was right. Then I would check whether the misbehaving English word is a homograph of a Hindi word (i.e. an English word that is spelled the same as a Hindi word).

If you want to prevent English search terms from returning Hindi documents, you will need to mark each document as either English or Hindi, then specify that marking in your search query. In Query Parser Syntax, this could look like:

ENGLISHSEARCHTERMS +(language:English)

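(where all Hindi documents have their language field set to 'Hindi' and all English documents have their language field set to 'English'). Note that with the query parser's default OR operator, only the '+' clause is required in this form; the search terms themselves only influence scoring unless they are also prefixed with '+'.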
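
The language-field approach can also be applied programmatically rather than through the query string. The following is a minimal sketch, not a definitive implementation. It assumes a Lucene.Net 2.9-era API (the same generation as the `Hits` class used in the question), and the class name `LanguageTaggedSearch`, the method names, and the field name "language" are hypothetical choices made for illustration.

```csharp
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

public static class LanguageTaggedSearch
{
    // Hypothetical field name; any name works as long as
    // indexing and searching use the same one.
    private const string LanguageField = "language";

    // Index time: store the language as a single, unanalyzed term
    // alongside the analyzed document content.
    public static Document BuildDocument(string content, string language)
    {
        Document doc = new Document();
        doc.Add(new Field("Content", content, Field.Store.NO, Field.Index.ANALYZED));
        doc.Add(new Field(LanguageField, language, Field.Store.YES, Field.Index.NOT_ANALYZED));
        return doc;
    }

    // Search time: a hit must match both the parsed user query
    // and the exact language term.
    public static Hits Search(string text, string language,
                              IndexSearcher searcher, QueryParser parser)
    {
        BooleanQuery restricted = new BooleanQuery();
        restricted.Add(parser.Parse(text), BooleanClause.Occur.MUST);
        restricted.Add(new TermQuery(new Term(LanguageField, language)),
                       BooleanClause.Occur.MUST);
        return searcher.Search(restricted);
    }
}
```

Building the language clause as a `TermQuery` keeps it away from the analyzer. If the clause were instead written into the query string, a lowercasing analyzer would likely turn 'English' into 'english', which would not match a field indexed as the unanalyzed term 'English'.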
