
Searching a Lucene.Net index for a URL field

I want to search a Lucene.Net index for a stored URL field. My code is below:

Field urlField = new Field("Url", url.ToLower(), Field.Store.YES,Field.Index.TOKENIZED);
document.Add(urlField);
indexWriter.AddDocument(document);
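With Field.Index.TOKENIZED the field value goes through the writer's analyzer, so the URL is typically stored as several smaller terms rather than as one term, and a query for the whole URL has nothing to match. A minimal sketch of the alternative, assuming Lucene.Net 3.0.3 (where the un-analyzed option is named Field.Index.NOT_ANALYZED) and an in-memory RAMDirectory for illustration; UrlIndexing, AddUrl and BuildIndex are hypothetical names:

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

// Hypothetical helper class; assumes the Lucene.Net 3.0.3 package.
public static class UrlIndexing
{
    // Adds one document whose "Url" field holds the whole lower-cased URL
    // as a single term: NOT_ANALYZED bypasses the analyzer for this field.
    public static void AddUrl(IndexWriter writer, string url)
    {
        var document = new Document();
        document.Add(new Field("Url", url.ToLowerInvariant(),
                               Field.Store.YES, Field.Index.NOT_ANALYZED));
        writer.AddDocument(document);
    }

    // Builds an in-memory index containing one document per URL.
    public static Directory BuildIndex(params string[] urls)
    {
        var directory = new RAMDirectory();
        // The constructor requires an analyzer, but it is not applied to NOT_ANALYZED fields.
        using (var writer = new IndexWriter(directory, new StandardAnalyzer(Version.LUCENE_30),
                                            true, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            foreach (string url in urls)
            {
                AddUrl(writer, url);
            }
        }
        return directory;
    }
}
```

A TermQuery for the same lower-cased string, such as new TermQuery(new Term("Url", "http://www.xyz.com/")), is not analyzed either, so it can match the stored term exactly.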

I use the code above to write to the index.

And I use the code below to search the index for the URL:

Lucene.Net.Store.Directory _directory = FSDirectory.GetDirectory(Host, false);
IndexReader reader = IndexReader.Open(_directory);
KeywordAnalyzer _analyzer = new KeywordAnalyzer();
IndexSearcher indexSearcher = new IndexSearcher(reader);
QueryParser parser = new QueryParser("Url", _analyzer);
Query query = parser.Parse("\"" + downloadDoc.Uri.ToString() + "\"");
TopDocs hits = indexSearcher.Search(query, null, 10);
if (hits.totalHits > 0)
{
    //statements....
}
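Note also that the indexing code stores url.ToLower(), while the search builds its query from downloadDoc.Uri.ToString() without lower-casing it, and System.Uri may rewrite the string (for example, by adding a trailing slash). Even with an un-analyzed field, an exact lookup only succeeds if both sides produce the identical string. A minimal sketch of a shared normalization helper; UrlKey.Normalize is a hypothetical name, and lower-casing the whole URL simply mirrors the question's own ToLower() call:

```csharp
using System;

// Hypothetical helper: call it both when indexing and when searching, so that
// the stored term and the query term are built the same way.
public static class UrlKey
{
    public static string Normalize(string url)
    {
        // System.Uri lower-cases the scheme and host and adds "/" for an empty path;
        // ToLowerInvariant then lower-cases the rest, mirroring the question's ToLower().
        return new Uri(url).ToString().ToLowerInvariant();
    }
}
```

For example, UrlKey.Normalize("http://www.XYZ.com") and UrlKey.Normalize("http://www.xyz.com/") both yield "http://www.xyz.com/". Lower-casing the path also treats http://www.xyz.com/Page and http://www.xyz.com/page as the same URL; drop ToLowerInvariant if paths must stay case-sensitive.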

But whenever I search for a URL, for example http://www.xyz.com/, I don't get any hits.

I have found a workaround, but it only works when the index contains a single document. With more documents, the code below does not return the correct result. Any ideas? Please help.

When writing the index, use KeywordAnalyzer:

KeywordAnalyzer _analyzer = new KeywordAnalyzer();    
indexWriter = new IndexWriter(_directory, _analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);

Then use KeywordAnalyzer when searching as well:

IndexReader reader = IndexReader.Open(_directory);
KeywordAnalyzer _analyzer = new KeywordAnalyzer();
IndexSearcher indexSearcher = new IndexSearcher(reader);
QueryParser parser = new QueryParser("Url", _analyzer);
Query query = parser.Parse("\"" + url.ToString() + "\"");                    
TopDocs hits = indexSearcher.Search(query, null, 1);

This works because KeywordAnalyzer emits the entire input as a single token, so the URL is indexed and queried as one term.
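A KeywordAnalyzer passed to the IndexWriter constructor applies to every analyzed field, which may be unwanted if the documents also carry ordinary text. One option is PerFieldAnalyzerWrapper, which uses KeywordAnalyzer for "Url" only; the same wrapper is then passed to the QueryParser. Also note that passing true as the create argument of the IndexWriter constructor creates a new index, overwriting any existing one, so opening a writer that way for each document would leave only the last document in the index. A sketch assuming Lucene.Net 3.0.3 and a RAMDirectory; KeywordUrlDemo, CreateAnalyzer and CountHits are hypothetical names:

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

// Hypothetical demo class; assumes the Lucene.Net 3.0.3 package.
public static class KeywordUrlDemo
{
    // KeywordAnalyzer for "Url"; every other field falls back to StandardAnalyzer.
    public static Analyzer CreateAnalyzer()
    {
        var wrapper = new PerFieldAnalyzerWrapper(new StandardAnalyzer(Version.LUCENE_30));
        wrapper.AddAnalyzer("Url", new KeywordAnalyzer());
        return wrapper;
    }

    public static int CountHits(string[] indexedUrls, string searchUrl)
    {
        var directory = new RAMDirectory();
        Analyzer analyzer = CreateAnalyzer();

        // create: true starts a fresh index, so open the writer once and add every document through it.
        using (var writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            foreach (string url in indexedUrls)
            {
                var doc = new Document();
                doc.Add(new Field("Url", url.ToLowerInvariant(), Field.Store.YES, Field.Index.ANALYZED));
                writer.AddDocument(doc);
            }
        }

        using (var searcher = new IndexSearcher(directory, true))
        {
            // The parser must use the same per-field analyzer as the writer.
            var parser = new QueryParser(Version.LUCENE_30, "Url", analyzer);
            Query query = parser.Parse("\"" + searchUrl.ToLowerInvariant() + "\"");
            return searcher.Search(query, 10).TotalHits;
        }
    }
}
```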

Please help, it's urgent.

Cheers, Sunil

This worked for me:

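IndexReader reader = IndexReader.Open(_directory);
IndexSearcher indexSearcher = new IndexSearcher(reader);
// A TermQuery is not analyzed, so the term must equal the indexed value exactly
// (here: the lower-cased URL string).
TermQuery tq = new TermQuery(new Term("Url", downloadDoc.Uri.ToString().ToLower()));
BooleanQuery bq = new BooleanQuery();
bq.Add(tq, BooleanClause.Occur.SHOULD);
TopScoreDocCollector collector = TopScoreDocCollector.create(10, true);
// Run the search and read the top 10 results from the collector.
indexSearcher.Search(bq, collector);
TopDocs hits = collector.TopDocs();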

Use StandardAnalyzer when writing to the index.

This answer helped me: Lucene search by URL

Try putting quotes around the query, like this:

"http://www.google.com/"

Using WhitespaceAnalyzer or KeywordAnalyzer should work.
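Without the quotes, QueryParser would read the colon after "http" as a field separator. Inside quotes the whole URL is handed to the analyzer, and WhitespaceAnalyzer, which splits only on whitespace, keeps it as one term. A sketch assuming Lucene.Net 3.0.3, with the same analyzer used for indexing and for parsing; WhitespaceUrlDemo is a hypothetical name. Unlike StandardAnalyzer, WhitespaceAnalyzer does not lower-case, so the sketch lower-cases both sides itself:

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

// Hypothetical demo class; assumes the Lucene.Net 3.0.3 package.
public static class WhitespaceUrlDemo
{
    public static int CountHits(string[] indexedUrls, string searchUrl)
    {
        var directory = new RAMDirectory();
        var analyzer = new WhitespaceAnalyzer();

        using (var writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            foreach (string url in indexedUrls)
            {
                var doc = new Document();
                // A URL contains no whitespace, so it is indexed as a single term.
                doc.Add(new Field("Url", url.ToLowerInvariant(), Field.Store.YES, Field.Index.ANALYZED));
                writer.AddDocument(doc);
            }
        }

        using (var searcher = new IndexSearcher(directory, true))
        {
            var parser = new QueryParser(Version.LUCENE_30, "Url", analyzer);
            // The quotes stop QueryParser from treating ':' as a field separator.
            Query query = parser.Parse("\"" + searchUrl.ToLowerInvariant() + "\"");
            return searcher.Search(query, 10).TotalHits;
        }
    }
}
```

Because the whole URL is one term, a search for http://www.google.com/ does not also match http://www.google.com/maps.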

Would anyone actually search for "http://www.Google.com"? It seems more likely that a user would search for "Google" instead.

You can always return the entire URL if there is a partial match. I think StandardAnalyzer is more appropriate for searching and retrieving URLs.
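Exact lookup and partial matching can also be combined by indexing the URL twice: once un-analyzed for exact lookups and once as ordinary analyzed text. A sketch assuming Lucene.Net 3.0.3; the "UrlText" field name, the TwoFieldUrlDemo class and the step that replaces punctuation with spaces before indexing are all hypothetical choices, the last one made so that a word such as "google" becomes a term of its own regardless of how a given StandardAnalyzer version treats host names:

```csharp
using System.Text.RegularExpressions;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

// Hypothetical demo class; assumes the Lucene.Net 3.0.3 package.
public static class TwoFieldUrlDemo
{
    public static Directory BuildIndex(params string[] urls)
    {
        var directory = new RAMDirectory();
        using (var writer = new IndexWriter(directory, new StandardAnalyzer(Version.LUCENE_30),
                                            true, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            foreach (string url in urls)
            {
                var doc = new Document();
                // Exact-match copy: the whole lower-cased URL as one stored term.
                doc.Add(new Field("Url", url.ToLowerInvariant(), Field.Store.YES, Field.Index.NOT_ANALYZED));
                // Free-text copy: punctuation replaced by spaces, then analyzed (not stored).
                string words = Regex.Replace(url, "[^A-Za-z0-9]+", " ");
                doc.Add(new Field("UrlText", words, Field.Store.NO, Field.Index.ANALYZED));
                writer.AddDocument(doc);
            }
        }
        return directory;
    }

    // Counts documents whose given field contains the given (already lower-cased) term.
    public static int CountHits(Directory directory, string field, string term)
    {
        using (var searcher = new IndexSearcher(directory, true))
        {
            return searcher.Search(new TermQuery(new Term(field, term)), 10).TotalHits;
        }
    }
}
```

A hit on "UrlText" can then be displayed through the stored "Url" value (doc.Get("Url")), which returns the entire URL for a partial match.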

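Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.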

 
© 2020-2024 STACKOOM.COM