Searching a Lucene.Net index for a URL field

I want to search a Lucene.Net index for a stored URL field. My code is given below:

Field urlField = new Field("Url", url.ToLower(), Field.Store.YES, Field.Index.TOKENIZED);
document.Add(urlField);
indexWriter.AddDocument(document);

I am using the above code for writing into the index.

And I use the code below to search for the URL in the index:

Lucene.Net.Store.Directory _directory = FSDirectory.GetDirectory(Host, false);
IndexReader reader = IndexReader.Open(_directory);
KeywordAnalyzer _analyzer = new KeywordAnalyzer();
IndexSearcher indexSearcher = new IndexSearcher(reader);
QueryParser parser = new QueryParser("Url", _analyzer);
Query query = parser.Parse("\"" + downloadDoc.Uri.ToString() + "\"");
TopDocs hits = indexSearcher.Search(query, null, 10);
if (hits.totalHits > 0)
{
    //statements....
}

But whenever I search for a URL, for example http://www.xyz.com/, I get no hits.

I eventually found an alternative, but it only works when the index contains a single document. With more documents, the code below does not return the correct result. Any ideas?

While writing the index, use KeywordAnalyzer()

KeywordAnalyzer _analyzer = new KeywordAnalyzer();    
indexWriter = new IndexWriter(_directory, _analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);

Then while searching also, use KeywordAnalyzer()

IndexReader reader = IndexReader.Open(_directory);
KeywordAnalyzer _analyzer = new KeywordAnalyzer();
IndexSearcher indexSearcher = new IndexSearcher(reader);
QueryParser parser = new QueryParser("Url", _analyzer);
Query query = parser.Parse("\"" + url.ToString() + "\"");                    
TopDocs hits = indexSearcher.Search(query, null, 1);

This is because the KeywordAnalyzer "Tokenizes" the entire stream as a single token.

Please help. It's urgent.

Cheers Sunil...

This worked for me:

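IndexReader reader = IndexReader.Open(_directory);
IndexSearcher indexSearcher = new IndexSearcher(reader);
// Look up the lowercased URL as one exact term, bypassing QueryParser.
TermQuery tq = new TermQuery(new Term("Url", downloadDoc.Uri.ToString().ToLower()));
BooleanQuery bq = new BooleanQuery();
bq.Add(tq, BooleanClause.Occur.SHOULD);
TopScoreDocCollector collector = TopScoreDocCollector.create(10, true);
// Run the query and collect the top 10 hits.
indexSearcher.Search(bq, collector);
TopDocs hits = collector.TopDocs();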

Use StandardAnalyzer while writing into the index.

This answer helped me: Lucene search by URL

Try putting quotes around the query, e.g. like this:

"http://www.google.com/"

Using the whitespace or keyword analyzer should work.
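A minimal sketch of that idea, assuming the Lucene.Net 2.9-era API that the snippets in this thread appear to use (IndexWriter.MaxFieldLength, hits.totalHits). The class name UrlExactMatchSketch, the method CountExactMatches and the in-memory RAMDirectory are illustrative choices, not part of the original posts. It indexes each URL with KeywordAnalyzer so the whole value becomes one token, then looks it up with a TermQuery so no query parsing is involved:

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical helper; assumes Lucene.Net 2.9.x member names.
public static class UrlExactMatchSketch
{
    public static int CountExactMatches(string[] urls, string wanted)
    {
        // In-memory index so the sketch needs no files on disk.
        RAMDirectory directory = new RAMDirectory();

        // KeywordAnalyzer emits the whole field value as a single token.
        IndexWriter writer = new IndexWriter(directory, new KeywordAnalyzer(), true,
                                             IndexWriter.MaxFieldLength.UNLIMITED);
        foreach (string url in urls)
        {
            Document document = new Document();
            document.Add(new Field("Url", url.ToLower(),
                                   Field.Store.YES, Field.Index.ANALYZED));
            writer.AddDocument(document);
        }
        writer.Close();

        // TermQuery matches the term as-is, so ':' and '/' need no escaping.
        IndexSearcher searcher = new IndexSearcher(directory, true);
        TopDocs hits = searcher.Search(
            new TermQuery(new Term("Url", wanted.ToLower())), null, 10);
        int count = hits.totalHits;
        searcher.Close();
        return count;
    }
}
```

The point of the sketch is that the indexed token and the query term are produced the same way (lowercased, unsplit). Later Lucene.Net releases rename several of these members, so the calls may need adjusting for your version.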

Would anyone actually search for "http://www.Google.com"? Seems more likely that a user would search for "Google" instead.

You can always return the entire URL if there is a partial match. I think the standard analyzer should be more appropriate for searching and retrieving a URL.
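To illustrate the partial-match idea without depending on any Lucene.Net version, here is a small library-free sketch. The class UrlTokenIndex and its regex-based Tokenize method are hypothetical stand-ins for an analyzer and an index; they are not how StandardAnalyzer actually splits text. Each URL is broken into lowercase words, and a search on one word returns the full stored URLs that contain it:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical toy index: maps each word in a URL to the full URLs containing it.
public class UrlTokenIndex
{
    private readonly Dictionary<string, HashSet<string>> postings =
        new Dictionary<string, HashSet<string>>();

    // Simplified analyzer stand-in: lowercase, split on non-alphanumerics.
    public static string[] Tokenize(string text)
    {
        return Regex.Split(text.ToLowerInvariant(), "[^a-z0-9]+")
                    .Where(t => t.Length > 0)
                    .ToArray();
    }

    // Store the original URL under every token it contains.
    public void Add(string url)
    {
        foreach (string token in Tokenize(url))
        {
            HashSet<string> urls;
            if (!postings.TryGetValue(token, out urls))
            {
                urls = new HashSet<string>();
                postings[token] = urls;
            }
            urls.Add(url);
        }
    }

    // Return the full URLs whose token set contains the given word.
    public List<string> Search(string word)
    {
        HashSet<string> urls;
        return postings.TryGetValue(word.ToLowerInvariant(), out urls)
            ? urls.ToList()
            : new List<string>();
    }
}
```

With "http://www.Google.com/" added, Search("google") returns that full URL, while searching for the complete URL string returns nothing because no single token equals it. That trade-off is the same one discussed in this thread: a splitting analyzer suits word searches, a keyword-style analyzer suits exact URL lookups.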
