I am using Lucene.NET and can run a search and get the hits back as a ScoreDoc[].
I need to find the position of a specific item in the ScoreDoc[]. All items in the ScoreDoc[] are unique.
Sample code:
luceneSearcher.Search(query, collector);
ScoreDoc[] scores = collector.TopDocs().scoreDocs;
For example, I need to find the position of the item in the ScoreDoc[] whose custom ID property has the value '99999'.
I can iterate through the items in scores[], check for the one whose ID property matches '99999', and return its position, but this could hurt performance because scores[] can hold thousands of items.
Is there a better technique?
Thanks
I came up with a new ExtendedCollector that stores each hit as a CollectedDocument:
using System;
using System.Collections.Generic;
using System.Linq;
using Lucene.Net.Index;
using Lucene.Net.Search;

public class ExtendedCollector : Collector
{
    private Scorer _scorer;
    private Int32 _docBase;
    private List<CollectedDocument> _documents;

    public ExtendedCollector()
    {
        _documents = new List<CollectedDocument>();
    }

    public override void SetScorer(Scorer scorer)
    {
        _scorer = scorer;
    }

    public override void Collect(int doc)
    {
        // doc is relative to the current segment reader; add the base to get the index-wide id.
        var docId = _docBase + doc;
        var score = _scorer.Score();
        // Linear scan to guard against collecting the same doc twice; this costs O(n) per hit.
        var currentDoc = _documents.FirstOrDefault(d => d.DocId == docId);
        if (currentDoc == null)
            _documents.Add(new CollectedDocument()
                {DocId = docId, Score = score, OriginalIndex = _documents.Count, Index = _documents.Count});
        else
            currentDoc.Score = score;
    }

    public override void SetNextReader(IndexReader reader, int docBase)
    {
        _docBase = docBase;
    }

    public override bool AcceptsDocsOutOfOrder()
    {
        return false;
    }

    // Hits in the order they were collected.
    public List<CollectedDocument> Documents
    {
        get { return _documents; }
    }

    // Hits sorted by descending score. Note that this overwrites Index on the
    // shared CollectedDocument objects with a 1-based rank.
    public List<CollectedDocument> DocumentsByScore
    {
        get
        {
            var result = _documents.OrderByDescending(d => d.Score).ToList();
            var itemId = 0;
            foreach (var collectedDocument in result)
            {
                itemId++;
                collectedDocument.Index = itemId;
            }
            return result;
        }
    }
}
CollectedDocument looks like this:
public class CollectedDocument
{
    public Int32 DocId { get; set; }
    public float Score { get; set; }
    public int OriginalIndex { get; set; }
    public int Index { get; set; }
}
Whenever you want to get the results, you would do:
var myCollector = new ExtendedCollector();
searcher.Search(searchQuery, myCollector);
foreach (var doc in myCollector.Documents)
{
    var docIndex = doc.Index; // current position (becomes the 1-based score rank once DocumentsByScore has been read)
    var originalIndex = doc.OriginalIndex; // 0-based position assigned when the doc was collected
}
You can also get the documents ordered by score using:
myCollector.DocumentsByScore
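To get back to the original question (finding the position of one specific item without scanning the list each time), one option is to build a lookup once and then query it. The following is only a rough sketch under my own assumptions: PositionLookup and BuildPositionMap are hypothetical names, not part of Lucene.NET, and CollectedDocument is repeated here just so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the CollectedDocument class above, repeated so this sketch is self-contained.
public class CollectedDocument
{
    public int DocId { get; set; }
    public float Score { get; set; }
    public int OriginalIndex { get; set; }
    public int Index { get; set; }
}

// Hypothetical helper, not a Lucene.NET type.
public static class PositionLookup
{
    // Walks the list once and records the 0-based position of each key.
    // keyOf decides what identifies a hit (for example its DocId).
    public static Dictionary<TKey, int> BuildPositionMap<TKey>(
        IList<CollectedDocument> ranked,
        Func<CollectedDocument, TKey> keyOf)
    {
        var map = new Dictionary<TKey, int>(ranked.Count);
        for (int i = 0; i < ranked.Count; i++)
        {
            var key = keyOf(ranked[i]);
            // Keep the first position if a key shows up more than once.
            if (!map.ContainsKey(key))
                map[key] = i;
        }
        return map;
    }
}
```

Usage would be something like var positions = PositionLookup.BuildPositionMap(myCollector.DocumentsByScore, d => d.DocId); followed by positions.TryGetValue(someDocId, out position). Building the map is still one pass over all hits, so it only pays off if you look up more than one item per search. If the key has to be the custom ID stored in the document (the '99999' from the question) rather than the Lucene doc id, keyOf would have to read that stored field, for example with searcher.Doc(d.DocId).Get("ID"), assuming the field is stored under the name "ID"; loading stored fields for thousands of hits has its own cost.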
This might not be the easiest solution, but it works. If anyone has a better solution, please post it as I'd like to know that as well.