
What's the most efficient way to retrieve all matching documents from a query in Lucene, unsorted?

I need to run a query to maintain the internal integrity of the index; for example, to remove all traces of a particular field/value from it. It is therefore important that I find all matching documents (not just the top n docs), but the order in which they are returned is irrelevant.

According to the docs, it looks like I need to use the Searcher.Search( Query, Collector ) method, but there's no built-in Collector class that does what I need.

Should I derive my own Collector for this purpose? What do I need to keep in mind when doing that?

Turns out this was a lot easier than I expected. I just used the example implementation at http://lucene.apache.org/java/2_9_0/api/core/org/apache/lucene/search/Collector.html and recorded the doc numbers passed to the Collect() method in a List, exposing this as a public Docs property.

I then simply iterate over this property, passing each number back to the Searcher to get the corresponding Document:

var searcher = new IndexSearcher( reader );
var collector = new IntegralCollector(); // my custom Collector
searcher.Search( query, collector );
var result = new Document[ collector.Docs.Count ];
for ( int i = 0; i < collector.Docs.Count; i++ )
    result[ i ] = searcher.Doc( collector.Docs[ i ] );
searcher.Close(); // this is probably not needed
reader.Close();

So far it seems to be working fine in preliminary tests.

Update: Here's the code for IntegralCollector:

internal class IntegralCollector: Lucene.Net.Search.Collector {
    private int _docBase;

    private List<int> _docs = new List<int>();
    public List<int> Docs {
        get { return _docs; }
    }

    public override bool AcceptsDocsOutOfOrder() {
        return true;
    }

    public override void Collect( int doc ) {
        _docs.Add( _docBase + doc );
    }

    public override void SetNextReader( Lucene.Net.Index.IndexReader reader, int docBase ) {
        _docBase = docBase;
    }

    public override void SetScorer( Lucene.Net.Search.Scorer scorer ) {
    }
}
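
The `_docBase + doc` addition in Collect() is the part that is easy to get wrong: as the linked Collector documentation describes, the doc number passed to Collect() is relative to the reader most recently passed to SetNextReader(), so it has to be rebased before it can be handed back to the Searcher. The sketch below simulates this in plain Java with no Lucene dependency. The names `RebasingCollector`, `DocBaseSketch`, and `search` are hypothetical stand-ins invented for illustration, not Lucene APIs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a Collector; not part of any Lucene API. */
class RebasingCollector {
    private int docBase;
    private final List<Integer> docs = new ArrayList<>();

    // Mirrors SetNextReader: called once per segment, before its hits arrive.
    void setNextReader(int docBase) {
        this.docBase = docBase;
    }

    // Mirrors Collect: 'doc' is relative to the current segment, so rebase it.
    void collect(int doc) {
        docs.add(docBase + doc);
    }

    List<Integer> getDocs() {
        return docs;
    }
}

class DocBaseSketch {
    /**
     * Simulates a search over several segments.
     * segmentSizes[s] is the number of doc slots in segment s;
     * hits[s] lists the segment-relative doc numbers that match.
     */
    static List<Integer> search(int[] segmentSizes, int[][] hits) {
        RebasingCollector collector = new RebasingCollector();
        int docBase = 0;
        for (int s = 0; s < segmentSizes.length; s++) {
            collector.setNextReader(docBase);
            for (int doc : hits[s]) {
                collector.collect(doc);
            }
            docBase += segmentSizes[s]; // the next segment starts after this one
        }
        return collector.getDocs();
    }

    public static void main(String[] args) {
        // Two segments holding 3 and 4 documents; hits are segment-relative.
        List<Integer> ids = search(new int[] {3, 4}, new int[][] {{0, 2}, {1, 3}});
        System.out.println(ids); // prints [0, 2, 4, 6]
    }
}
```

Without the rebasing step, the hits from the second segment would be recorded as 1 and 3 and would resolve to documents in the first segment.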

No need to write a hit collector if you're just looking to get all the Document objects in the index. Just loop from 0 to maxDoc() and call reader.document() on each doc id, making sure to skip documents that are already deleted:

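// Collect into a list so that skipped (deleted) doc ids leave no null gaps.
List<Document> results = new ArrayList<Document>();
for (int i = 0; i < reader.maxDoc(); i++) {
   if (reader.isDeleted(i))
      continue; // skip deleted documents
   results.add(reader.document(i));
}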

