简体   繁体   中英

Lucene docID reliability

Hi
If only insert operation occur on lucene index (no delete/update), is it true that docID is not changing ? and its also reliable
if it is true, i want to use it as loading FieldCache incrementally to lower down the overhead of loading all documents, what is the best solution for that ??

I'm not quite sure what you're planning to do with the field cache, but my understanding of document ids is that they can change during an insert, depending on pending deletes, merge policies etc.

ie Document ID should not be used past a commit boundary on a reopened index reader

Hope this helps,

The document id is static within a segment. IndexReader.Open (usually) opens a DirectoryReader which combines several SegmentReader . You'll need to pass the "bottom" reader to the FieldCache for the population to work correctly.

Here's an example from FieldCache with frequently updating index which ensures that only the newly read segment is read by the FieldCache, instead of the topmost reader (which will considered changed at every commit).

var directory = FSDirectory.Open(new DirectoryInfo("index"));
var reader = IndexReader.Open(directory, readOnly: true);
var documentId = 1337;

// Grab all subreaders.
var subReaders = new List<IndexReader>();
ReaderUtil.GatherSubReaders(subReaders, reader);

// Loop through all subreaders. While subReaderId is higher than the
// maximum document id in the subreader, go to next.
var subReaderId = documentId;
var subReader = subReaders.First(sub => {
    if (sub.MaxDoc() < subReaderId) {
        subReaderId -= sub.MaxDoc();
        return false;
    }

    return true;
});

var values = FieldCache_Fields.DEFAULT.GetInts(subReader, "newsdate");
var value = values[subReaderId];

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM