
Does Lucene store the actual documents in its index?

I am planning to use Lucene to index a very large corpus of text documents. I understand how an inverted index works.

Question: Does Lucene store the actual source documents in its index (in addition to the terms)? If I search for a term and want all the documents that contain it, do the documents themselves come out of Lucene, or does Lucene just return pointers (e.g., the file paths of the matched documents)?

That is up to you. Lucene represents each document as a collection of fields, and for each field you can configure whether its original value is stored in the index. For large documents, you would typically store the title field but not the body field, and add an identifier field (stored, and not necessarily indexed), such as a file path, that you can use to retrieve the actual document from wherever it lives.
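To make the indexed-versus-stored distinction concrete, here is a minimal toy model in Python. It is not Lucene's API: the `ToyIndex` class, its methods, and the sample paths are all hypothetical and exist only for illustration. In Lucene itself the corresponding per-field choice is expressed with `Field.Store.YES` or `Field.Store.NO`.

```python
from collections import defaultdict


class ToyIndex:
    """Toy model of per-field 'indexed' vs 'stored' settings (NOT Lucene's API)."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field, term) -> set of doc ids
        self.stored = []                  # doc id -> {field: original value}

    def add(self, fields):
        """Add a document; fields maps name -> (value, indexed, stored)."""
        doc_id = len(self.stored)
        kept = {}
        for name, (value, indexed, stored) in fields.items():
            if indexed:
                # Searchable: each whitespace-separated term goes into the inverted index.
                for term in value.lower().split():
                    self.postings[(name, term)].add(doc_id)
            if stored:
                # Retrievable: the original value is kept alongside the index.
                kept[name] = value
        self.stored.append(kept)
        return doc_id

    def search(self, field, term):
        """Return the stored fields of every document matching the term."""
        hits = sorted(self.postings.get((field, term.lower()), ()))
        return [self.stored[d] for d in hits]


index = ToyIndex()
index.add({
    "title": ("Lucene basics", True, True),                            # indexed and stored
    "body": ("Lucene keeps an inverted index of terms", True, False),  # indexed only
    "path": ("/docs/lucene-basics.txt", False, True),                  # stored only
})
index.add({
    "title": ("Cooking pasta", True, True),
    "body": ("Boil water and add salt", True, False),
    "path": ("/docs/pasta.txt", False, True),
})

results = index.search("body", "inverted")
print(results)
# → [{'title': 'Lucene basics', 'path': '/docs/lucene-basics.txt'}]
```

The search matches on the body text, but the hit carries only the title and the path, because the body was never stored. The application then uses the path to load the full document from disk. Storing the body as well would let the index return it directly, at the cost of a larger index.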

