Is Lucene updateDocument faster than deleting and then adding document?

I have a large index (about 100 GB) and I want to update documents in it frequently. I'm undecided between two methods:

1) Updating the document

2) Deleting the document and adding the updated version

Which one would be faster? Are there any other pros and cons?

According to the Lucene API documentation, there should be no difference between updating a document and deleting the old version and then adding the new one. Internally, an update performs a delete followed by an add:

In either case, documents are added with addDocument and removed with deleteDocuments(Term) or deleteDocuments(Query). A document can be updated with updateDocument (which just deletes and then adds the entire document). When finished adding, deleting and updating documents, close should be called. ( http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/index/IndexWriter.html )

If you can batch your deletes and adds, the best practice is to perform all the deletes first and then all the adds. My own tests on large indices bore this out.
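
A minimal sketch of that batching pattern, in plain Java so it stands on its own. The IndexOps interface and the BatchedUpdater and RecordingIndex classes are hypothetical names made up for this illustration, not part of Lucene. In a real application the two interface methods would presumably delegate to IndexWriter.deleteDocuments(Term) and IndexWriter.addDocument, and it assumes each document carries a unique "id" field.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the two IndexWriter operations the batch needs. */
interface IndexOps {
    // In Lucene this would be something like writer.deleteDocuments(new Term("id", id)).
    void deleteById(String id);

    // In Lucene this would build a Document and call writer.addDocument(doc).
    void add(String id, String body);
}

/** Collects pending updates and flushes them as "all deletes, then all adds". */
final class BatchedUpdater {
    // Keeps only the latest body per id; iteration follows first-insertion order.
    private final Map<String, String> pending = new LinkedHashMap<>();

    void update(String id, String body) {
        pending.put(id, body);
    }

    void flush(IndexOps index) {
        // Phase 1: delete every old version.
        for (String id : pending.keySet()) {
            index.deleteById(id);
        }
        // Phase 2: add every new version.
        for (Map.Entry<String, String> e : pending.entrySet()) {
            index.add(e.getKey(), e.getValue());
        }
        pending.clear();
    }
}

/** Test double that records the order of operations instead of touching an index. */
final class RecordingIndex implements IndexOps {
    final List<String> log = new ArrayList<>();

    @Override
    public void deleteById(String id) {
        log.add("delete " + id);
    }

    @Override
    public void add(String id, String body) {
        log.add("add " + id);
    }
}
```

Queuing updates for ids "1", "2" and then "1" again, followed by a flush, issues two deletes and then two adds, with only the latest body for id "1" being added. Whether this ordering is actually faster on a given index is something to measure, as above.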
