Lucene.Net IndexWriter doubles the size of the index after UpdateDocument, even with Optimize?

I'm creating the index in the usual way:

var directory = FSDirectory.Open(...);
var analyzer = ...

var indexWriter = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
indexWriter.SetWriteLockTimeout(30000);

indexWriter.AddDocument(doc1);
indexWriter.AddDocument(doc2);
indexWriter.AddDocument(...);

indexWriter.Commit();
indexWriter.Optimize();
indexWriter.Close();

This creates an index of 5.8 MB.

Now I need to update exactly 2 documents, adding 1 word to each, so the index should grow by a very small amount or not at all:

var indexWriter = new IndexWriter(directory, analyzer, false, IndexWriter.MaxFieldLength.UNLIMITED);
indexWriter.SetWriteLockTimeout(30000);

indexWriter.UpdateDocument(doc1);
indexWriter.UpdateDocument(doc2);

indexWriter.Commit();
indexWriter.Optimize(); // <-- the call in question
indexWriter.Close();

This operation DOUBLES the size of the index: it leaves the _0.cfs file at the previous index size (5.8 MB) and creates a whole new index of the same size in _2.xxx files. So for a one-word change in two documents, the index doubles!

It keeps doing this every time I repeat the operation, so the index just keeps growing by another full copy.

I thought the Optimize call was supposed to optimize the index, not cause things like this?

How do I stop it from doubling my index?

Thanks!

This is usually caused by having IndexReaders/IndexSearchers open on the index while you optimize. An IndexReader sees a snapshot of the index as it was when the reader was opened, so it keeps the old segment files open (in effect locking them), and the IndexWriter cannot remove them when it is closed.

After optimizing, you should refresh your IndexReaders/IndexSearchers, either by re-creating them or by using the Reopen() method on IndexReader. Once the IndexReaders/IndexSearchers are refreshed, if you create an IndexWriter and Close it immediately, you should see the old files disappear.
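
The refresh step could look roughly like the sketch below. It assumes the Lucene.Net 2.9-style API used in the question (Close() rather than Dispose()), and that your application keeps one long-lived IndexReader and IndexSearcher. The class and method names (ReaderRefresh, Refresh, PurgeOldFiles) are hypothetical, made up for illustration; it has not been run against your index.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical helper, not part of Lucene.Net.
public static class ReaderRefresh
{
    // Call after the writer has committed/optimized and been closed.
    // Swaps in a reader that sees the new segments and closes the old one,
    // so the old segment files are no longer held open.
    public static IndexSearcher Refresh(ref IndexReader reader, IndexSearcher searcher)
    {
        // Reopen() returns the same instance if the index has not changed.
        IndexReader newReader = reader.Reopen();
        if (newReader == reader)
        {
            return searcher; // nothing to refresh
        }

        searcher.Close(); // does not close a reader that was passed in
        reader.Close();   // releases the old snapshot's files
        reader = newReader;
        return new IndexSearcher(newReader);
    }

    // Opening and immediately closing a writer gives Lucene a chance to
    // delete files that no reader references any more.
    public static void PurgeOldFiles(Directory directory, Analyzer analyzer)
    {
        var writer = new IndexWriter(directory, analyzer, false,
                                     IndexWriter.MaxFieldLength.UNLIMITED);
        writer.Close();
    }
}
```

A call site would be something like `searcher = ReaderRefresh.Refresh(ref reader, searcher);` followed by `ReaderRefresh.PurgeOldFiles(directory, analyzer);`. Any other reader still open on the old snapshot (in another thread or process) would keep the old files alive, so this only helps once every such reader has been refreshed or closed.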

That being said, if you decide to optimize live indexes (which you should only do after deleting lots of documents), you should always expect the index to temporarily grow to about 3X its "normal" size.
