Lucene.Net IndexWriter doubles the size of the index after UpdateDocument, even with Optimize?

I'm creating the index in the normal way:

var directory = FSDirectory.Open(...);
var analyzer = ...

var indexWriter = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
indexWriter.SetWriteLockTimeout(30000);

indexWriter.AddDocument(doc1);
indexWriter.AddDocument(doc2);
indexWriter.AddDocument(...);

indexWriter.Commit();
indexWriter.Optimize();
indexWriter.Close();

This creates an index of 5.8 MB.

Now I need to update exactly 2 documents, with 1 word added to each of them, so the size of the index should increase by a very small amount or not at all:

var indexWriter = new IndexWriter(directory, analyzer, false, IndexWriter.MaxFieldLength.UNLIMITED);
indexWriter.SetWriteLockTimeout(30000);

indexWriter.UpdateDocument(doc1);
indexWriter.UpdateDocument(doc2);

indexWriter.Commit();
indexWriter.Optimize(); // <-- the call in question
indexWriter.Close();

This operation DOUBLES the size of the index: it leaves the _0.cfs file at the size the index had before (5.8 MB) and creates a whole new index of the same size in the _2.xxx files. So for two documents with one word changed in each, the index doubles!

It keeps doing this if I repeat the operation, so the index just goes on doubling forever.

My understanding was that the Optimize call should optimize the index, not cause things like this.

How do I stop it from doubling my index?

Thanks!

This is usually caused by having IndexReaders/IndexSearchers open on the index while you optimize. An IndexReader sees a snapshot of the index as it was when the reader was opened, so it keeps a lock on those files and the IndexWriter cannot remove them when it is closed.

After optimizing, you should refresh your IndexReaders/IndexSearchers, either by re-creating them or by using the Reopen() method on IndexReader. Once the IndexReaders/IndexSearchers are refreshed, if you create an IndexWriter and close it immediately, you should see the old files disappear.
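A rough sketch of that refresh-then-purge sequence, assuming the same Lucene.Net 2.9-style API used in the question (Close(), Reopen(), the four-argument IndexWriter constructor). The IndexRefresh class and its two method names are hypothetical helpers for illustration, not part of Lucene.Net, and newer Lucene.Net releases rename some of these calls, so treat this as a starting point rather than a drop-in fix:

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Index;
using Lucene.Net.Store;

// Hypothetical helper class, not part of Lucene.Net.
public static class IndexRefresh
{
    // Swaps a stale reader for one that sees the latest commit.
    public static IndexReader RefreshReader(IndexReader reader)
    {
        // Reopen() returns the same instance when the index has not changed.
        IndexReader fresh = reader.Reopen();
        if (fresh != reader)
        {
            // Release the old snapshot so its segment files can be deleted.
            reader.Close();
        }
        return fresh;
    }

    // Opens and immediately closes a writer (create = false) so that it gets
    // a chance to delete segment files no reader references any more.
    public static void PurgeOldSegments(Directory directory, Analyzer analyzer)
    {
        var writer = new IndexWriter(directory, analyzer, false,
                                     IndexWriter.MaxFieldLength.UNLIMITED);
        writer.Close();
    }
}
```

You would call RefreshReader on every long-lived reader first, rebuild any IndexSearcher that wrapped the old reader (for example with new IndexSearcher(reader)), and only then call PurgeOldSegments with the same directory and analyzer you index with.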

That being said, if you decide to optimize live indexes (which you should only do after deleting lots of documents), you should always expect the index to temporarily grow to 3X its "normal" size.
