
Lucene migration from 3.x to 4.1.0 and index optimisation

I have migrated from Lucene 3.x to 4.1.0. After creating the new index I realised there are many more files in the index directory. Lucene 3 used IndexWriter.optimize() to collapse the files. The successor in v4 is IndexWriter.forceMerge(int maxNumSegments). I have tried forceMerge with different values for maxNumSegments and I always get the same index files. I expected the files to be merged into one index file, or at least into fewer files. Am I wrong? Do you know how to do it?
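For context, a minimal sketch of how forceMerge(1) is typically driven in Lucene 4.1 (the directory path and analyzer below are placeholders, not taken from the question):

```java
import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class ForceMergeExample {
    public static void main(String[] args) throws Exception {
        // Placeholder path: point this at the existing 4.1.0 index directory.
        Directory dir = FSDirectory.open(new File("/path/to/index"));
        IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_41,
                new StandardAnalyzer(Version.LUCENE_41));
        IndexWriter writer = new IndexWriter(dir, iwc);
        try {
            writer.forceMerge(1);  // merge all segments down to a single segment
            writer.commit();       // make the merged segment visible on disk
        } finally {
            writer.close();        // old segment files are removed once nothing references them
        }
        dir.close();
    }
}
```

Note that even after merging down to one segment, that segment is still spread across several per-extension files (term dictionary, postings, stored fields, and so on) unless the compound file format is used, which is likely why the number of files does not shrink as expected.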

Apart from ideological reasons (fewer files being better than more), are there any practical reasons why you need fewer files? Provided the overall number of bytes for a given index is roughly the same, what's the difference?

The reason optimization was removed is that it was inefficient: it would kill search performance, cause load spikes, and so on. Performance when searching over multiple segments has improved, so the need for .optimize() is no longer justifiable. Lucene now uses TieredMergePolicy instead, which nicely balances the load and solves this problem from a different angle.
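As an illustration of wiring that policy in (the numeric tuning values below are arbitrary examples, not recommendations from the answer), a Lucene 4.1 setup might look like:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.util.Version;

// ...
TieredMergePolicy mergePolicy = new TieredMergePolicy();
mergePolicy.setMaxMergedSegmentMB(5 * 1024);  // example cap on the size of a merged segment
mergePolicy.setSegmentsPerTier(10.0);         // example: segments allowed per tier before merging

IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_41,
        new StandardAnalyzer(Version.LUCENE_41));
iwc.setMergePolicy(mergePolicy);              // merges now happen incrementally in the background
```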

Maybe you are looking for Lucene's compound file format, which stores all logical index files in a single actual file. See MergePolicy.setUseCompoundFile(true).
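A sketch of how that could be combined with a single forced merge, under the assumption that in 4.1 the setter is exposed on the concrete merge policy (here TieredMergePolicy) and that the index lives at a placeholder path:

```java
import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// ...
TieredMergePolicy mergePolicy = new TieredMergePolicy();
mergePolicy.setUseCompoundFile(true);   // pack each segment's files into a compound file

IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_41,
        new StandardAnalyzer(Version.LUCENE_41));
iwc.setMergePolicy(mergePolicy);

IndexWriter writer = new IndexWriter(FSDirectory.open(new File("/path/to/index")), iwc);
writer.forceMerge(1);   // with compound format and a single segment, the file count drops sharply
writer.close();
```

Depending on the merge policy's defaults, very large segments may still be written in non-compound form, so check the policy's compound-file related settings for your exact version.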

