
Lucene - Out of Memory Error

I would like to store a large amount of file content (upwards of 75,000 files, each around 5-100 MB) in an index and run searches on it using Lucene 5. I'm using FSDirectory, and I'm writing all file contents using an IndexWriter. As more files are written, memory usage keeps increasing until an OutOfMemoryError is eventually thrown.

Here is an example of how I'm currently doing this.

Analyzer analyzer = new StandardAnalyzer();
FSDirectory directory = FSDirectory.open(indexFilePath);
DirectoryReader reader = DirectoryReader.open(directory); // opened here but never used or closed below

IndexWriterConfig config = new IndexWriterConfig(analyzer);

IndexWriter writer = new IndexWriter(directory, config);

for (Document document : documents)
{
    writer.addDocument(document);
}

writer.close();

I've been changing options like these in the config, but I've noticed no difference.

config.setMaxBufferedDocs(2);          // flush after this many buffered documents
config.setRAMBufferSizeMB(32);         // flush when buffered documents use this much RAM
config.setRAMPerThreadHardLimitMB(32); // hard per-thread cap on the indexing buffer

I've also tried committing, flushing, and forcing merges with the writer, but it doesn't affect the memory usage.
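For example, the periodic committing looked roughly like this (a sketch reusing writer and documents from the snippet above; commitEvery is an arbitrary batch size picked for illustration):

int commitEvery = 1000; // arbitrary batch size, for illustration only
int count = 0;
for (Document document : documents)
{
    writer.addDocument(document);
    if (++count % commitEvery == 0)
    {
        writer.commit(); // flush and fsync buffered documents to the directory
    }
}
writer.close(); // in Lucene 5, close() also commits remaining buffered documents by default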

Is it possible to lower/limit the memory usage of Lucene?

You can perform the Lucene data indexing chunk by chunk. If you are doing full data indexing, perform the first chunk's indexing with IndexWriterConfig in CREATE mode:

 config.setOpenMode(OpenMode.CREATE);

For indexing the remaining chunks of data, set the IndexWriterConfig mode to CREATE_OR_APPEND:

config.setOpenMode(OpenMode.CREATE_OR_APPEND);

This performs incremental indexing, appending the current data set to the existing Lucene index.

Call these methods at the end of each incremental/chunk indexing pass:

writer.forceMerge(1); // optimize() was removed in Lucene 4; forceMerge(1) is its replacement
writer.commit();
writer.close();

A TieredMergePolicy configuration can also be set explicitly (only needed in the incremental-indexing case), so that deletions, modifications, or additions of records are reflected in searches immediately:

TieredMergePolicy t = new TieredMergePolicy();
t.setForceMergeDeletesPctAllowed(0.01); // forceMergeDeletes() merges segments whose deleted-document percentage exceeds this
config.setMergePolicy(t);

writer.forceMergeDeletes();
writer.commit();

This is how to do the indexing chunk by chunk. Since we process one chunk at a time, the memory used for each chunk can be released before the next one starts.
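Putting the pieces together, here is a minimal sketch of the chunked loop described above (ChunkedIndexer, chunks, and indexPath are hypothetical names assumed for illustration; you would batch your own documents into List<Document> groups):

import java.nio.file.Path;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.FSDirectory;

public class ChunkedIndexer
{
    public static void indexInChunks(Path indexPath, Iterable<List<Document>> chunks) throws Exception
    {
        Analyzer analyzer = new StandardAnalyzer();
        boolean firstChunk = true;
        for (List<Document> chunk : chunks)
        {
            IndexWriterConfig config = new IndexWriterConfig(analyzer);
            // first chunk creates the index; later chunks append to it
            config.setOpenMode(firstChunk ? OpenMode.CREATE : OpenMode.CREATE_OR_APPEND);
            firstChunk = false;
            try (FSDirectory directory = FSDirectory.open(indexPath);
                 IndexWriter writer = new IndexWriter(directory, config))
            {
                for (Document document : chunk)
                {
                    writer.addDocument(document);
                }
                writer.commit();
            } // closing the writer releases its indexing buffers before the next chunk
        }
    }
}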

Lucene indexing may or may not be the root cause of the out-of-memory issue. Use a memory analyzer tool (such as Eclipse MAT) to check which Java objects are not getting garbage collected and are causing the out-of-memory condition.
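For example, on a HotSpot JVM you can have a heap dump written automatically at the moment of the error and then open it in the analyzer (MyIndexer is a placeholder for your own main class):

java -Xmx2g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/indexer.hprof MyIndexer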
