
How do I get around an out of memory error when indexing large files using Apache Lucene?

On line 195 of IndexFiles.java you will see:

 doc.add(new TextField("contents", new BufferedReader(new InputStreamReader(fis, StandardCharsets.UTF_8))));

This line lets the user search on file contents. If you also want to display a summary along with the name of the matching file (much like Google search results), you need to add a few more lines of code after line 195 of IndexFiles.java, as shown below:

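// Read the whole file into memory so its text can be stored in the index.
FileReader fr = new FileReader("/home/user1/largefile.txt");
BufferedReader br = new BufferedReader(fr);

StringBuilder sb = new StringBuilder();
String line;

while ((line = br.readLine()) != null) {
    // readLine() strips the line terminator, so put one back
    sb.append(line).append('\n');
}
br.close();

// TextField is analyzed; Field.Store.YES keeps the text so it can be highlighted later
Field contentField = new TextField("content", sb.toString(), Field.Store.YES);

doc.add(contentField);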

But I'm not done yet. I also need to use Lucene's Highlighter class and add code after line 184 of SearchFiles.java, something like:

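Document doc = searcher.doc(hits[i].doc);
// doc.get() returns the stored string value of the field
String text = doc.get("content");
// QueryScorer needs the query that produced the hits
Highlighter highlighter = new Highlighter(new QueryScorer(query));
String summary = highlighter.getBestFragment(analyzer, "content", text);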

This code works perfectly and gives me a summary for each search result. However, if the files are too big, IndexFiles.java throws an OutOfMemoryError while appending to the StringBuilder. How do I get around this?

The problem is that the Java heap is exhausted. The default maximum heap size depends on the JVM version and the machine: older client JVMs defaulted to 64 MB, while more recent HotSpot JVMs typically pick a fraction of physical memory. You can raise the limit with the -Xmx option, e.g. -Xmx1g sets the maximum heap size to 1 GB. Keep in mind that the heap should not be set larger than the RAM actually available.
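To check which limit the JVM is actually running with, something like the following can be used. This is only a minimal sketch: the class name HeapInfo and the method maxHeapMegabytes are made up for illustration, and the only real API involved is Runtime.maxMemory().

```java
public class HeapInfo {

    /** Returns the maximum amount of memory the JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        // maxMemory() reports the limit in bytes (Long.MAX_VALUE if there is no inherent limit)
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Hypothetical usage: java -Xmx1g HeapInfo
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Running it once without options and once with -Xmx1g shows whether the flag took effect. The reported figure may be somewhat lower than the value passed to -Xmx, depending on the garbage collector in use.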

For more details, see:

-X Command-line Options

How is the default java heap size determined?
