
GC overhead limit exceeded with Apache POI

I have 13 .xlsx files with about 1000 rows in each of them. Now I want to merge them into one .xlsx file with a single sheet. I'm using the code from https://blog.sodhanalibrary.com/2014/11/merge-excel-files-using-java.html#.Vi9ns36rSUk .

Here's my code (few changes, addSheet method unchanged):

try {
        FileInputStream excellFile1 = new FileInputStream(new File("tmp_testOut1000.xlsx"));
        XSSFWorkbook workbook1 = new XSSFWorkbook(excellFile1);
        XSSFSheet sheet1 = workbook1.getSheetAt(0);

        for(int i = 2; i < 14; i++){
            FileInputStream excellFile2 = new FileInputStream(new File("tmp_testOut" + i + "000.xlsx"));
            XSSFWorkbook workbook2 = new XSSFWorkbook(excellFile2);
            XSSFSheet sheet2 = workbook2.getSheetAt(0);
            System.out.println("add " + i);
            addSheet(sheet1, sheet2);
        }
        
        excellFile1.close();

        // save merged file
        System.out.println("merging");
        File mergedFile = new File("merged.xlsx");
        if (!mergedFile.exists()) {
            mergedFile.createNewFile();
        }
        FileOutputStream out = new FileOutputStream(mergedFile);
        System.out.println("write");
        workbook1.write(out);
        out.close();
        System.out.println("Files were merged succussfully");
    } catch (Exception e) {
        e.printStackTrace();
    }

All files are loading and merging, but after the "write" sysout I'm getting:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.xmlbeans.impl.store.Xobj.new_cursor(Xobj.java:1829)
at org.apache.xmlbeans.impl.values.XmlObjectBase.newCursor(XmlObjectBase.java:293)
at org.apache.xmlbeans.impl.values.XmlComplexContentImpl.arraySetterHelper(XmlComplexContentImpl.java:1151)
at org.openxmlformats.schemas.spreadsheetml.x2006.main.impl.CTFontsImpl.setFontArray(Unknown Source)
at org.apache.poi.xssf.model.StylesTable.writeTo(StylesTable.java:424)
at org.apache.poi.xssf.model.StylesTable.commit(StylesTable.java:496)
at org.apache.poi.POIXMLDocumentPart.onSave(POIXMLDocumentPart.java:341)
at org.apache.poi.POIXMLDocumentPart.onSave(POIXMLDocumentPart.java:345)
at org.apache.poi.POIXMLDocument.write(POIXMLDocument.java:206)
at Start.main(Start.java:275)

What can I do? Why is this happening and how can I prevent it?

POI is notoriously memory-hungry, so running out of memory is not uncommon when handling large Excel files.

If you are able to load all the original files and only run into trouble when writing the merged file, you could try using an SXSSFWorkbook instead of an XSSFWorkbook and flush regularly after adding a certain amount of content (see the POI documentation for the org.apache.poi.xssf.streaming package). This way you do not have to keep the whole generated file in memory, only small portions of it.
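A minimal sketch of the SXSSFWorkbook approach, assuming a POI version that ships the org.apache.poi.xssf.streaming package. The class name StreamingWriteSketch, the method writeRows, the sheet name "merged" and the window size of 100 rows are illustrative choices, not part of the question's code:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.List;

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

// Hypothetical helper, for illustration only.
class StreamingWriteSketch {

    // Writes every String[] as one row of a single sheet and returns the row count.
    static int writeRows(List<String[]> rows, OutputStream out) throws IOException {
        // Keep at most 100 rows in memory; older rows are flushed to a temp file.
        SXSSFWorkbook workbook = new SXSSFWorkbook(100);
        try {
            Sheet sheet = workbook.createSheet("merged");
            int rowIndex = 0;
            for (String[] values : rows) {
                Row row = sheet.createRow(rowIndex++);
                for (int col = 0; col < values.length; col++) {
                    row.createCell(col).setCellValue(values[col]);
                }
            }
            workbook.write(out);
            return rowIndex;
        } finally {
            workbook.dispose(); // remove the temp files backing the flushed rows
            workbook.close();
        }
    }
}
```

Rows that have left the window can no longer be read or modified, so this only fits a merge that appends rows in order and never goes back to earlier ones.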

Try allocating more memory, e.g.:

java -Xmx8192m

You can also try merging in one xlsx file at a time instead of loading them all at once.

You can also move this line into your for loop:

excellFile1.close();

That way you close it right away.
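One way to process a single file at a time and release it immediately is try-with-resources. The sketch below is an assumption about how that could look, not the original addSheet code: OneFileAtATimeSketch, forEachRow and the rowSink callback are hypothetical names.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.List;
import java.util.function.Consumer;

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

// Hypothetical helper, for illustration only.
class OneFileAtATimeSketch {

    // Opens each source in turn, hands every row of its first sheet to rowSink,
    // and closes stream and workbook before the next file is opened.
    static int forEachRow(List<File> sources, Consumer<Row> rowSink) throws IOException {
        int count = 0;
        for (File source : sources) {
            try (FileInputStream in = new FileInputStream(source);
                 XSSFWorkbook workbook = new XSSFWorkbook(in)) {
                for (Row row : workbook.getSheetAt(0)) {
                    rowSink.accept(row);
                    count++;
                }
            } // both resources are closed here, even if an exception was thrown
        }
        return count;
    }
}
```

The callback should copy whatever it needs from each row while the workbook is still open, since the source workbook becomes eligible for garbage collection once the try block ends.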

This issue occurs for the following reason:

The java.lang.OutOfMemoryError: GC overhead limit exceeded error is the JVM's way of signalling that your application spends too much time doing garbage collection with too little result. By default the JVM is configured to throw this error if it spends more than 98% of the total time doing GC and less than 2% of the heap is recovered after the GC.
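To get a rough feel for how much time a run spends in garbage collection, the standard java.lang.management beans can be queried. This is only a coarse average since JVM start, not the measurement the JVM itself uses for the overhead check; GcTimeSketch and its method names are made up for this example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, for illustration only.
class GcTimeSketch {

    // Sums the approximate collection time (ms) reported by all collectors.
    // Collectors that report -1 (undefined) are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // GC time divided by JVM uptime: an overall average, not a sliding window.
    static double gcFractionOfUptime() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptime <= 0 ? 0.0 : (double) totalGcMillis() / uptime;
    }

    public static void main(String[] args) {
        System.out.printf("GC time so far: %d ms (%.1f%% of uptime)%n",
                totalGcMillis(), 100 * gcFractionOfUptime());
    }
}
```

Printing this before and after workbook1.write(out) would show whether the write phase is where the collector starts to dominate.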

If you just want to ignore this issue, you can set the following VM option:

-XX:-UseGCOverheadLimit

Refer to the link on GC overhead for more information.

You can also use the switches below to assign more heap memory to your application. Run a pilot of your application for some time to identify how much memory it actually needs.

-Xms128m -Xmx512m (these switches set the initial heap size to 128 MB and the maximum heap size to 512 MB)
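During such a pilot run, the effective heap limit and current usage can be logged from inside the application with java.lang.Runtime. A small sketch, with HeapReportSketch and its methods being hypothetical names:

```java
// Hypothetical helper, for illustration only.
class HeapReportSketch {

    static long toMb(long bytes) {
        return bytes / (1024 * 1024);
    }

    // Upper bound the JVM will try to use for the heap (reflects -Xmx when set).
    static long maxHeapMb() {
        return toMb(Runtime.getRuntime().maxMemory());
    }

    // Heap currently occupied: allocated heap minus the free part of it.
    static long usedHeapMb() {
        Runtime runtime = Runtime.getRuntime();
        return toMb(runtime.totalMemory() - runtime.freeMemory());
    }

    public static void main(String[] args) {
        System.out.println("max heap MB:  " + maxHeapMb());
        System.out.println("used heap MB: " + usedHeapMb());
    }
}
```

Logging usedHeapMb() after each "add " + i message in the merge loop would show how quickly the heap fills up per file and give a basis for choosing -Xmx.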

If you can avoid using the convenient but memory-hungry workbook APIs, work instead with the streaming logic of processing data row by row, which is much more memory efficient.

In particular, pay attention to the usage of XSSFReader.SheetIterator for looping over the sheets.

And finally, take a good look at the usage of the XSSFSheetXMLHandler API for processing the rows within a sheet.

See the code in this project: https://github.com/jeevatkm/excelReader/blob/master/src/main/java/com/myjeeva/poi/ExcelReader.java

You define how you want to process each row by creating your own: new SheetContentsHandler....

This works much like SAX parsing, so it will not take a bite out of your RAM.

private void readSheet(StylesTable styles, ReadOnlySharedStringsTable sharedStringsTable, InputStream sheetInputStream)
        throws IOException, ParserConfigurationException, SAXException {
    SAXParserFactory saxFactory = SAXParserFactory.newInstance();
    XMLReader sheetParser = saxFactory.newSAXParser().getXMLReader();
    ContentHandler handler = new XSSFSheetXMLHandler(styles, sharedStringsTable, sheetContentsHandler, true);
    sheetParser.setContentHandler(handler);
    sheetParser.parse(new InputSource(sheetInputStream));
}
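The sheetContentsHandler passed to XSSFSheetXMLHandler is where the per-row logic lives. A minimal sketch of such a handler, assuming POI 4.x or later (where cell takes an XSSFComment argument and headerFooter has a default implementation); CollectingRowHandler is a hypothetical name and is not taken from the linked project:

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler;
import org.apache.poi.xssf.usermodel.XSSFComment;

// Hypothetical handler, for illustration only: collects formatted cell values per row.
class CollectingRowHandler implements SheetContentsHandler {

    private final List<List<String>> rows = new ArrayList<>();
    private List<String> current;

    @Override
    public void startRow(int rowNum) {
        current = new ArrayList<>(); // begin a fresh row buffer
    }

    @Override
    public void cell(String cellReference, String formattedValue, XSSFComment comment) {
        current.add(formattedValue); // only cells present in the XML are reported
    }

    @Override
    public void endRow(int rowNum) {
        rows.add(current); // the row is complete
    }

    List<List<String>> rows() {
        return rows;
    }
}
```

Collecting everything in a list is only for demonstration. To keep memory flat on large files, endRow would instead pass the finished row on (for example to a streaming writer) and drop it.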
