
Excel Sheet POI Validation: Out Of Memory Error

I am trying to validate an Excel file using Java before loading it into a database.

Here is the code snippet that causes the error:

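    // Imports needed by this snippet
    import java.io.FileInputStream;
    import java.io.IOException;
    import org.apache.poi.xssf.usermodel.XSSFCell;
    import org.apache.poi.xssf.usermodel.XSSFRow;
    import org.apache.poi.xssf.usermodel.XSSFSheet;
    import org.apache.poi.xssf.usermodel.XSSFWorkbook;

    try (FileInputStream fis = new FileInputStream(file)) {
        // The XSSFWorkbook constructor parses the entire sheet into memory;
        // this is the call that throws the OutOfMemoryError below
        XSSFWorkbook wb = new XSSFWorkbook(fis);
        XSSFSheet sh = wb.getSheet("Sheet1");
        XSSFRow row = sh.getRow(1); // second row (indices are 0-based)
        for (int i = 0; i < 44; i++) {
            XSSFCell a1 = row.getCell(i);
            printXSSFCellType(a1); // my own helper, not shown
        }
    } catch (IOException e) {
        // also covers FileNotFoundException, which is a subclass of IOException
        e.printStackTrace();
    }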

Here is the error I get:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.ArrayList.<init>(Unknown Source)
    at java.util.ArrayList.<init>(Unknown Source)
    at org.apache.xmlbeans.impl.values.NamespaceContext$NamespaceContextStack.<init>(NamespaceContext.java:78)
    at org.apache.xmlbeans.impl.values.NamespaceContext$NamespaceContextStack.<init>(NamespaceContext.java:75)
    at org.apache.xmlbeans.impl.values.NamespaceContext.getNamespaceContextStack(NamespaceContext.java:98)
    at org.apache.xmlbeans.impl.values.NamespaceContext.push(NamespaceContext.java:106)
    at org.apache.xmlbeans.impl.values.XmlObjectBase.check_dated(XmlObjectBase.java:1273)
    at org.apache.xmlbeans.impl.values.XmlObjectBase.stringValue(XmlObjectBase.java:1484)
    at org.apache.xmlbeans.impl.values.XmlObjectBase.getStringValue(XmlObjectBase.java:1492)
    at org.openxmlformats.schemas.spreadsheetml.x2006.main.impl.CTCellImpl.getR(Unknown Source)
    at org.apache.poi.xssf.usermodel.XSSFCell.<init>(XSSFCell.java:105)
    at org.apache.poi.xssf.usermodel.XSSFRow.<init>(XSSFRow.java:70)
    at org.apache.poi.xssf.usermodel.XSSFSheet.initRows(XSSFSheet.java:179)
    at org.apache.poi.xssf.usermodel.XSSFSheet.read(XSSFSheet.java:143)
    at org.apache.poi.xssf.usermodel.XSSFSheet.onDocumentRead(XSSFSheet.java:130)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.onDocumentRead(XSSFWorkbook.java:286)
    at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:159)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:207)
    at com.xls.validate.ExcelValidator.main(ExcelValidator.java:79)

This works fine when the .xlsx file is smaller than 1 MB.

I understand this is because my .xlsx file is around 5-10 MB and POI tries to load the entire sheet into JVM memory at once.

What is a possible workaround?

Please help.

Thanks in advance!

There are two options available to you.

Option #1: increase the size of your JVM heap, so that Java has more memory available to it. Processing Excel files in POI using the UserModel code is DOM based, so the whole file (including its parsed form) needs to be buffered into memory. Try a question like this one for advice on how to increase the heap.
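
To confirm how much heap the JVM actually has, a quick check like the following can help. It is only a sketch: the class name `HeapCheck` is made up for illustration, and the `-Xmx2g` value is an arbitrary example, not a recommendation.

```java
public class HeapCheck {
    /** Maximum heap the JVM will try to use, in megabytes (the limit that -Xmx sets). */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        // Run as e.g. `java -Xmx2g HeapCheck`; the reported value should rise to roughly 2048 MB.
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```

The validator itself takes the same flag, e.g. `java -Xmx2g com.xls.validate.ExcelValidator` (the class name from the stack trace).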

Option #2, which is more work: switch to event-based (SAX) processing. This only processes part of the file at a time, so it needs much less memory. However, it requires more work from you, which is why you might be better off throwing a few more GB of memory at the problem; memory is cheap while programmers aren't! The SpreadSheet howto page has instructions on how to do SAX parsing of .xlsx files, and POI provides various example files you can look at for advice.
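
A minimal sketch of the SAX approach for .xlsx, built on the event API classes POI ships (`XSSFReader`, `XSSFSheetXMLHandler`). Assumptions: POI 4.x or 5.x on the classpath; the per-cell "validation" (printing the value) is a placeholder; the class names `StreamingXlsxReader` and `CellPrinter` are hypothetical.

```java
import java.io.File;
import java.io.InputStream;
import java.util.Iterator;
import javax.xml.parsers.SAXParserFactory;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler;
import org.apache.poi.xssf.model.StylesTable;
import org.apache.poi.xssf.usermodel.XSSFComment;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class StreamingXlsxReader {

    /** Receives one callback per cell; the sheet is never built up in memory. */
    static class CellPrinter implements SheetContentsHandler {
        int cellCount = 0;

        @Override public void startRow(int rowNum) { }
        @Override public void endRow(int rowNum) { }

        @Override
        public void cell(String cellReference, String formattedValue, XSSFComment comment) {
            cellCount++;
            // Placeholder validation (hypothetical): replace with real per-cell checks
            System.out.println(cellReference + " = " + formattedValue);
        }

        @Override public void headerFooter(String text, boolean isHeader, String tagName) { }
    }

    public static void main(String[] args) throws Exception {
        File file = new File(args[0]);
        // READ access: the package is opened straight from the File and never written back
        try (OPCPackage pkg = OPCPackage.open(file, PackageAccess.READ)) {
            ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(pkg);
            XSSFReader xssfReader = new XSSFReader(pkg);
            StylesTable styles = xssfReader.getStylesTable();

            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // the POI handler matches elements by namespace

            Iterator<InputStream> sheets = xssfReader.getSheetsData();
            while (sheets.hasNext()) {
                try (InputStream sheet = sheets.next()) {
                    XMLReader parser = factory.newSAXParser().getXMLReader();
                    parser.setContentHandler(new XSSFSheetXMLHandler(
                            styles, strings, new CellPrinter(), new DataFormatter(), false));
                    parser.parse(new InputSource(sheet));
                }
            }
        }
    }
}
```

POI's own XLSX2CSV example follows the same structure and is a good next step for handling missing cells and multiple sheets.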


Also, another thing: you seem to be loading a File via a stream, which is bad as it means even more needs to be buffered into memory. See the POI documentation for more on this, including instructions on how to work with the File directly.
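
A sketch of opening the workbook from the `File` rather than a `FileInputStream`, via `OPCPackage.open`. It still uses the DOM-based user model, so it trims overhead but will not by itself make a 10 MB sheet fit in a small heap. The class and method names are hypothetical.

```java
import java.io.File;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class OpenFromFile {
    /** Opens the .xlsx directly from disk (no input stream) and returns one sheet's row count. */
    static int countRows(File file, String sheetName) throws Exception {
        // READ access: the package is never written back to the file on close
        try (OPCPackage pkg = OPCPackage.open(file, PackageAccess.READ)) {
            XSSFWorkbook wb = new XSSFWorkbook(pkg);
            XSSFSheet sheet = wb.getSheet(sheetName);
            return sheet == null ? 0 : sheet.getPhysicalNumberOfRows();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countRows(new File(args[0]), "Sheet1"));
    }
}
```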

Use the Event API (HSSF only).

The event API is newer than the User API. It is intended for intermediate developers who are willing to learn a little bit of the low-level API structures. It's relatively simple to use, but requires a basic understanding of the parts of an Excel file (or willingness to learn). The advantage provided is that you can read an XLS with a relatively small memory footprint.
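
A minimal sketch of the HSSF event API, in the style of the POI documentation's event example: a listener sees each record as it streams past and counts numeric cells. Assumptions: POI 4.x or later (for the `POIFSFileSystem(File)` constructor); the class name `NumberCellCounter` is hypothetical. As stated above, this is HSSF only, so it reads .xls files, not .xlsx.

```java
import java.io.File;
import java.io.InputStream;
import org.apache.poi.hssf.eventusermodel.HSSFEventFactory;
import org.apache.poi.hssf.eventusermodel.HSSFListener;
import org.apache.poi.hssf.eventusermodel.HSSFRequest;
import org.apache.poi.hssf.record.NumberRecord;
import org.apache.poi.hssf.record.Record;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public class NumberCellCounter implements HSSFListener {
    int numberCells = 0;

    @Override
    public void processRecord(Record record) {
        // Records arrive one at a time; only what we store here stays in memory
        if (record.getSid() == NumberRecord.sid) {
            NumberRecord num = (NumberRecord) record;
            numberCells++;
            System.out.println("row " + num.getRow() + ", col " + num.getColumn() + ": " + num.getValue());
        }
    }

    static int countNumberCells(File xls) throws Exception {
        NumberCellCounter counter = new NumberCellCounter();
        try (POIFSFileSystem fs = new POIFSFileSystem(xls);
             InputStream workbookStream = fs.createDocumentInputStream("Workbook")) {
            HSSFRequest request = new HSSFRequest();
            request.addListenerForAllRecords(counter);
            new HSSFEventFactory().processEvents(request, workbookStream);
        }
        return counter.numberCells;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("numeric cells: " + countNumberCells(new File(args[0])));
    }
}
```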

You can use the SXSSF workbook from POI for memory-related issues. Refer here. (Note that SXSSF streams rows when writing an .xlsx file; it does not reduce memory use when reading one.)

I faced a similar issue while reading and merging multiple CSVs into a single XLSX file. I had three CSV files with 30k rows each, 90k rows in total.

It was resolved by using SXSSF, as below:

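    import java.io.FileOutputStream;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.Map;
    import org.apache.poi.ss.usermodel.Cell;
    import org.apache.poi.ss.usermodel.Row;
    import org.apache.poi.xssf.streaming.SXSSFSheet;
    import org.apache.poi.xssf.streaming.SXSSFWorkbook;
    import com.opencsv.CSVReader; // assumed: opencsv's CSVReader
    // 'logger' is assumed to be an SLF4J-style Logger field of the enclosing class

    public static void mergeCSVsToXLSX(Long jobExecutionId, Map<String, String> csvSheetNameAndFile, String xlsxFile) {
        // Keep 100 rows per sheet in memory; older rows are flushed to temporary files on disk
        try (SXSSFWorkbook wb = new SXSSFWorkbook(100)) {
            wb.setCompressTempFiles(true); // gzip the temp files (set once, before creating sheets)
            csvSheetNameAndFile.forEach((sheetName, csv) -> {
                try (CSVReader reader = new CSVReader(new FileReader(csv))) {
                    SXSSFSheet sheet = wb.createSheet(sheetName);
                    sheet.setRandomAccessWindowSize(100);

                    String[] nextLine;
                    int r = 0;
                    while ((nextLine = reader.readNext()) != null) {
                        Row row = sheet.createRow(r++); // no (short) cast: it would overflow past 32,767 rows
                        for (int i = 0; i < nextLine.length; i++) {
                            Cell cell = row.createCell(i);
                            cell.setCellValue(nextLine[i]);
                        }
                    }
                } catch (IOException ioException) {
                    logger.error("Error in reading CSV file {} for jobId {} with exception {}", csv, jobExecutionId,
                            ioException.getMessage());
                }
            });

            try (FileOutputStream out = new FileOutputStream(xlsxFile)) {
                wb.write(out);
            }
            wb.dispose(); // delete the temporary files
        } catch (IOException ioException) {
            logger.error("Error in creating workbook for jobId {} with exception {}", jobExecutionId,
                    ioException.getMessage());
        }
    }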

Well, here's a link with detailed information about your error and how to fix it: http://javarevisited.blogspot.com/2011/09/javalangoutofmemoryerror-permgen-space.html?m=1

Well, let me try to explain your error:

java.lang.OutOfMemoryError has two common variants: one for the Java heap space and one for PermGen space. Your stack trace says "Java heap space", so yours is the first one.

Your error could be caused by a memory leak, too little system RAM, or too little RAM allocated to the Java Virtual Machine.

The difference between the two variants is what each area holds. PermGen space stores class and method metadata (and, before Java 7, the pool of interned strings), while ordinary objects, such as the ones POI creates when it parses a workbook, live on the Java heap. So if you load a lot of classes or interned strings without enough memory allocated, you get the PermGen variant; if you create more objects than the heap can hold, you get the heap variant. The default PermGen size on older JVMs was typically 64 MB, which is quite small. (PermGen was removed in Java 8 and replaced by Metaspace.) The linked article explains much more about this error and provides detailed information about how to fix it.
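
To see which memory areas your own JVM has, and so which variant of the error can occur, the standard `java.lang.management` API lists them. A small sketch; the class name `MemoryPools` is hypothetical, and on Java 8+ you will see a Metaspace pool rather than PermGen.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class MemoryPools {
    /** Names of the pools that make up the Java heap (where "Java heap space" errors come from). */
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage == null) continue;
            // getMax() is -1 when the pool has no defined limit; keep -1 in that case
            long maxMb = usage.getMax() < 0 ? -1 : usage.getMax() / (1024 * 1024);
            System.out.printf("%-35s %-16s max=%d MB%n", pool.getName(), pool.getType(), maxMb);
        }
    }
}
```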

Hope this helps!

I faced the same OOM issue while parsing an .xlsx file. After two days of struggle, I finally found the code below, which worked perfectly.

This code is based on sjxlsx. It reads the .xlsx file and stores its contents in an HSSF sheet.

    import java.io.File;
    import org.apache.poi.hssf.usermodel.HSSFWorkbook;
    // plus the sjxlsx imports for SimpleXLSXWorkbook, Sheet, SheetRowReader and Cell

    // read the xlsx file
    // (Sheet, Cell and SheetRowReader are sjxlsx types; the POI types are fully qualified)
    SimpleXLSXWorkbook workbook = new SimpleXLSXWorkbook(new File("C:/test.xlsx"));

    HSSFWorkbook hsfWorkbook = new HSSFWorkbook();
    org.apache.poi.ss.usermodel.Sheet hsfSheet = hsfWorkbook.createSheet();

    Sheet sheetToRead = workbook.getSheet(0, false);
    SheetRowReader reader = sheetToRead.newReader();
    Cell[] row;
    int rowPos = 0;
    // copy the sheet row by row into the HSSF sheet
    while ((row = reader.readRow()) != null) {
        org.apache.poi.ss.usermodel.Row hfsRow = hsfSheet.createRow(rowPos);
        int cellPos = 0;
        for (Cell cell : row) {
            if (cell != null) { // skip null cells but still advance the column position
                org.apache.poi.ss.usermodel.Cell hfsCell = hfsRow.createCell(cellPos);
                hfsCell.setCellType(org.apache.poi.ss.usermodel.Cell.CELL_TYPE_STRING);
                hfsCell.setCellValue(cell.getValue());
            }
            cellPos++;
        }
        rowPos++;
    }
    return hsfSheet;
