
Java - OutOfMemoryError when writing large Excel file with Apache POI

I am getting a Java OutOfMemoryError. I added the necessary Java arguments, but I still keep getting this error. I have shared the libraries and functions that I use. The function gives this error when converting a large CSV file (about 15 MB) into an xlsx file. It works properly on small files without any errors. How can I fix this error? Thanks.

I added these Java args in IntelliJ IDEA

I got this error

I use these libraries

Main

public class Main {

    public static void main(String[] args) {

        convert_CSV_to_XLSX(S.CSV_PATH,S.XLSX_PATH,"Sheet");

    }

}

Convert CSV to XLSX

public void convert_CSV_to_XLSX(String inputFilePath, String outputFilePath, String sheetName) {
        try {
            ArrayList<ArrayList<Object>> csvObjectsAll = readCSV(inputFilePath);
            writeXLSX_horizontally(outputFilePath, csvObjectsAll, sheetName);
        } catch (Exception e) {
            e.printStackTrace();
        }
}

Read CSV

public ArrayList<ArrayList<Object>> readCSV(String inputFilePath) {
        ArrayList<ArrayList<Object>> gal = new ArrayList<>();
        try {
            String csvStr = new String(Files.readAllBytes(Paths.get(inputFilePath)), StandardCharsets.UTF_8);
            for (String str : csvStr.split("\n")) {
                ArrayList<Object> csvLinesSplit = new ArrayList<>();
                String ss = str.replaceAll("\"", "");
                if (ss.charAt(ss.length() - 1) == ',') {
                    ss += "$";
                }
                for (String s : ss.split(",")) {
                    if (s.equals("") || s.equals("$")) {
                        csvLinesSplit.add("");
                    } else {
                        csvLinesSplit.add(s);
                    }
                }
                gal.add(csvLinesSplit);
            }
        } catch (Exception e) {

        }
        return gal;
}

Write XLSX

public void writeXLSX_horizontally(String outputFileName, ArrayList<ArrayList<Object>> gdl, String sheetName) {

        XSSFWorkbook workbook = new XSSFWorkbook();
        XSSFSheet sheet = workbook.createSheet(sheetName);

        int rowNum = 0;
        for (ArrayList<Object> objectArrList : gdl) {
            Row row = sheet.createRow(rowNum++);
            int cellNum = 0;
            for (Object obj : objectArrList) {
                Cell cell = row.createCell(cellNum++);
                boolean is_double = false, is_integer = false;
                try {
                    cell.setCellValue(Double.parseDouble(obj.toString()));
                    is_double = true;
                } catch (Exception e) {
                }
                if (!is_double) {
                    try {
                        cell.setCellValue(Integer.parseInt(obj.toString()));
                        is_integer = true;
                    } catch (Exception e) {

                    }
                }
                if (!is_double && !is_integer) {
                    if (obj == null) {
                        cell.setCellValue(new String());
                    } else {
                        cell.setCellValue(obj.toString());
                    }
                }
            }
        }
        try {
            FileOutputStream file = new FileOutputStream(outputFileName);
            workbook.write(file);
            file.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
}

This line:

 String csvStr = new String(Files.readAllBytes(Paths.get(inputFilePath)), StandardCharsets.UTF_8);

Issue:

You are loading the whole file into memory by using Files.readAllBytes, and the memory allocated to the JVM process running this program is not enough.

Possible Solution:

You may want to read the file using streams/buffers such as BufferedReader. Or you can look up other Readers that let you read the file in chunks, so that the whole file is not held in memory at once.
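As a minimal sketch of that suggestion, the class below reads a CSV one line at a time, so only the current line is held in memory. The class name LineByLineCsv and the method countRows are made up for illustration and are not part of the original program.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original program.
class LineByLineCsv {

    // Counts the non-empty lines of the source while holding only one line in memory at a time.
    static long countRows(Reader source) {
        long rows = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    rows++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java LineByLineCsv <file.csv>");
            return;
        }
        // The file is decoded as UTF-8, like the readAllBytes version in the question.
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            System.out.println(countRows(in) + " rows");
        }
    }
}
```

The same loop shape applies when each line is parsed instead of counted.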

Further Modifications:

You will also have to modify the writing side of your program: after you read a chunk of data, you process it and write it to the output, and when the next chunk is ready, you append it instead of building everything in memory first.
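The read, process, write cycle described above can be sketched with a callback that receives each row as soon as it is parsed. RowPipeline and forEachRow are hypothetical names, the comma split is deliberately naive (quoted commas are not handled), and in the POI case the callback would create one spreadsheet row instead of printing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of the original program.
class RowPipeline {

    // Reads one CSV line at a time and hands the parsed fields to rowSink immediately,
    // so no list of all rows is ever built. Returns the number of rows processed.
    static int forEachRow(Reader source, Consumer<List<String>> rowSink) {
        int count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Limit -1 keeps trailing empty fields, so "2," yields two fields.
                rowSink.accept(Arrays.asList(line.split(",", -1)));
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        int rows = forEachRow(new StringReader("id,name\n1,Ada\n2,\n"),
                row -> System.out.println(row.size() + " fields: " + row));
        System.out.println(rows + " rows processed");
    }
}
```

With this shape the reader and the writer never need the full data set at the same time.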

As discussed in the comments, the problem was due to an incorrect IntelliJ run configuration.

VM arguments need to be passed in a separate field in IntelliJ ("VM options"), not as "Program arguments".
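A quick way to check which field the options ended up in is to print the maximum heap from inside the program. The sketch below is only an illustration: HeapCheck and its helper methods are hypothetical names, while Runtime.getRuntime().maxMemory() is the standard call that reports the heap limit.

```java
// Hypothetical diagnostic class, not part of the original program.
class HeapCheck {

    // Converts a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // True if the argument looks like a JVM option that was meant for the "VM options" field.
    static boolean looksLikeVmOption(String arg) {
        return arg.startsWith("-X") || arg.startsWith("-D");
    }

    public static void main(String[] args) {
        // maxMemory() reflects -Xmx only when it was passed as a VM option.
        System.out.println("Max heap: " + toMiB(Runtime.getRuntime().maxMemory()) + " MiB");
        for (String arg : args) {
            // Anything that shows up in args was passed as a program argument.
            if (looksLikeVmOption(arg)) {
                System.out.println(arg + " was passed as a program argument and does not change the heap");
            }
        }
    }
}
```

If the printed heap size does not match the -Xmx value, the option is not reaching the JVM.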

Still, the program can be improved:

  • use SXSSFWorkbook, the streaming version of XSSFWorkbook implementing the "BigGridDemo" strategy. It allows writing very large files without running out of memory, because only a configurable portion of the rows is kept in memory at any one time.
  • use "" instead of new String()
  • not memory related: get generics right (you have strings in parsed CSV, not arbitrary objects)
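The last two points can be sketched without POI: keep parsed fields as strings and decide per field whether it is a number. CellValues and parseNumber are hypothetical names. Note that a plain decimal integer such as 42 already parses as a double, so a separate Integer.parseInt fallback adds nothing for such input.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;

// Hypothetical helper, not part of the original program.
class CellValues {

    // Returns the numeric value of a CSV field, or empty if the field is blank, null, or text.
    static OptionalDouble parseNumber(String field) {
        if (field == null || field.trim().isEmpty()) {
            return OptionalDouble.empty();
        }
        try {
            return OptionalDouble.of(Double.parseDouble(field.trim()));
        } catch (NumberFormatException e) {
            return OptionalDouble.empty();
        }
    }

    public static void main(String[] args) {
        // Rows are lists of strings, not lists of arbitrary objects.
        List<List<String>> rows = Arrays.asList(Arrays.asList("42", "3.5", "abc", ""));
        for (List<String> row : rows) {
            for (String field : row) {
                OptionalDouble number = parseNumber(field);
                System.out.println(number.isPresent()
                        ? "number " + number.getAsDouble()
                        : "text \"" + field + "\"");
            }
        }
    }
}
```

A writer would then call the numeric or the string overload of setCellValue depending on whether the result is present.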

Note that streaming both input and output is the best option. Having said that, a 15 MB input is tiny by today's standards, so I believe raising the heap size a bit is not a bad short-term solution.

I removed the shared memory arguments from the Java virtual machine options: -Xms1024M -Xmx12288M

Thanks to @Faraz and @Lesiak. The permanent solution for writing a large xlsx file is here:

Read CSV

public ArrayList<ArrayList<Object>> readCSV(String inputFilePath) {
        ArrayList<ArrayList<Object>> gal = new ArrayList<>();
        // try-with-resources closes the reader; UTF-8 matches the readAllBytes version in the question
        try (BufferedReader csvReader = Files.newBufferedReader(Paths.get(inputFilePath), StandardCharsets.UTF_8)) {
            String row;
            int columnCount = 0;
            boolean isHeader = true;
            while ((row = csvReader.readLine()) != null) {
                ArrayList<Object> rowCells = new ArrayList<>();
                String[] cells = row.split(",");
                if (isHeader) {
                    // the header row defines the column count, so it must not end with an empty column
                    if (row.endsWith(","))
                        throw new Exception("CSV Format Error");
                    columnCount = cells.length;
                    isHeader = false;
                }
                for (String cell : cells) {
                    // empty fields are stored as null
                    rowCells.add(cell.equals("") ? null : cell);
                }
                // split(",") drops trailing empty fields, so pad short rows up to the header width
                for (int i = cells.length; i < columnCount; i++) {
                    rowCells.add(null);
                }
                gal.add(rowCells);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return gal;
}

Write XLSX

public void writeXLSX_horizontally(String outputFileName, ArrayList<ArrayList<Object>> gdl, String sheetName) {
        // SXSSFWorkbook keeps only a window of rows in memory and flushes the rest to temporary files
        SXSSFWorkbook workbook = new SXSSFWorkbook();
        SXSSFSheet sheet = workbook.createSheet(sheetName);
        int rowNum = 0;
        for (ArrayList<Object> objectArrList : gdl) {
            Row row = sheet.createRow(rowNum++);
            int cellNum = 0;
            for (Object obj : objectArrList) {
                Cell cell = row.createCell(cellNum++);
                if (obj == null) {
                    // empty CSV field
                    cell.setCellValue("");
                    continue;
                }
                try {
                    // Double.parseDouble also accepts integer literals, so no separate int branch is needed
                    cell.setCellValue(Double.parseDouble(obj.toString()));
                } catch (NumberFormatException e) {
                    // not a number: store the text as-is
                    cell.setCellValue(obj.toString());
                }
            }
        }
        try (FileOutputStream file = new FileOutputStream(outputFileName)) {
            workbook.write(file);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // delete the temporary files that back the flushed rows
            workbook.dispose();
        }
}

A solution for reading a large xlsx file is here: How to read XLSX file of size >40MB

Another important library for reading large xlsx files: https://github.com/monitorjbl/excel-streaming-reader

Constraint: the row index in an xlsx sheet must be between 0 and 1048575.
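If a CSV has more rows than one sheet can hold, one possible approach (an assumption, not something from the answers above) is to continue on a further sheet. The arithmetic is sketched below; SheetLimits, sheetIndex and rowIndex are hypothetical names.

```java
// Hypothetical helper, not part of the original program.
class SheetLimits {

    // Maximum number of rows in one xlsx sheet (row indexes 0..1048575).
    static final int MAX_ROWS = 1_048_576;

    // Zero-based index of the sheet that should receive the given zero-based CSV row.
    static int sheetIndex(long csvRow) {
        return (int) (csvRow / MAX_ROWS);
    }

    // Zero-based row index inside that sheet.
    static int rowIndex(long csvRow) {
        return (int) (csvRow % MAX_ROWS);
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1_048_575L, 1_048_576L};
        for (long csvRow : samples) {
            System.out.println("CSV row " + csvRow + " -> sheet " + sheetIndex(csvRow)
                    + ", row " + rowIndex(csvRow));
        }
    }
}
```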
