How to read a large Excel file (xlsx) using Java

This code can read a small Excel file, but it fails on large Excel files with the OutOfMemoryError shown after the code. How should I modify it?

import java.io.File;
import java.io.FileInputStream;
import java.util.Iterator;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

/**
 *
 * @author Administrator
 */
public class ReadExcelNdArray {


    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();

        File myFile = new File("D://Raghulpr/Transaction Data.xlsx");

        // try-with-resources closes the input stream even if reading fails
        try (FileInputStream fis = new FileInputStream(myFile)) {

            // Loads the whole XLSX workbook into memory;
            // this is the call that fails with OutOfMemoryError on large files
            XSSFWorkbook myWorkBook = new XSSFWorkbook(fis);

            // Return first sheet from the XLSX workbook
            XSSFSheet mySheet = myWorkBook.getSheetAt(0);

            // Get iterator to all the rows in current sheet
            Iterator<Row> rowIterator = mySheet.iterator();

            // Traversing over each row of XLSX file
            while (rowIterator.hasNext()) {
                Row row = rowIterator.next();

                // For each row, iterate through each column
                Iterator<Cell> cellIterator = row.cellIterator();
                while (cellIterator.hasNext()) {
                    Cell cell = cellIterator.next();

                    switch (cell.getCellType()) {
                    case Cell.CELL_TYPE_STRING:
                        System.out.print(cell.getStringCellValue() + "\t");
                        break;
                    case Cell.CELL_TYPE_NUMERIC:
                        System.out.print(cell.getNumericCellValue() + "\t");
                        break;
                    case Cell.CELL_TYPE_BOOLEAN:
                        System.out.print(cell.getBooleanCellValue() + "\t");
                        break;
                    default:
                        // Other cell types (blank, formula, error) are skipped
                        break;
                    }
                }
                System.out.println();
            }
        }

        // Report the elapsed time only after the whole file has been read
        System.out.println("Time taken: " + (System.currentTimeMillis() - start) + " ms");
    }
}

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

at java.io.ByteArrayOutputStream.<init>(ByteArrayOutputStream.java:77)
at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource$FakeZipEntry.<init>(ZipInputStreamZipEntrySource.java:121)
at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:55)
at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:88)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:272)
at org.apache.poi.util.PackageHelper.open(PackageHelper.java:37)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:254)
at readexcelndarray.ReadExcelNdArray.main(ReadExcelNdArray.java:36)

First, close every input/output stream object in your code, such as the FileInputStream. Second, you can increase the JVM heap size, as described in this link: Increase heap size in Java
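As a rough sketch of the second suggestion: the maximum heap is set with the JVM's -Xmx option when the program is launched, for example "java -Xmx2g readexcelndarray.ReadExcelNdArray" (the 2g value is only an illustration, not a figure from this answer). The small class below, whose name HeapInfo is made up for this example, prints the limit the running JVM actually received, which can help confirm that the option took effect.

```java
// Hypothetical helper: reports the maximum heap available to this JVM.
class HeapInfo {

    /** Returns the JVM's maximum heap size in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Raising the heap only moves the limit; a sufficiently large workbook can still exhaust it.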

I don't know if you still need an answer to this, but I was searching for the same thing and struggling to read a large file. After a lot of time searching the internet I found one solution: the Excel Streaming Reader library.

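import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.poi.ss.usermodel.Workbook;
import com.monitorjbl.xlsx.StreamingReader;

InputStream is = new FileInputStream(new File("G:\\Book1.xlsx"));
Workbook workbook = StreamingReader.builder()
        .rowCacheSize(100)    // number of rows to keep in memory
        .bufferSize(4096)     // buffer size (in bytes) used when reading the InputStream
        .open(is);            // InputStream or File for the XLSX file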

Now you can use workbook to process your file further.

With this I was able to process an xlsx file with more than 400,000 (4 lakh) records.

You need to increase the heap size to read large files. I recommend using a 64-bit machine.

There is also the JXL API for reading and writing Excel files. Its limitation is the row count: row indices start at 0 and go no higher than 65535. Apart from that, it is quite flexible.

Since your file has more than 65535 rows, I would suggest Apache POI instead. In practice, this API has no such row limit.

I had the same problem. If you switch to the much lower-level SAX parsing instead, you will save a lot of memory. http://poi.apache.org/spreadsheet/how-to.html#xssf_sax_api

I think I brought memory usage down from about 4.5 GB(!) for an 11 MB file with a lot of formulas to something much more manageable. I don't remember the exact figure, but it dropped by at least a factor of 10 and was low enough that it no longer mattered.

It is harder to implement, but worth the time if you need to reduce the memory footprint.
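To illustrate why event-based parsing keeps memory flat, here is a minimal sketch that uses only the JDK, not POI's XSSF SAX API from the link above. It relies on the fact that an xlsx file is a zip archive containing XML parts, and streams one sheet's XML through a SAX handler that counts rows and cells without building an object model. The class name SheetRowCounter, the entry name xl/worksheets/sheet1.xml, and the tiny in-memory sample file are assumptions made for this example; real workbooks may name their sheet parts differently or use prefixed element names, and cell text usually lives in a separate shared-strings part that this sketch ignores.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: counts rows and cells in one sheet via SAX.
class SheetRowCounter {

    /** SAX handler that counts <row> and <c> elements as they stream past. */
    static class CountingHandler extends DefaultHandler {
        long rows;
        long cells;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            // Assumes unprefixed element names, as in typical sheet XML
            if ("row".equals(qName)) {
                rows++;
            } else if ("c".equals(qName)) {
                cells++;
            }
        }
    }

    /** Finds the named sheet entry in the zip stream and parses it event by event. */
    static CountingHandler count(InputStream xlsx, String sheetEntry) throws Exception {
        CountingHandler handler = new CountingHandler();
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(sheetEntry)) {
                    // Only the current parse event is held in memory, not the whole sheet
                    SAXParserFactory.newInstance().newSAXParser().parse(zip, handler);
                    break;
                }
            }
        }
        return handler;
    }

    /** Builds a tiny zip with one sheet XML part, standing in for a real xlsx file. */
    static byte[] sampleXlsx() throws IOException {
        String xml = "<worksheet><sheetData>"
                + "<row r=\"1\"><c r=\"A1\"><v>1</v></c><c r=\"B1\"><v>2</v></c></row>"
                + "<row r=\"2\"><c r=\"A2\"><v>3</v></c></row>"
                + "</sheetData></worksheet>";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("xl/worksheets/sheet1.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        CountingHandler result =
                count(new ByteArrayInputStream(sampleXlsx()), "xl/worksheets/sheet1.xml");
        System.out.println(result.rows + " rows, " + result.cells + " cells"); // prints "2 rows, 3 cells"
    }
}
```

For a real file, a FileInputStream would replace the in-memory sample. POI's own SAX-based API additionally resolves shared strings and styles, which is the part that makes a hand-rolled version harder to get right.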
