
Get Excel sheet names for large Excel files using Apache POI

I have the following code that I use to get the sheet names of an Excel file (.xlsx):

    XSSFWorkbook workBookXlsx = new XSSFWorkbook(new FileInputStream(pathToFile));
    ArrayList<String> sheetNames = new ArrayList<>();

    int numberOfSheets = workBookXlsx.getNumberOfSheets();
    for (int i = 0; i < numberOfSheets; i++) {
        sheetNames.add(workBookXlsx.getSheetAt(i).getSheetName());
    }

    workBookXlsx = null;

The issue I have with the above code is that it takes a lot of memory (~700MB) and a long time (5-6s) to create the XSSFWorkbook for a file of size 9MB. Even setting workBookXlsx to null doesn't release the memory taken by javaw (I know the GC may or may not be called, and the JVM won't release memory just because I have set a variable to null).

I did go through the documentation of Workbook and XSSFWorkbook, and from what I understood, there is no method that will help me get the sheet names with a low memory footprint.

The one solution I have found is to manually unzip the .xlsx file and read the contents of xl/workbook.xml to get the sheet names and the r:id.
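The manual approach described above might be sketched roughly as follows, using only the JDK (java.util.zip plus StAX) so that nothing but the small workbook part is parsed. The class name `WorkbookXmlSheetNames` and the method `readSheets` are hypothetical names chosen for illustration. The sketch assumes the workbook part sits at the conventional location `xl/workbook.xml` and that the file uses the transitional OOXML relationships namespace; a file saved in strict mode, or one with an unusual package layout, would need adjustments.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: reads sheet names and r:id values straight from the zip.
public class WorkbookXmlSheetNames {

    // Assumption: transitional OOXML namespace for the r:id attribute.
    private static final String REL_NS =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    /** Returns one {name, r:id} pair per <sheet> element, in document order. */
    public static List<String[]> readSheets(String pathToFile)
            throws IOException, XMLStreamException {
        List<String[]> sheets = new ArrayList<>();
        try (ZipFile zip = new ZipFile(pathToFile)) {
            // Assumption: the workbook part is at the conventional path.
            ZipEntry entry = zip.getEntry("xl/workbook.xml");
            if (entry == null) {
                throw new IOException("xl/workbook.xml not found in " + pathToFile);
            }
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            try (InputStream in = zip.getInputStream(entry)) {
                XMLStreamReader xml = factory.createXMLStreamReader(in);
                // Stream through the part; only <sheet> start tags are of interest.
                while (xml.hasNext()) {
                    if (xml.next() == XMLStreamConstants.START_ELEMENT
                            && "sheet".equals(xml.getLocalName())) {
                        String name = xml.getAttributeValue(null, "name");
                        String relId = xml.getAttributeValue(REL_NS, "id");
                        sheets.add(new String[] {name, relId});
                    }
                }
                xml.close();
            }
        }
        return sheets;
    }

    public static void main(String[] args) throws Exception {
        for (String[] sheet : readSheets(args[0])) {
            System.out.println(sheet[0] + " (" + sheet[1] + ")");
        }
    }
}
```

Because only one small zip entry is decompressed and streamed, memory use should stay far below what building an XSSFWorkbook requires, at the cost of depending on the file layout rather than on a library API.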

Is there an API for getting the sheet names in an .xlsx file without a large memory footprint?

To show what @Gagravarr probably meant with his comment:

The XSSFReader contains a method XSSFReader.getSheetsData which "Returns an Iterator which will let you get at all the different Sheets in turn. Each sheet's InputStream is only opened when fetched from the Iterator. It's up to you to close the InputStreams when done with each one.". But, as is often the case, this is not the whole truth. In fact it returns an XSSFReader.SheetIterator, which has a method XSSFReader.SheetIterator.getSheetName to get the sheet names.

Example:

    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.util.Iterator;

    import org.apache.poi.openxml4j.opc.OPCPackage;
    import org.apache.poi.xssf.eventusermodel.XSSFReader;

    public class ExcelXSSFGetSheetNamesXSSFReader {

        public static void main(String[] args) throws Exception {

            // Open the package only; no XSSFWorkbook object model is built.
            OPCPackage pkg = OPCPackage.open(new FileInputStream("Example.xlsx"));
            XSSFReader r = new XSSFReader(pkg);
            Iterator<InputStream> sheets = r.getSheetsData();

            // The declared type is Iterator<InputStream>, but the object is an
            // XSSFReader.SheetIterator, which also exposes getSheetName().
            if (sheets instanceof XSSFReader.SheetIterator) {
                XSSFReader.SheetIterator sheetiterator = (XSSFReader.SheetIterator) sheets;

                while (sheetiterator.hasNext()) {
                    // next() advances to the sheet; its stream is closed unread.
                    InputStream dummy = sheetiterator.next();

                    System.out.println(sheetiterator.getSheetName());

                    dummy.close();
                }
            }

            pkg.close();
        }
    }

Conclusion: Currently you cannot work with apache poi by trusting the API documentation alone. Instead, you must always have a look at the source code.
