
Java - read, process, write with large Excel

I have a large spreadsheet. It has 10 sheets, each with 1M rows. With Java, I need to run an algorithm on each row, return a value for each row, and insert it back into the Excel file.

My idea was to load the file into RAM, do the calculations for each row, store the results in a list, and insert them back into Excel in order, but I didn't anticipate the issues caused by the data size.

I tried XSSF, and it wasn't able to load such a large file. After waiting for a few hours, it gave me an OOM error.

I tried increasing the heap in Run -> Run Configurations -> Arguments, and in Control Panel -> Java. It didn't work.

I tried using the following StreamingReader, and it didn't work either.

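import java.io.FileInputStream;
import org.apache.poi.ss.usermodel.Workbook;
import com.monitorjbl.xlsx.StreamingReader; // assumed: the excel-streaming-reader library

FileInputStream in = new FileInputStream("D:\\work\\calculatepi\\sampleresult.xlsx");
Workbook workbook = StreamingReader.builder()
    .rowCacheSize(100)   // number of rows kept in memory at a time
    .bufferSize(4096)    // buffer size in bytes used when reading the InputStream
    .open(in);           // open the XLSX stream for row-by-row reading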

I'm really out of ideas and not sure what to do. Is there no easy way to do this?

It is not only about the configuration of that library. It is also about the memory that you give to your JVM! Try increasing the heap space of the JVM, for example with the -Xmx option.
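Before changing anything else, it may be worth checking whether the heap setting actually reaches the JVM that runs your code. The following is a minimal sketch using only the standard `Runtime` API; the class name `HeapCheck` and its helper methods are made up for illustration.

```java
// Minimal sketch: report the heap limit the running JVM actually sees.
// HeapCheck and its method names are hypothetical, chosen for this example.
final class HeapCheck {

    private static final long MIB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Heap currently in use (allocated minus free), in MiB. */
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MIB;
    }

    public static void main(String[] args) {
        System.out.println("Max heap:  " + maxHeapMiB() + " MiB");
        System.out.println("Used heap: " + usedHeapMiB() + " MiB");
    }
}
```

Running it with something like `java -Xmx4g HeapCheck` should report a maximum close to the requested size. If the number does not change, the option is probably not reaching the JVM; in an IDE run configuration, for instance, `-Xmx` has to go into the VM arguments, because a value placed among the program arguments is simply passed to `main`. The same `usedHeapMiB()` call can be printed before and after loading a test sheet to get a rough idea of its memory cost.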

Beyond that, I think you should do two things:

  • Experiment with smaller sheets. Create one that has only 100 rows, then maybe 10K, then 100K. Measure the memory consumption, and from there
  • see if there are other APIs/libraries that allow you to read/write individual rows without pulling the whole file into memory.
  • And if none of that works, maybe you have to use a completely different design, such as having some sort of "service". You then write some VB script code that runs inside Excel and simply calls that service for each row to fetch the results.

Or, ideally: do not misuse Excel as a database. This is similar to using a sports car to transport a huge number of goods just because you already have that sports car; it would still be more appropriate to get yourself a truck instead. In other words: consider moving your data into a real database. In the long run, everything you do will be "easier" then!
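The idea of handling individual rows without pulling the whole file into memory can be sketched with the standard library alone. The example below is not an Excel API: it assumes, purely for illustration, that each sheet has been exported to a simple comma-separated text file with numeric cells and no quoted fields. `RowPipeline` and `sumCells` are hypothetical names, and `sumCells` merely stands in for the real per-row algorithm.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.function.ToDoubleFunction;

// Sketch of row-at-a-time processing: only one row is held in memory at once.
// Assumes a comma-separated export with numeric cells (an assumption, not
// something stated in the question).
final class RowPipeline {

    /** Hypothetical stand-in for the per-row algorithm: sums the cells of a row. */
    static double sumCells(String[] cells) {
        double total = 0.0;
        for (String cell : cells) {
            total += Double.parseDouble(cell.trim());
        }
        return total;
    }

    /**
     * Reads rows one by one, applies the algorithm, and writes each row back
     * with the result appended as an extra column. Returns the row count.
     * The caller owns (and closes) the reader and writer.
     */
    static long process(Reader in, Writer out, ToDoubleFunction<String[]> algorithm) {
        long rows = 0;
        try {
            BufferedReader reader = new BufferedReader(in);
            BufferedWriter writer = new BufferedWriter(out);
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                double result = algorithm.applyAsDouble(line.split(","));
                writer.write(line);
                writer.write(',');
                writer.write(Double.toString(result));
                writer.newLine();
                rows++;
            }
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        long rows = process(new StringReader("1,2,3\n4,5,6\n"), out, RowPipeline::sumCells);
        System.out.println(rows + " rows processed");
        System.out.print(out);
    }
}
```

Because nothing but the current line is retained, memory use stays roughly constant however many rows the input has. With real files, `process` would be given a `FileReader` and `FileWriter` (or their `java.nio.file.Files` equivalents) in place of the in-memory strings used in `main`.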
