
Adding a row to a large xlsx file (Out of Memory)

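The situation is as follows: I have a simple program that uses the Apache POI library to add one row of data to the end of an existing xlsx file. See below.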

File file = new File(input);
XSSFWorkbook workbook = new XSSFWorkbook(file);
XSSFSheet sheet = workbook.getSheetAt(0);
XSSFRow row = sheet.createRow(sheet.getLastRowNum() + 1);

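After that, I loop over the row and set the cell values. The problem is that the second line of the code shown above throws an out of memory error. Is there a way to add a row of data to an existing xlsx file without having to read the file completely?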

You can try XSSF and SAX (Event API).

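If getting the XSSFWorkbook fails because of an out of memory error, and what is needed is both reading and writing the workbook, then neither SXSSF nor a SAX parser will help. One is only for writing. The other is only for reading.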

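Both of the following approaches require knowledge of the *.xlsx file format (Office Open XML). In general, a *.xlsx file is a ZIP archive containing XML files and other files in a special directory structure. So one can unzip a *.xlsx file with any ZIP software to have a look at the XML files. The file format was first standardized by Ecma. So for further research I prefer the Ecma Markup Language Reference, for example Row.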
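To see the ZIP structure described above without leaving Java, the following minimal sketch lists the entries of the archive using only `java.util.zip`. The class name `XlsxPartLister` and the method `listParts` are made up for illustration; the file name is the one used by the examples in this answer.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of Apache POI.
class XlsxPartLister {

 // Returns the names of all entries in the ZIP container.
 // Only the ZIP directory is read; no XML is parsed.
 static List<String> listParts(Path xlsx) throws IOException {
  List<String> names = new ArrayList<>();
  try (ZipFile zip = new ZipFile(xlsx.toFile())) {
   Enumeration<? extends ZipEntry> entries = zip.entries();
   while (entries.hasMoreElements()) {
    names.add(entries.nextElement().getName());
   }
  }
  return names;
 }

 public static void main(String[] args) throws IOException {
  for (String name : listParts(Paths.get("ReadAndWriteTest.xlsx"))) {
   System.out.println(name);
  }
 }
}
```

Note that ZIP entry names have no leading slash (for example xl/worksheets/sheet1.xml), while the part names used with OPCPackage in the examples start with one.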

The ReadAndWriteTest.xlsx used in both examples must have at least one worksheet, and the first worksheet must have at least one row.

One approach could be to use the DOM approach of XMLBeans. My favorite reference for this is grepcode.

Example:

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackagePart;

import org.apache.poi.xssf.model.SharedStringsTable;

import java.io.File;
import java.io.OutputStream;

import org.openxmlformats.schemas.spreadsheetml.x2006.main.WorksheetDocument;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTWorksheet;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTSheetData;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTRst;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTCell;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.STCellType;

import org.openxmlformats.schemas.officeDocument.x2006.relationships.STRelationshipId;

import org.apache.xmlbeans.XmlOptions;

import javax.xml.namespace.QName;

import java.util.Map;
import java.util.HashMap;

import java.util.regex.Pattern;

class DOMReadAndWriteTest {

 public static void main(String[] args) {
  try {

   File file = new File("ReadAndWriteTest.xlsx");
   //we only open the OPCPackage, we don't create a Workbook
   OPCPackage opcpackage = OPCPackage.open(file);

   //if there are strings in the SheetData, we need the SharedStringsTable
   PackagePart sharedstringstablepart = opcpackage.getPartsByName(Pattern.compile("/xl/sharedStrings.xml")).get(0);
   SharedStringsTable sharedstringstable = new SharedStringsTable();
   sharedstringstable.readFrom(sharedstringstablepart.getInputStream());

   //get the PackagePart of the first sheet
   PackagePart sheetpart = opcpackage.getPartsByName(Pattern.compile("/xl/worksheets/sheet1.xml")).get(0);
   //get the worksheet from the first sheet's XML
   //if it even fails while parsing this, then this approach is not usable
   WorksheetDocument worksheetdocument = WorksheetDocument.Factory.parse(sheetpart.getInputStream());
   CTWorksheet worksheet = worksheetdocument.getWorksheet();

   CTSheetData sheetdata = worksheet.getSheetData();

   //put some data in 10 new rows
   for (int i = 0; i < 10; i++) {
    int rowsCount = sheetdata.sizeOfRowArray();

    CTCell ctcell= sheetdata.addNewRow().addNewC();

    CTRst ctstr = CTRst.Factory.newInstance();
    ctstr.setT("new Row " + (rowsCount + 1));
    int sRef = sharedstringstable.addEntry(ctstr);
    ctcell.setT(STCellType.S);
    ctcell.setV(Integer.toString(sRef));

    ctcell=sheetdata.getRowArray(rowsCount).addNewC();
    ctcell.setV(""+rowsCount+"."+(i+1)+""+((i+2>9)?0:i+2));
   }

   //write the SharedStringsTable
   OutputStream out = sharedstringstablepart.getOutputStream();
   sharedstringstable.writeTo(out);
   out.close();

   //create XmlOptions for saving the worksheet
   XmlOptions xmlOptions = new XmlOptions();
   xmlOptions.setSaveOuter();
   xmlOptions.setUseDefaultNamespace();
   xmlOptions.setSaveAggressiveNamespaces();
   xmlOptions.setCharacterEncoding("UTF-8");
   xmlOptions.setSaveSyntheticDocumentElement(new QName(CTWorksheet.type.getName().getNamespaceURI(), "worksheet"));
   Map<String, String> map = new HashMap<String, String>();
   map.put(STRelationshipId.type.getName().getNamespaceURI(), "r");
   xmlOptions.setSaveSuggestedPrefixes(map);

   //save the worksheet
   out = sheetpart.getOutputStream();
   worksheet.save(out, xmlOptions);
   out.close();

   opcpackage.close();

  } catch (Exception ex) {
     ex.printStackTrace();
  }
 }
}

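This code writes 10 new rows into sheet1 of ReadAndWriteTest.xlsx without opening the whole workbook. But it must at least open and parse sheet1 and the SharedStringsTable. If even that fails, then this approach is not usable.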
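Whether parsing sheet1 and the shared strings is feasible depends largely on how big those two parts are once uncompressed. As a rough check before trying the DOM approach, the sketch below reads the uncompressed size of a single ZIP entry from the archive directory, using only `java.util.zip`. The class name `XlsxPartSize` and the method `uncompressedSize` are hypothetical. The in-memory object tree is typically considerably larger than the raw XML, so treat the number as a lower bound, not a measurement.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of Apache POI.
class XlsxPartSize {

 // Returns the uncompressed size in bytes of the named ZIP entry,
 // or -1 if the entry does not exist or its size is not recorded.
 static long uncompressedSize(Path xlsx, String entryName) throws IOException {
  try (ZipFile zip = new ZipFile(xlsx.toFile())) {
   ZipEntry entry = zip.getEntry(entryName);
   return entry == null ? -1L : entry.getSize();
  }
 }

 public static void main(String[] args) throws IOException {
  Path file = Paths.get("ReadAndWriteTest.xlsx");
  // ZIP entry names carry no leading slash, unlike OPC part names.
  System.out.println("sheet1.xml: " + uncompressedSize(file, "xl/worksheets/sheet1.xml") + " bytes");
  System.out.println("sharedStrings.xml: " + uncompressedSize(file, "xl/sharedStrings.xml") + " bytes");
 }
}
```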

Another approach is to use StAX. This API can read and write XML in an event-driven way, and it works by streaming.

Example:

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackagePart;

import org.apache.poi.xssf.model.SharedStringsTable;

import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTRst;

import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.Characters;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.EndElement;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.XMLEvent;

import javax.xml.namespace.QName;

import java.io.File;
import java.io.InputStream;
import java.io.OutputStream;

import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class StaxReadAndWriteTest {

 public static void main(String[] args) {
  try {

   File file = new File("ReadAndWriteTest.xlsx");
   OPCPackage opcpackage = OPCPackage.open(file);

   //if there are strings in the sheet data, we need the SharedStringsTable
   //if it even fails while parsing this SharedStringsTable, then this approach is not usable
   //then we must stream this XML event driven also.
   PackagePart sharedstringstablepart = opcpackage.getPartsByName(Pattern.compile("/xl/sharedStrings.xml")).get(0);
   SharedStringsTable sharedstringstable = new SharedStringsTable();
   sharedstringstable.readFrom(sharedstringstablepart.getInputStream());

   PackagePart sheetpart = opcpackage.getPartsByName(Pattern.compile("/xl/worksheets/sheet1.xml")).get(0);

   XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(sheetpart.getInputStream());
   XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(sheetpart.getOutputStream());

   XMLEventFactory eventFactory = XMLEventFactory.newInstance();

   int rowsCount = 0;

   while(reader.hasNext()){ //loop over all XML in sheet1.xml
    XMLEvent event = (XMLEvent)reader.next();
    writer.add(event); //by default write each read event

    if(event.isStartElement()){
     StartElement startElement = (StartElement)event;
     QName startElementName = startElement.getName();
     if(startElementName.getLocalPart().equalsIgnoreCase("row")) { //start element of row
      boolean rowStart = true;
      rowsCount++;
      do {
       event = (XMLEvent)reader.next(); //find this row's end
       writer.add(event); //by default write each read event

       if(event.isEndElement()){
        EndElement endElement = (EndElement)event;
        QName endElementName = endElement.getName();
        if(endElementName.getLocalPart().equalsIgnoreCase("row")) { //end element of row
         rowStart = false;
         //we assume that there is nothing else (character data) between end element of row and next element 
         XMLEvent nextElement = (XMLEvent)reader.peek();
         QName nextElementName = null;
         if (nextElement.isStartElement()) nextElementName = ((StartElement)nextElement).getName();
         else if (nextElement.isEndElement()) nextElementName = ((EndElement)nextElement).getName();
         if(nextElementName != null && !nextElementName.getLocalPart().equalsIgnoreCase("row")) { //next is an element other than row; null means the next event is not an element
          //we have the last row, so we write new rows now 

          for (int i = 0; i < 10; i++) {

           StartElement newRowStart = eventFactory.createStartElement(new QName("row"), null, null);
           writer.add(newRowStart);

//start cell A
           Attribute attribute = eventFactory.createAttribute("t", "s");
           List<Attribute> attributeList = Arrays.asList(attribute);
           StartElement newCellStart = eventFactory.createStartElement(new QName("c"), attributeList.iterator(), null);
           writer.add(newCellStart);

           CTRst ctstr = CTRst.Factory.newInstance();
           ctstr.setT("new Row " + (rowsCount +1));
           int sRef = sharedstringstable.addEntry(ctstr);

           StartElement newCellValue = eventFactory.createStartElement(new QName("v"), null, null);
           writer.add(newCellValue);

           Characters value = eventFactory.createCharacters(Integer.toString(sRef));
           writer.add(value);         

           EndElement newCellValueEnd = eventFactory.createEndElement(new QName("v"), null);
           writer.add(newCellValueEnd);

           EndElement newCellEnd = eventFactory.createEndElement(new QName("c"), null);
           writer.add(newCellEnd);
//end cell A
//start cell B
           newCellStart = eventFactory.createStartElement(new QName("c"), null, null);
           writer.add(newCellStart);

           newCellValue = eventFactory.createStartElement(new QName("v"), null, null);
           writer.add(newCellValue);

           value = eventFactory.createCharacters(""+rowsCount+"."+(i+1)+""+((i+2>9)?0:i+2));
           writer.add(value);         

           newCellValueEnd = eventFactory.createEndElement(new QName("v"), null);
           writer.add(newCellValueEnd);

           newCellEnd = eventFactory.createEndElement(new QName("c"), null);
           writer.add(newCellEnd);
//end cell B

           EndElement newRowEnd = eventFactory.createEndElement(new QName("row"), null);
           writer.add(newRowEnd);

           rowsCount++;
          }
         }
        }
       }
      } while (rowStart);
     }
    }
   }

   writer.flush();

   //write the SharedStringsTable
   OutputStream out = sharedstringstablepart.getOutputStream();
   sharedstringstable.writeTo(out);
   out.close();

   opcpackage.close();

  } catch (Exception ex) {
     ex.printStackTrace();
  }
 }
}

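This code also writes 10 new rows into sheet1 of ReadAndWriteTest.xlsx without opening the whole workbook. But it must at least open and parse the SharedStringsTable. If even that fails, then this approach is not usable either. Of course, the SharedStringsTable could be streamed using StAX as well. But as you can see from the example that creates the rows and cells, this is much more complex. So using the SharedStringsTable object makes things easier in this example.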
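The core of the streaming idea, stripped of POI, can be sketched with JDK classes alone. This is a simplified variant and not the code above: instead of peeking after each row, it copies every event unchanged and writes the new row when it reaches the end tag of sheetData. It works on an XML string rather than on a package part, and it writes a single numeric cell so that no shared strings are needed. The class name `AppendRowSketch` and the method `appendRow` are hypothetical, and other parts of the sheet XML, such as the dimension element, are left untouched.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.Iterator;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.XMLEvent;

// Hypothetical sketch, not part of Apache POI.
class AppendRowSketch {

 // Copies the sheet XML event by event and adds one row with one numeric
 // cell in column A directly before the end tag of sheetData.
 static String appendRow(String sheetXml, String numericValue) throws XMLStreamException {
  XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(new StringReader(sheetXml));
  StringWriter out = new StringWriter();
  XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
  XMLEventFactory ef = XMLEventFactory.newInstance();
  int lastRow = 0; // highest row number seen so far

  while (reader.hasNext()) {
   XMLEvent event = reader.nextEvent();
   if (event.isStartElement() && "row".equals(event.asStartElement().getName().getLocalPart())) {
    // use the row's r attribute if present, otherwise count rows
    Attribute r = event.asStartElement().getAttributeByName(new QName("r"));
    lastRow = (r != null) ? Integer.parseInt(r.getValue()) : lastRow + 1;
   } else if (event.isEndElement() && "sheetData".equals(event.asEndElement().getName().getLocalPart())) {
    // reuse prefix and namespace of sheetData for the new elements
    QName sd = event.asEndElement().getName();
    String p = sd.getPrefix();
    String ns = sd.getNamespaceURI();
    int rowNum = lastRow + 1;
    writer.add(ef.createStartElement(p, ns, "row", attr(ef, "r", Integer.toString(rowNum)), null));
    writer.add(ef.createStartElement(p, ns, "c", attr(ef, "r", "A" + rowNum), null));
    writer.add(ef.createStartElement(p, ns, "v"));
    writer.add(ef.createCharacters(numericValue));
    writer.add(ef.createEndElement(p, ns, "v"));
    writer.add(ef.createEndElement(p, ns, "c"));
    writer.add(ef.createEndElement(p, ns, "row"));
   }
   writer.add(event); // every original event is written unchanged
  }
  writer.close();
  reader.close();
  return out.toString();
 }

 // Wraps a single attribute in the iterator the event factory expects.
 private static Iterator<Attribute> attr(XMLEventFactory ef, String name, String value) {
  return Arrays.asList(ef.createAttribute(name, value)).iterator();
 }

 public static void main(String[] args) throws XMLStreamException {
  String xml = "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">"
    + "<sheetData><row r=\"1\"><c r=\"A1\"><v>1</v></c></row></sheetData></worksheet>";
  System.out.println(appendRow(xml, "42"));
 }
}
```

Hooking on the end tag of sheetData avoids the assumption about what follows the last row. For a real file, the reader and writer would be created from the sheet part's streams as in the example above.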

(Not enough reputation to add this as a comment.) Have you tried using SXSSFWorkbook instead of XSSFWorkbook?
