Adding a row to a large xlsx file (Out of Memory)
The situation is as follows: I have a simple program that uses the Apache POI library to append a row of data to the end of an existing xlsx file. See below:
File file = new File(input);
XSSFWorkbook workbook = new XSSFWorkbook(file);
XSSFSheet sheet = workbook.getSheetAt(0);
XSSFRow row = sheet.createRow(sheet.getLastRowNum() + 1);
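After that, I iterate over the row and set the cell values. The problem is that the second line of the code above throws an out-of-memory error. Is there a way to add a row of data to an existing xlsx file without reading the whole file?
You could try XSSF and SAX (Event API).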
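If creating the XSSFWorkbook fails with an out-of-memory error and the workbook has to be both read and written, then neither SXSSF nor the SAX parser will help. The first is for writing only, the second for reading only.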
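To illustrate the read-only side: an event-driven parser just hands each element to a callback and has no way to write anything back. The following is a minimal sketch using the JDK's built-in SAX parser instead of POI's event API; the class name SheetRowCounter and the sample XML are assumptions made for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SheetRowCounter {

    // Counts the <row> elements of a worksheet XML stream.
    // No document tree is built, so memory use stays small,
    // but the handler can only observe events, not modify the XML.
    public static int countRows(InputStream sheetXml) throws Exception {
        final int[] rows = {0};
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so localName is filled in
        factory.newSAXParser().parse(sheetXml, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("row".equals(localName)) {
                    rows[0]++;
                }
            }
        });
        return rows[0];
    }

    public static void main(String[] args) throws Exception {
        // Hand-written stand-in for /xl/worksheets/sheet1.xml
        String xml =
            "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">"
            + "<sheetData>"
            + "<row r=\"1\"><c r=\"A1\"><v>1</v></c></row>"
            + "<row r=\"2\"><c r=\"A2\"><v>2</v></c></row>"
            + "</sheetData></worksheet>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(countRows(in)); // prints 2
    }
}
```

Something like this can tell you how many rows a sheet has, but appending a row still needs a writer, which is why the approaches below work on the package parts directly.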
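Both of the following approaches require knowledge of the *.xlsx file format (Office Open XML). In general, a *.xlsx file is a ZIP archive containing XML files and other files in a special directory structure. So you can unzip the *.xlsx file with any ZIP software and have a look at the XML files. The file format was first standardized by Ecma, so for further research I prefer the Ecma Markup Language Reference. For example, see Row.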
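As a quick way to look at that directory structure without a ZIP tool, the entries can be listed with the JDK's java.util.zip classes. This is only a sketch; the class name XlsxEntryLister is made up for illustration, and the file name is taken from the command line.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class XlsxEntryLister {

    // Returns the names of all entries in the ZIP container, in archive order.
    // Entry contents are skipped, not loaded into memory.
    public static List<String> listEntries(InputStream xlsx) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java XlsxEntryLister <file.xlsx>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            for (String name : listEntries(in)) {
                System.out.println(name);
            }
        }
    }
}
```

Run against a typical workbook, the list should include parts such as xl/worksheets/sheet1.xml and, if the workbook contains strings, xl/sharedStrings.xml. These are the two parts the examples below open by name.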
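The ReadAndWriteTest.xlsx used in both examples must have at least one sheet, and the first sheet must have at least one row.
One approach could be the DOM approach using XMLBeans. My favorite reference for this is grepcode.
Example: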
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackagePart;
import org.apache.poi.xssf.model.SharedStringsTable;
import java.io.File;
import java.io.OutputStream;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.WorksheetDocument;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTWorksheet;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTSheetData;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTRst;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTCell;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.STCellType;
import org.openxmlformats.schemas.officeDocument.x2006.relationships.STRelationshipId;
import org.apache.xmlbeans.XmlOptions;
import javax.xml.namespace.QName;
import java.util.Map;
import java.util.HashMap;
import java.util.regex.Pattern;
class DOMReadAndWriteTest {
public static void main(String[] args) {
try {
File file = new File("ReadAndWriteTest.xlsx");
//we only open the OPCPackage, we don't create a Workbook
OPCPackage opcpackage = OPCPackage.open(file);
//if there are strings in the SheetData, we need the SharedStringsTable
PackagePart sharedstringstablepart = opcpackage.getPartsByName(Pattern.compile("/xl/sharedStrings.xml")).get(0);
SharedStringsTable sharedstringstable = new SharedStringsTable();
sharedstringstable.readFrom(sharedstringstablepart.getInputStream());
//get the PackagePart of the first sheet
PackagePart sheetpart = opcpackage.getPartsByName(Pattern.compile("/xl/worksheets/sheet1.xml")).get(0);
//get the worksheet from the first sheet's XML
//if it even fails while parsing this, then this approach is not usable
WorksheetDocument worksheetdocument = WorksheetDocument.Factory.parse(sheetpart.getInputStream());
CTWorksheet worksheet = worksheetdocument.getWorksheet();
CTSheetData sheetdata = worksheet.getSheetData();
//put some data in 10 new rows
for (int i = 0; i < 10; i++) {
int rowsCount = sheetdata.sizeOfRowArray();
CTCell ctcell= sheetdata.addNewRow().addNewC();
CTRst ctstr = CTRst.Factory.newInstance();
ctstr.setT("new Row " + (rowsCount + 1));
int sRef = sharedstringstable.addEntry(ctstr);
ctcell.setT(STCellType.S);
ctcell.setV(Integer.toString(sRef));
ctcell=sheetdata.getRowArray(rowsCount).addNewC();
ctcell.setV(""+rowsCount+"."+(i+1)+""+((i+2>9)?0:i+2));
}
//write the SharedStringsTable
OutputStream out = sharedstringstablepart.getOutputStream();
sharedstringstable.writeTo(out);
out.close();
//create XmlOptions for saving the worksheet
XmlOptions xmlOptions = new XmlOptions();
xmlOptions.setSaveOuter();
xmlOptions.setUseDefaultNamespace();
xmlOptions.setSaveAggressiveNamespaces();
xmlOptions.setCharacterEncoding("UTF-8");
xmlOptions.setSaveSyntheticDocumentElement(new QName(CTWorksheet.type.getName().getNamespaceURI(), "worksheet"));
Map<String, String> map = new HashMap<String, String>();
map.put(STRelationshipId.type.getName().getNamespaceURI(), "r");
xmlOptions.setSaveSuggestedPrefixes(map);
//save the worksheet
out = sheetpart.getOutputStream();
worksheet.save(out, xmlOptions);
out.close();
opcpackage.close();
} catch (Exception ex) {
ex.printStackTrace();
}
}
}
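This code writes 10 new rows to sheet1 of ReadAndWriteTest.xlsx without opening the whole workbook. But it has to open and parse at least sheet1 and the SharedStringsTable. If even that fails, this approach is not usable.
Another approach could be using StAX. This API can read and write XML in an event-driven way, and it works by streaming.
Example: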
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackagePart;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTRst;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.Characters;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.EndElement;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.XMLEvent;
import javax.xml.namespace.QName;
import java.io.File;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
class StaxReadAndWriteTest {
public static void main(String[] args) {
try {
File file = new File("ReadAndWriteTest.xlsx");
OPCPackage opcpackage = OPCPackage.open(file);
//if there are strings in the sheet data, we need the SharedStringsTable
//if it even fails while parsing this SharedStringsTable, then this approach is not usable
//then we must stream this XML event driven also.
PackagePart sharedstringstablepart = opcpackage.getPartsByName(Pattern.compile("/xl/sharedStrings.xml")).get(0);
SharedStringsTable sharedstringstable = new SharedStringsTable();
sharedstringstable.readFrom(sharedstringstablepart.getInputStream());
PackagePart sheetpart = opcpackage.getPartsByName(Pattern.compile("/xl/worksheets/sheet1.xml")).get(0);
XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(sheetpart.getInputStream());
XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(sheetpart.getOutputStream());
XMLEventFactory eventFactory = XMLEventFactory.newInstance();
int rowsCount = 0;
while(reader.hasNext()){ //loop over all XML in sheet1.xml
XMLEvent event = (XMLEvent)reader.next();
writer.add(event); //by default, write each event that was read
if(event.isStartElement()){
StartElement startElement = (StartElement)event;
QName startElementName = startElement.getName();
if(startElementName.getLocalPart().equalsIgnoreCase("row")) { //start element of row
boolean rowStart = true;
rowsCount++;
do {
event = (XMLEvent)reader.next(); //find this row's end
writer.add(event); //by default, write each event that was read
if(event.isEndElement()){
EndElement endElement = (EndElement)event;
QName endElementName = endElement.getName();
if(endElementName.getLocalPart().equalsIgnoreCase("row")) { //end element of row
rowStart = false;
//we assume that there is nothing else (character data) between end element of row and next element
XMLEvent nextElement = (XMLEvent)reader.peek();
QName nextElementName = null;
if (nextElement.isStartElement()) nextElementName = ((StartElement)nextElement).getName();
else if (nextElement.isEndElement()) nextElementName = ((EndElement)nextElement).getName();
if(!nextElementName.getLocalPart().equalsIgnoreCase("row")) { //next is not start element of row
//we have the last row, so we write new rows now
for (int i = 0; i < 10; i++) {
StartElement newRowStart = eventFactory.createStartElement(new QName("row"), null, null);
writer.add(newRowStart);
//start cell A
Attribute attribute = eventFactory.createAttribute("t", "s");
List attributeList = Arrays.asList(attribute);
StartElement newCellStart = eventFactory.createStartElement(new QName("c"), attributeList.iterator(), null);
writer.add(newCellStart);
CTRst ctstr = CTRst.Factory.newInstance();
ctstr.setT("new Row " + (rowsCount +1));
int sRef = sharedstringstable.addEntry(ctstr);
StartElement newCellValue = eventFactory.createStartElement(new QName("v"), null, null);
writer.add(newCellValue);
Characters value = eventFactory.createCharacters(Integer.toString(sRef));
writer.add(value);
EndElement newCellValueEnd = eventFactory.createEndElement(new QName("v"), null);
writer.add(newCellValueEnd);
EndElement newCellEnd = eventFactory.createEndElement(new QName("c"), null);
writer.add(newCellEnd);
//end cell A
//start cell B
newCellStart = eventFactory.createStartElement(new QName("c"), null, null);
writer.add(newCellStart);
newCellValue = eventFactory.createStartElement(new QName("v"), null, null);
writer.add(newCellValue);
value = eventFactory.createCharacters(""+rowsCount+"."+(i+1)+""+((i+2>9)?0:i+2));
writer.add(value);
newCellValueEnd = eventFactory.createEndElement(new QName("v"), null);
writer.add(newCellValueEnd);
newCellEnd = eventFactory.createEndElement(new QName("c"), null);
writer.add(newCellEnd);
//end cell B
EndElement newRowEnd = eventFactory.createEndElement(new QName("row"), null);
writer.add(newRowEnd);
rowsCount++;
}
}
}
}
} while (rowStart);
}
}
}
writer.flush();
//write the SharedStringsTable
OutputStream out = sharedstringstablepart.getOutputStream();
sharedstringstable.writeTo(out);
out.close();
opcpackage.close();
} catch (Exception ex) {
ex.printStackTrace();
}
}
}
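This code also writes 10 new rows to sheet1 of ReadAndWriteTest.xlsx without opening the whole workbook. But it has to open and parse at least the SharedStringsTable. If even that fails, this approach is not usable either. Of course, the SharedStringsTable could also be streamed using StAX. But as the row and cell creation in the example shows, this is much more complex. So using the SharedStringsTable makes things easier in this example.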
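If even the SharedStringsTable is too large to parse, one possible variation (not part of the examples above, only a sketch) is to write the new text as an inline string, which SpreadsheetML allows via t="inlineStr" and an is/t child, so sharedStrings.xml is not touched at all. The sketch also inserts the new row directly before the end of sheetData instead of peeking at the event after each row. The class name InlineStringRowAppender and the method appendRow are hypothetical; like the examples above, it writes no r attributes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.XMLEvent;

public class InlineStringRowAppender {

    // Copies the worksheet XML from in to out and adds one row with a
    // single inline-string cell as the last child of <sheetData>.
    public static void appendRow(InputStream in, OutputStream out, String text)
            throws Exception {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out, "UTF-8");
        XMLEventFactory ef = XMLEventFactory.newInstance();
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isEndElement()
                    && "sheetData".equals(event.asEndElement().getName().getLocalPart())) {
                // reuse prefix and namespace of sheetData for the new elements
                QName sd = event.asEndElement().getName();
                String p = sd.getPrefix();
                String ns = sd.getNamespaceURI();
                Attribute type = ef.createAttribute("t", "inlineStr");
                writer.add(ef.createStartElement(p, ns, "row"));
                writer.add(ef.createStartElement(p, ns, "c",
                        Collections.singletonList(type).iterator(), null));
                writer.add(ef.createStartElement(p, ns, "is"));
                writer.add(ef.createStartElement(p, ns, "t"));
                writer.add(ef.createCharacters(text));
                writer.add(ef.createEndElement(p, ns, "t"));
                writer.add(ef.createEndElement(p, ns, "is"));
                writer.add(ef.createEndElement(p, ns, "c"));
                writer.add(ef.createEndElement(p, ns, "row"));
            }
            writer.add(event); // every event that was read is copied unchanged
        }
        writer.close();
        reader.close();
    }

    public static void main(String[] args) throws Exception {
        // Hand-written stand-in for /xl/worksheets/sheet1.xml
        String xml =
            "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">"
            + "<sheetData><row><c><v>1</v></c></row></sheetData></worksheet>";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        appendRow(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), out, "new Row 2");
        System.out.println(out.toString("UTF-8"));
    }
}
```

In the StAX example above, the two streams would presumably be sheetpart.getInputStream() and sheetpart.getOutputStream(). The trade-off is that inline strings are stored once per cell rather than deduplicated in the shared strings part, so this suits a handful of appended rows better than bulk writing.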
(Not enough reputation to add this as a comment.) Have you tried using SXSSFWorkbook instead of XSSFWorkbook?