简体   繁体   中英

How to get merged regions using apache POI's event API?

How can I get merged regions (merged cells) of an excel sheet using the event API provided by Apache POI?

Using the "traditional" DOM-like parsing style there are methods called Sheet.getNumMergedRegions() and Sheet.getMergedRegion(int) . Unfortunately I need to handle huge Excel files where I get out of memory errors even with the highest Xmx-value I am allowed to use (in this project). So I'd like to use the event API, but wasn't able to find out how to get information about merged regions, which I need to know to "understand" the content correctly...

Using the example given here: http://poi.apache.org/spreadsheet/how-to.html#xssf_sax_api I get events for each single cell of a merged region (only the first of them contains any textual content though). So maybe, if there isn't a more direct way, it would help to know how those merged cells could be (safely) distinguished from other (empty) cells...

I don't know for sure where merged cell info gets stored, but I'm fairly sure it won't be with the cell data itself, as that's not the Excel way.

What I'd suggest you do is create a simple file without merged cells. Then, take a copy, and add a single merged cell. Unzip both of these (.xlsx is a zip of xml files), and diff them. That'll show you quite quickly what gets set to mark cells as merged. (My hunch is that it'll be somewhere in the sheet settings, near the start but not near the cell values, BICBW)

Once you know where the merged cell details live, you can take a look at the XSSF UserModel code for working with merged cells to get an idea of how they work, how they're manipulated, what the options are etc. With that in mind, you can look at the file format docs for the full details, but those can be a bit heavy and detailed to go to first. Finally, you can add in your code to use the merged info details, once you know where to get it from!

You need to open stream and parse it twice.

First time - to extract merged cells. They are appears in the sheet...xml file after <sheetData>...</sheetData> tag, like in this example:

...
< /sheetData >
< mergeCells count="2" >
    < mergeCell ref="A2:C2"/ >
    < mergeCell ref="A3:A7"/ >
 </mergeCells >

Extract that and keep in some List.

Then reopen the stream again and parse it as usual, to extract rows and cells. In the endElement(...) method when finishing every row, check if this row appears (partially or completely) in the merged region.

To expand on Mike's answer. You can create a ContentHandler to locate Merge Regions like:

import java.util.ArrayList;
import java.util.List;

import org.apache.poi.ss.util.CellRangeAddress;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class MergedRegionLocator extends DefaultHandler {
    private final List<CellRangeAddress> mergedRegions = new ArrayList<>();

    @Override
    public void startElement (String uri, String localName, String name, Attributes attributes) {
        if ("mergeCell".equals(name) && attributes.getValue("ref") != null) {
            mergedRegions.add(CellRangeAddress.valueOf(attributes.getValue("ref")));
        }
    }

    public CellRangeAddress getMergedRegion (int index) {
        return mergedRegions.get(index);
    }

    public List<CellRangeAddress> getMergedRegions () {
        return mergedRegions;
    }
}

An example of using it with POIs Event-Based parsing:

OPCPackage pkg = OPCPackage.open(new FileInputStream("test.xlsx"));
XSSFReader reader = new XSSFReader(pkg);
InputStream sheetData = reader.getSheetsData().next();

MergedRegionLocator mergedRegionLocator = new MergedRegionLocator();
XMLReader parser = XMLReaderFactory.createXMLReader();
parser.setContentHandler(mergedRegionLocator);
parser.parse(new InputSource(sheetData));

mergedRegionLocator.getMergedRegions();

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM