How can I get merged regions using the Apache POI event API?

Question

How can I get the merged regions (merged cells) of an Excel sheet using the event API provided by Apache POI?

With the "traditional" DOM-like parsing style there are the methods Sheet.getNumMergedRegions() and Sheet.getMergedRegion(int). Unfortunately I need to handle huge Excel files, and I get out-of-memory errors even with the highest Xmx value I am allowed to use (in this project). So I'd like to use the event API, but I wasn't able to find out how to get information about merged regions, which I need in order to "understand" the content correctly...

Using the example given here: http://poi.apache.org/spreadsheet/how-to.html#xssf_sax_api I get events for every single cell of a merged region (though only the first of them contains any textual content). So if there isn't a more direct way, it would help to know how those merged cells can be (safely) distinguished from other (empty) cells...

Answer 1

I don't know for sure where the merged cell info gets stored, but I'm fairly sure it won't be with the cell data itself, as that's not the Excel way.

What I'd suggest you do is create a simple file without merged cells. Then take a copy and add a single merged cell. Unzip both of these (.xlsx is a zip of XML files) and diff them. That'll show you quite quickly what gets set to mark cells as merged. (My hunch is that it'll be somewhere in the sheet settings, near the start but not near the cell values, but I could be wrong.)

Once you know where the merged cell details live, you can take a look at the XSSF UserModel code for working with merged cells to get an idea of how they work, how they're manipulated, what the options are, etc. With that in mind, you can look at the file format docs for the full details, but those can be a bit heavy and detailed to start with. Finally, once you know where to get the merged cell details from, you can add code that uses them!
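For the unzip-and-diff step, a small JDK-only helper can pull one XML part out of each test workbook so the two strings can be compared. This is only a sketch: the class name XlsxPartDumper is made up for illustration, the entry name "xl/worksheets/sheet1.xml" is an assumption about how the workbook was saved, and the part is assumed to be UTF-8. It reads the whole part into memory, so it suits the small test files described above, not the huge production files.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: returns one XML part of an .xlsx (zip) stream as text. */
final class XlsxPartDumper {

    /**
     * Returns the content of the named zip entry, or null if it is absent.
     * The entry name (for example "xl/worksheets/sheet1.xml") is an assumption;
     * list the archive to confirm the actual name.
     */
    static String readPart(InputStream xlsx, String entryName) {
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buffer = new byte[8192];
                    int read;
                    // Copy the bytes of the current entry only.
                    while ((read = zip.read(buffer)) != -1) {
                        out.write(buffer, 0, read);
                    }
                    return new String(out.toByteArray(), StandardCharsets.UTF_8);
                }
            }
            return null; // no entry with that name
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling readPart on both workbooks and comparing the two results (or writing them to files and running a diff tool) should make the added markup visible.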

Answer 2

You need to open the stream and parse it twice.

The first time, to extract the merged cells. They appear in the sheet...xml file after the <sheetData>...</sheetData> element, as in this example:

...
</sheetData>
<mergeCells count="2">
    <mergeCell ref="A2:C2"/>
    <mergeCell ref="A3:A7"/>
</mergeCells>

Extract those and keep them in a List.

Then reopen the stream and parse it as usual to extract rows and cells. In the endElement(...) method, whenever a row is finished, check whether that row lies (partially or completely) within a merged region.
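The row check in the second pass could look roughly like the sketch below. MergedRange is a hypothetical, JDK-only stand-in for a merged region, built from a ref string such as "A2:C2"; it is not part of POI, and it does not handle '$' markers or sheet-qualified refs. All indices are zero-based.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a merged region, parsed from a ref such as "A2:C2". */
final class MergedRange {
    final int firstRow;
    final int lastRow;
    final int firstCol;
    final int lastCol;

    MergedRange(String ref) {
        String[] corners = ref.split(":");
        int[] start = parseCell(corners[0]);
        // A single-cell ref has no ':'; both corners are then the same cell.
        int[] end = parseCell(corners[corners.length - 1]);
        firstRow = Math.min(start[0], end[0]);
        lastRow = Math.max(start[0], end[0]);
        firstCol = Math.min(start[1], end[1]);
        lastCol = Math.max(start[1], end[1]);
    }

    /** Converts "C2" to {row, col} = {1, 2} (zero-based). */
    private static int[] parseCell(String cell) {
        int col = 0;
        int i = 0;
        while (i < cell.length() && Character.isLetter(cell.charAt(i))) {
            col = col * 26 + (Character.toUpperCase(cell.charAt(i)) - 'A' + 1);
            i++;
        }
        int row = Integer.parseInt(cell.substring(i));
        return new int[] {row - 1, col - 1};
    }

    boolean containsRow(int row) {
        return row >= firstRow && row <= lastRow;
    }

    boolean contains(int row, int col) {
        return containsRow(row) && col >= firstCol && col <= lastCol;
    }

    /** True for the top-left cell of the region. */
    boolean isAnchor(int row, int col) {
        return row == firstRow && col == firstCol;
    }

    /** Regions that the given row touches; intended to be called when a row ends. */
    static List<MergedRange> touching(List<MergedRange> regions, int row) {
        List<MergedRange> hits = new ArrayList<>();
        for (MergedRange region : regions) {
            if (region.containsRow(row)) {
                hits.add(region);
            }
        }
        return hits;
    }
}
```

A cell for which contains(row, col) is true but isAnchor(row, col) is false is an empty cell that belongs to a merged region, which is the distinction the question asks about. In the sheet XML the r attribute of a row element is 1-based, so subtract one before calling these methods.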

Answer 3

To expand on Mike's answer: you can create a ContentHandler to locate merged regions, like this:

import java.util.ArrayList;
import java.util.List;

import org.apache.poi.ss.util.CellRangeAddress;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class MergedRegionLocator extends DefaultHandler {
    private final List<CellRangeAddress> mergedRegions = new ArrayList<>();

    @Override
    public void startElement (String uri, String localName, String name, Attributes attributes) {
        if ("mergeCell".equals(name) && attributes.getValue("ref") != null) {
            mergedRegions.add(CellRangeAddress.valueOf(attributes.getValue("ref")));
        }
    }

    public CellRangeAddress getMergedRegion (int index) {
        return mergedRegions.get(index);
    }

    public List<CellRangeAddress> getMergedRegions () {
        return mergedRegions;
    }
}

An example of using it with POI's event-based parsing:

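import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

OPCPackage pkg = OPCPackage.open(new FileInputStream("test.xlsx"));
XSSFReader reader = new XSSFReader(pkg);
// getSheetsData() iterates over all sheets; this takes only the first one.
InputStream sheetData = reader.getSheetsData().next();

MergedRegionLocator mergedRegionLocator = new MergedRegionLocator();
XMLReader parser = XMLReaderFactory.createXMLReader();
parser.setContentHandler(mergedRegionLocator);
parser.parse(new InputSource(sheetData));

// Every region declared in the sheet's <mergeCells> element, as CellRangeAddress objects.
mergedRegionLocator.getMergedRegions();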
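XMLReaderFactory.createXMLReader() has been deprecated since Java 9. One possible replacement, sketched here with a made-up class name (SheetXmlReaders), obtains the XMLReader from SAXParserFactory instead. Namespace awareness is switched on so that a handler can match on localName, which should also work if the sheet XML uses a namespace prefix; that is an assumption about the input, not something observed in the files discussed here.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

/** Hypothetical factory for the XMLReader used to parse sheet XML. */
final class SheetXmlReaders {

    /** Creates a namespace-aware XMLReader via the JAXP factory. */
    static XMLReader newNamespaceAwareReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // With namespace awareness on, startElement receives a usable localName.
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("No usable SAX parser available", e);
        }
    }
}
```

The returned reader can be used in place of the parser above: set the content handler on it and call parse(...) with the sheet's InputSource.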


Answer 1
2 2012-07-23 11:59:52

Answer 2
1 2015-04-08 13:28:45

Answer 3
1 2017-03-20 20:30:44
