简体   繁体   中英

Read Excel and get the row number and file extension for embedded objects

I am using Apache POI with Java 1.8 in my application. In my application, I try to read Excel and get the embedded objects.

I need to know how to get the row number and file extensions for each embedded OLE object.

Workbook workbook = WorkbookFactory.create(new File(path));

XSSFWorkbook fWorkbook = (XSSFWorkbook) workbook;

List<PackagePart> embeddedDocs = fWorkbook.getAllEmbedds();

To get embeddedDocs.getContentType Which returns the application/vnd.openxmlformats-officedocument.oleObject .

But is there anyway where we can get the file extensions (ie pdf,ppt,mp3) as which is returned by the MimeType. And which way to get row number of embedded objects. Any ideas / Coding logic to resolve this will be very useful.

The following is working for - I guess - the usual suspects. I've tested it with .xls/x on the POI trunk, which will be POI 4.1.0, but should work with POI 4.0.1 too.

Known issues are:

  • an object wasn't embedded based on a file, then you don't get a filename. This probably also applies to most .xls files.

  • a .xlsx only contains a vmlDrawing*.xml, then the DrawingPatriach can't be extracted and no shapes can be determined

  • a shape in a .xlsx wasn't anchored via twoCellAnchor, then you don't get a ClientAnchor

Code:

import java.io.FileInputStream;
import java.io.IOException;

import org.apache.poi.hpsf.ClassIDPredefined;
import org.apache.poi.poifs.filesystem.DirectoryEntry;
import org.apache.poi.ss.usermodel.ChildAnchor;
import org.apache.poi.ss.usermodel.ClientAnchor;
import org.apache.poi.ss.usermodel.Drawing;
import org.apache.poi.ss.usermodel.ObjectData;
import org.apache.poi.ss.usermodel.Shape;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.junit.Test;

public class TestEmbed {
    @Test
    public void extract() throws IOException {
//        String xlsName = "test-data/spreadsheet/WithEmbeddedObjects.xls";
        String xlsName = "embed.xlsx";
        try (FileInputStream fis = new FileInputStream(xlsName);
             Workbook xls = WorkbookFactory.create(fis)) {
            for (Sheet s : xls) {
                Drawing<?> dp = s.getDrawingPatriarch();
                if (dp != null) {
                    for (Shape sh : dp) {
                        if (sh instanceof ObjectData) {
                            ObjectData od = (ObjectData)sh;
                            String filename = od.getFileName();
                            String ext = null;

                            if (filename != null && !filename.isEmpty()) {
                                int i = filename.lastIndexOf('.');
                                ext = (i > 0) ? filename.substring(i) : ".bin";
                            } else {
                                String ct = null;

                                try {
                                    DirectoryEntry de = od.getDirectory();

                                    if (de != null) {
                                        ClassIDPredefined ctcls = ClassIDPredefined.lookup(de.getStorageClsid());
                                        if (ctcls != null) {
                                            ext = ctcls.getFileExtension();
                                        }
                                    }
                                } catch (Exception ignore) {
                                }
                            }

                            if (ext == null) {
                                ext = ".bin";
                            }

                            ChildAnchor chAnc = sh.getAnchor();
                            if (chAnc instanceof ClientAnchor) {
                                ClientAnchor anc = (ClientAnchor) chAnc;
                                System.out.println("Rows: " + anc.getRow1() + " to " + anc.getRow2() + " - filename: "+filename+" - ext: "+ext);
                            }
                        }
                    }
                }
            }
        }
    }
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM