
Apache POI XWPF - Check if a run contains a picture

My goal is to process a .docx document in Java using Apache POI. I want to extract everything from the document and build a new one that contains only specific content, which I pick from the processed document. That works so far for tables and text, but I have a problem with pictures. Normally I would extract them like this:

List<XWPFPictureData> images = r.getEmbeddedPictures();

Here r is an XWPFRun taken from a paragraph. The big problem is that this only works for some images; whether it works depends on how the image was inserted into the Word document.

I can access the XML of a run, so I tried to find images that way. That worked fine in Python, where you can pass an XPath query. I tried the same in Java but got an error message.

Here is my code to check if a run contains an image:

r.getCTR().selectPath(".//w:drawing/wp:inline/a:graphic/a:graphicData/pic:pic/pic:blipFill/a:blip/@r:embed");

And it throws this exception: [screenshot of the exception; image not preserved]

All the XPath engines available to selectPath are namespace-aware, so every prefix used in the path (w, wp, a, pic, r) must be declared in the query itself:

import java.io.FileInputStream;

import org.apache.poi.xwpf.usermodel.*;
import org.apache.xmlbeans.XmlObject;

public class WordRunSelectPath {

 // Query prologue that binds every prefix used in the path to its namespace URI.
 private static final String DECLARE_NAMESPACES =
    "declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'; "
  + "declare namespace wp='http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing'; "
  + "declare namespace a='http://schemas.openxmlformats.org/drawingml/2006/main'; "
  + "declare namespace pic='http://schemas.openxmlformats.org/drawingml/2006/picture'; "
  + "declare namespace r='http://schemas.openxmlformats.org/officeDocument/2006/relationships' ";

 public static void main(String[] args) throws Exception {

  // try-with-resources closes the document even if an exception is thrown
  try (XWPFDocument document = new XWPFDocument(new FileInputStream("WordInsertPictures.docx"))) {
   for (XWPFParagraph paragraph : document.getParagraphs()) {
    for (XWPFRun run : paragraph.getRuns()) {
     // Select the r:embed attribute of the blip of an inline picture in this run
     XmlObject[] selectedObjects = run.getCTR().selectPath(
        DECLARE_NAMESPACES
      + ".//w:drawing/wp:inline/a:graphic/a:graphicData/pic:pic/pic:blipFill/a:blip/@r:embed");
     if (selectedObjects.length > 0) {
      // The attribute value is the relationship ID of the picture part (first match only)
      String rID = selectedObjects[0].newCursor().getTextValue();
      System.out.println(rID);
     }
    }
   }
  }
 }
}
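
One caveat on the path itself: it only matches pictures placed inline (wp:inline). A floating picture sits under wp:anchor instead, so a looser path such as .//w:drawing//a:blip/@r:embed is probably closer to what the question needs. Below is a minimal sketch of that idea using only the JDK's javax.xml.xpath, so it runs without POI. The class RunPictureXPath, its two methods and the sample run XML are made up for illustration; the NamespaceContext plays the role of the "declare namespace" prologue. I would expect the same looser path to work with selectPath as well, but that is an assumption and not tested here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Apache POI.
final class RunPictureXPath {

    private static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    private static final String WP = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing";
    private static final String A = "http://schemas.openxmlformats.org/drawingml/2006/main";
    private static final String PIC = "http://schemas.openxmlformats.org/drawingml/2006/picture";
    private static final String R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Prefix bindings for the prefixes that appear in the XPath expression.
    private static final Map<String, String> NS = Map.of("w", W, "a", A, "r", R);

    // Returns the r:embed relationship IDs of all pictures in the given run XML,
    // whether they are wrapped in wp:inline or wp:anchor.
    static List<String> findEmbedIds(String runXml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // the parser is not namespace-aware by default
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(runXml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override public String getNamespaceURI(String prefix) {
                return NS.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
            }
            @Override public String getPrefix(String uri) { return null; }           // unused here
            @Override public Iterator<String> getPrefixes(String uri) { return null; } // unused here
        });

        NodeList hits = (NodeList) xpath.evaluate(
                "//w:drawing//a:blip/@r:embed", doc, XPathConstants.NODESET);
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            ids.add(hits.item(i).getNodeValue());
        }
        return ids;
    }

    // Builds a hand-written run with one picture; wrapper is "inline" or "anchor".
    static String sampleRun(String wrapper, String rId) {
        return "<w:r xmlns:w='" + W + "' xmlns:wp='" + WP + "' xmlns:a='" + A + "'"
             + " xmlns:pic='" + PIC + "' xmlns:r='" + R + "'>"
             + "<w:drawing><wp:" + wrapper + "><a:graphic><a:graphicData>"
             + "<pic:pic><pic:blipFill><a:blip r:embed='" + rId + "'/></pic:blipFill></pic:pic>"
             + "</a:graphicData></a:graphic></wp:" + wrapper + "></w:drawing></w:r>";
    }

    public static void main(String[] args) throws Exception {
        System.out.println(findEmbedIds(sampleRun("inline", "rId4")));
        System.out.println(findEmbedIds(sampleRun("anchor", "rId7")));
    }
}
```

A run contains a picture exactly when the returned list is non-empty. The sketch deliberately leaves out pic from the prefix map because the looser path never mentions it; only prefixes that appear in the expression need a binding.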


 