How to get the drawings from the Apache POI XWPFDocument?
I am trying to get the drawings from an XWPFDocument this way (my data.docx contains only a rectangle with text in it).
XWPFDocument wordDocumentObj = new XWPFDocument(new FileInputStream(new File("data.docx")));
Iterator<IBodyElement> bodyElementIterator = wordDocumentObj.getBodyElementsIterator();
while (bodyElementIterator.hasNext()) {
    IBodyElement element = bodyElementIterator.next();
    if (element instanceof XWPFParagraph) {
        XWPFParagraph paragrapObj = (XWPFParagraph) element;
        for (IRunElement irunObj : paragrapObj.getIRuns()) {
            XWPFRun runObj = (XWPFRun) irunObj;
            // I read the whole API doc; I think this is the only way to get the drawings
            System.out.println(runObj.getCTR().getDrawingList());  // No element returned
            System.out.println(runObj.getCTR().getDrawingArray()); // No element returned
        }
    }
}
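Do you have any idea how to get the drawings from the XWPFDocument?
Update: this is the XML content of the XWPFRun. I also tried unzipping the Word file; there is no picture under the /word/* directory: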
<xml-fragment >
<mc:AlternateContent>
<mc:Choice Requires="wps">
<w:drawing>
<wp:anchor>
<a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
<a:graphicData uri="http://schemas.microsoft.com/office/word/2010/wordprocessingShape">
<wps:wsp>
<wps:txbx>
<w:txbxContent>
<w:p w14:paraId="2744738E" w14:textId="0811E43C" w:rsidR="00832A19" w:rsidRDefault="00832A19" w:rsidP="00832A19">
<w:r>
<w:t>Some text here</w:t>
</w:r>
</w:p>
</w:txbxContent>
</wps:txbx>
</wps:wsp>
</a:graphicData>
</a:graphic>
</wp:anchor>
</w:drawing>
</mc:Choice>
<mc:Fallback>
<w:pict>
<v:rect w14:anchorId="684D682E" id="Rectangle 2" o:spid="_x0000_s1026" style="" fillcolor="#4f81bd [3204]" strokecolor="#243f60 [1604]" strokeweight="2pt">
<v:textbox>
<w:txbxContent>
<w:p w14:paraId="2744738E" w14:textId="0811E43C" w:rsidR="00832A19" w:rsidRDefault="00832A19" w:rsidP="00832A19">
<w:r>
<w:t>Some text here</w:t>
</w:r>
</w:p>
</w:txbxContent>
</v:textbox>
</v:rect>
</w:pict>
</mc:Fallback>
</mc:AlternateContent>
</xml-fragment>
As the XML you provided shows, your Word document uses alternate content (mc:AlternateContent), which was introduced after Office Open XML was published in 2007. So Apache POI does not provide methods to get that content, as it only provides methods for Office Open XML according to the ECMA-376 standard. That is because the underlying ooxml-schemas were created only from that ECMA-376 standard.
So the drawing elements inside AlternateContent elements can only be retrieved directly, using XML (XPath) methods.
This could look like this:
import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xwpf.usermodel.*;
import org.apache.xmlbeans.XmlCursor;
import org.apache.xmlbeans.XmlObject;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;

public class WordGetAllDrawingsFromRuns {

    // Collects the w:drawing elements nested below the run, for example inside
    // mc:AlternateContent/mc:Choice. Drawings that are direct children of the run
    // are the ones run.getCTR().getDrawingList() already returns.
    private static List<CTDrawing> getAllDrawings(XWPFRun run) throws Exception {
        CTR ctR = run.getCTR();
        XmlCursor cursor = ctR.newCursor();
        cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//*/w:drawing");
        List<CTDrawing> drawings = new ArrayList<CTDrawing>();
        while (cursor.hasNextSelection()) {
            cursor.toNextSelection();
            XmlObject obj = cursor.getObject();
            // Re-parse the selected XML as a typed CTDrawing
            CTDrawing drawing = CTDrawing.Factory.parse(obj.newInputStream());
            drawings.add(drawing);
        }
        return drawings;
    }

    public static void main(String[] args) throws Exception {
        XWPFDocument document = new XWPFDocument(new FileInputStream("WordDocument.docx"));
        for (IBodyElement bodyElement : document.getBodyElements()) {
            if (bodyElement instanceof XWPFParagraph) {
                XWPFParagraph paragraph = (XWPFParagraph) bodyElement;
                for (IRunElement runElement : paragraph.getIRuns()) {
                    // getIRuns() can also return non-run elements, so check before casting
                    if (runElement instanceof XWPFRun) {
                        XWPFRun run = (XWPFRun) runElement;
                        List<CTDrawing> drawings = getAllDrawings(run);
                        System.out.println(drawings);
                    }
                }
            }
        }
        document.close();
    }
}
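To see what that path expression selects without needing a .docx file, here is a minimal sketch that uses only the JDK's javax.xml.xpath instead of XmlBeans. The class name DrawingXPathSketch, the method count and the two trimmed XML strings are assumptions made for illustration; they are not part of Apache POI. The sketch relies on standard XPath semantics, under which .//*/w:drawing matches a drawing only when it sits at least one element below the context node.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, JDK only (no Apache POI or XmlBeans involved)
final class DrawingXPathSketch {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006";

    // Trimmed stand-in for the run from the question: the drawing is wrapped in mc:AlternateContent
    static final String WRAPPED_RUN =
        "<w:r xmlns:w='" + W_NS + "' xmlns:mc='" + MC_NS + "'>"
        + "<mc:AlternateContent>"
        + "<mc:Choice Requires='wps'><w:drawing/></mc:Choice>"
        + "<mc:Fallback><w:pict/></mc:Fallback>"
        + "</mc:AlternateContent>"
        + "</w:r>";

    // A run whose drawing is a direct child, as in plain ECMA-376 markup
    static final String DIRECT_RUN =
        "<w:r xmlns:w='" + W_NS + "'><w:drawing/></w:r>";

    // Counts the nodes that the expression selects, relative to the root w:r element
    static int count(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required, otherwise the w: prefix cannot be resolved
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "w".equals(prefix) ? W_NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) { return null; }
                public Iterator<String> getPrefixes(String uri) { return null; }
            });
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, doc.getDocumentElement(), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count(WRAPPED_RUN, ".//*/w:drawing")); // prints 1
        System.out.println(count(DIRECT_RUN, ".//*/w:drawing"));  // prints 0
        System.out.println(count(DIRECT_RUN, ".//w:drawing"));    // prints 1
    }
}
```

In this sketch the nested drawing is found, while a drawing directly below the run is not matched by .//*/w:drawing; the shorter .//w:drawing covers both cases.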
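But the next problem will be how to get the content out of the drawing elements, because <wps:wsp><wps:txbx> are also not part of Office Open XML according to the ECMA-376 standard. So the methods of the ooxml-schemas class CTDrawing cannot get them either. Therefore, if you need the text box content from the drawing, that too is only possible directly, using XML (XPath) methods.
This could look like this: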
private static CTTxbxContent getTextBoxContent(CTDrawing drawing) throws Exception {
    XmlCursor cursor = drawing.newCursor();
    cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//*/w:txbxContent");
    // Only the first text box content found in the drawing is used
    if (cursor.hasNextSelection()) {
        cursor.toNextSelection();
        XmlObject obj = cursor.getObject();
        // Re-parse the selected XML as a typed CTTxbxContent
        return CTTxbxContent.Factory.parse(obj.newInputStream());
    }
    return null; // the drawing contains no text box
}
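Note that in the XML from the question the text box content appears twice: once in the w:drawing under mc:Choice and once in the w:pict under mc:Fallback. The following JDK-only sketch shows why it helps to start the search at the drawing, as getTextBoxContent does. The class name TextBoxTextSketch, the method texts and the trimmed XML string are illustrative assumptions and not Apache POI API; the wp:anchor and a:graphic wrappers from the original XML are left out for brevity.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, JDK only (no Apache POI or XmlBeans involved)
final class TextBoxTextSketch {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006";
    static final String WPS_NS = "http://schemas.microsoft.com/office/word/2010/wordprocessingShape";
    static final String V_NS = "urn:schemas-microsoft-com:vml";

    // Trimmed copy of the run XML from the question
    static final String RUN_XML =
        "<w:r xmlns:w='" + W_NS + "' xmlns:mc='" + MC_NS + "'"
        + " xmlns:wps='" + WPS_NS + "' xmlns:v='" + V_NS + "'>"
        + "<mc:AlternateContent>"
        + "<mc:Choice Requires='wps'><w:drawing><wps:wsp><wps:txbx><w:txbxContent>"
        + "<w:p><w:r><w:t>Some text here</w:t></w:r></w:p>"
        + "</w:txbxContent></wps:txbx></wps:wsp></w:drawing></mc:Choice>"
        + "<mc:Fallback><w:pict><v:rect><v:textbox><w:txbxContent>"
        + "<w:p><w:r><w:t>Some text here</w:t></w:r></w:p>"
        + "</w:txbxContent></v:textbox></v:rect></w:pict></mc:Fallback>"
        + "</mc:AlternateContent>"
        + "</w:r>";

    // Returns the text content of every node the expression selects, in document order
    static List<String> texts(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "w".equals(prefix) ? W_NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) { return null; }
                public Iterator<String> getPrefixes(String uri) { return null; }
            });
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, doc.getDocumentElement(), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Restricted to the drawing: the text is returned once
        System.out.println(texts(RUN_XML, ".//w:drawing//w:txbxContent//w:t")); // prints [Some text here]
        // Unrestricted: the fallback copy under w:pict is returned as well
        System.out.println(texts(RUN_XML, ".//w:txbxContent//w:t").size());     // prints 2
    }
}
```

Searching the whole run would therefore report the same text twice, so any code that collects text box content from a run should pick either the drawing or the fallback, not both.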