How to retrieve watermark text from .docx file using Apache POI?

How can I get the watermark text from .docx files using Apache POI?

In the API documentation I have seen createWatermark(String text), but I can't find a getter for the watermark.

private File file;

public MSDocParser(String filePath, DataSource dataSource) {
    super(dataSource);
    this.file = new File(filePath);
}

public void parse(RunnableTask task) throws ParserException {
    textExtractor = ExtractorFactory.createExtractor(file);
    if (textExtractor instanceof XWPFWordExtractor) {
        XWPFDocument d = (XWPFDocument) textExtractor.getDocument();
        XWPFHeaderFooterPolicy hf = d.getHeaderFooterPolicy();

        // I want to print the watermark text here.
    }
}

You can get the content of the watermark with the following code.

// doc is the already opened XWPFDocument.
XWPFHeaderFooterPolicy hf = doc.getHeaderFooterPolicy();
XWPFHeader header = hf.getDefaultHeader();
// Assumes the watermark sits in the first run of the first header paragraph.
XWPFParagraph paragraph = header.getParagraphArray(0);

// The watermark is a VML shape nested inside the run's pict element.
org.apache.xmlbeans.XmlObject[] xmlobjects = paragraph.getCTP().getRArray(0).getPictArray(0).selectChildren(
        new javax.xml.namespace.QName("urn:schemas-microsoft-com:vml", "shape"));

if (xmlobjects.length > 0) {
    com.microsoft.schemas.vml.CTShape ctshape = (com.microsoft.schemas.vml.CTShape) xmlobjects[0];
    // The "string" attribute of the textpath element holds the watermark text.
    com.microsoft.schemas.vml.CTTextPath text = ctshape.getTextpathArray(0);
    String watermarkContent = text.getString();

    System.out.println("WaterMark:" + watermarkContent);
}

This is the neatest way to get text watermarks from a document.

public String getWaterMark(XWPFDocument document) {
    var sbWaterMark = new StringBuilder();
    try {
        XWPFHeader defaultHeader = document.getHeaderFooterPolicy().getDefaultHeader();
        var declareNameSpaces = "declare namespace v='urn:schemas-microsoft-com:vml';";
        final var xpathFilter = "*//v:shape/v:textpath/@string";
        // A "watermark" in Word is nothing more than a graphic anchored to the header.
        XmlObject[] xmlobjects = defaultHeader._getHdrFtr().selectPath(declareNameSpaces + xpathFilter);

        if (xmlobjects != null && xmlobjects.length > 0) {
            for (var xmlobj : xmlobjects) {
                sbWaterMark.append(xmlobj.getDomNode().getNodeValue()).append("\n");
            }
        }
    } catch (NullPointerException ex) {
        // No header policy or no default header: nothing was collected, so fall through.
    } catch (Exception ex) {
        logAggregator.error("Error while getting Watermark content from document: ", ex);
    }
    // Every path returns what was collected so far (possibly an empty string).
    return sbWaterMark.toString();
}
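
To make it easier to see what the XPath in the answer above is matching, here is a minimal sketch that runs an equivalent query with only the JDK's DOM and XPath classes, without Apache POI. The class name WatermarkXPathSketch, the method watermarkTexts and the sample header XML are all hypothetical and heavily simplified; a real header part (typically word/header1.xml inside the .docx archive) carries many more elements and attributes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class WatermarkXPathSketch {

    private static final String VML_NS = "urn:schemas-microsoft-com:vml";

    // Hypothetical, simplified header part used only for illustration.
    public static final String SAMPLE_HEADER_XML =
          "<w:hdr xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\""
        + " xmlns:v=\"urn:schemas-microsoft-com:vml\">"
        + "<w:p><w:r><w:pict>"
        + "<v:shape type=\"#_x0000_t136\">"
        + "<v:textpath string=\"DRAFT\"/>"
        + "</v:shape>"
        + "</w:pict></w:r></w:p>"
        + "</w:hdr>";

    // Returns the "string" attribute of every v:textpath nested in a v:shape.
    public static List<String> watermarkTexts(String headerXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required so the v: prefix can be resolved
        Document dom = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(headerXml.getBytes(StandardCharsets.UTF_8)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Map the "v" prefix used in the expression to the VML namespace.
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override public String getNamespaceURI(String prefix) {
                return "v".equals(prefix) ? VML_NS : XMLConstants.NULL_NS_URI;
            }
            @Override public String getPrefix(String uri) { return null; }
            @Override public Iterator<String> getPrefixes(String uri) { return null; }
        });

        NodeList hits = (NodeList) xpath.evaluate(
                "//v:shape/v:textpath/@string", dom, XPathConstants.NODESET);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            texts.add(hits.item(i).getNodeValue());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(watermarkTexts(SAMPLE_HEADER_XML)); // prints [DRAFT]
    }
}
```

The sketch only inspects a single header part passed in as a string. A document with separate first-page or even-page headers would need each header part checked in the same way.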
