How to replace text (tag) with HTML in docx using Apache POI?

Question

We are going to have some template docx files containing tags like ${content}. I need to replace these tags with HTML.

For this purpose I want to use the altChunk element in XWPFDocument. Following the answer in How to add an altChunk element to a XWPFDocument using Apache POI, I could place an altChunk at the end of the docx.

How can I replace my tag with it? Or could I use another library, maybe docx4j?

UPD: The template docx files with tags are created by end users in MS Word and look like this:

Answer 1

If "${content}" is in an IBodyElement of its own, then the requirement can be solved by finding that IBodyElement, creating an XmlCursor, inserting the altChunk, and then removing the IBodyElement.
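The "of its own" condition can be checked before anything is removed. The following is only a minimal sketch in plain Java, not part of Apache POI: StandaloneTagFinder and indexOfStandaloneTag are hypothetical names, and it assumes the caller has collected one text per body element in document order (for example from XWPFParagraph.getText(), with null for elements that are not paragraphs), so that the returned index matches the body element position.

```java
import java.util.List;

// Hypothetical helper for illustration only; not part of Apache POI.
final class StandaloneTagFinder {

    private StandaloneTagFinder() {}

    // Returns the index of the first entry whose text, ignoring surrounding
    // whitespace, is exactly the tag; returns -1 if there is no such entry.
    // Null entries (assumed to stand for non-paragraph elements) are skipped.
    static int indexOfStandaloneTag(List<String> paragraphTexts, String tag) {
        for (int i = 0; i < paragraphTexts.size(); i++) {
            String text = paragraphTexts.get(i);
            if (text != null && text.trim().equals(tag)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Assumed sample texts; the second entry only contains the tag inline.
        List<String> texts = List.of(
            "Dear customer,", "See ${content} below.", "${content}", "Regards");
        System.out.println(indexOfStandaloneTag(texts, "${content}"));
    }
}
```

Applied to replaceIBodyElementWithAltChunk, this would roughly correspond to testing text.trim().equals(textToFind) instead of text.contains(textToFind), so that a paragraph which merely mentions the tag among other text is not removed.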

The following code demonstrates this by extending the example in How to add an altChunk element to a XWPFDocument using Apache POI. It provides a method for replacing a found IBodyElement, which contains a special text, with an altChunk that references a MyXWPFHtmlDocument. It uses XmlCursor to get the needed position in the text body. The usage of XmlCursor is commented in the code.

template.docx:

Code:

import java.io.*;

import org.apache.poi.*;
import org.apache.poi.ooxml.*;
import org.apache.poi.openxml4j.opc.*;

import org.apache.poi.xwpf.usermodel.*;

import org.apache.xmlbeans.XmlCursor;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTAltChunk;

public class WordInsertHTMLaltChunkInDocument {

 //a method for creating the htmlDoc /word/htmlDoc#.html in the *.docx ZIP archive
 //String id will be htmlDoc#.
 private static MyXWPFHtmlDocument createHtmlDoc(XWPFDocument document, String id) throws Exception {
  OPCPackage oPCPackage = document.getPackage();
  PackagePartName partName = PackagingURIHelper.createPartName("/word/" + id + ".html");
  PackagePart part = oPCPackage.createPart(partName, "text/html");
  MyXWPFHtmlDocument myXWPFHtmlDocument = new MyXWPFHtmlDocument(part, id);
  document.addRelation(myXWPFHtmlDocument.getId(), new XWPFHtmlRelation(), myXWPFHtmlDocument);
  return myXWPFHtmlDocument;
 }

 //a method for replacing an IBodyElement containing a special text with a CTAltChunk which
 //references MyXWPFHtmlDocument
 private static void replaceIBodyElementWithAltChunk(XWPFDocument document, String textToFind, 
                                                     MyXWPFHtmlDocument myXWPFHtmlDocument) throws Exception {
  int pos = 0;
  for (IBodyElement bodyElement : document.getBodyElements()) {
   if (bodyElement instanceof XWPFParagraph) {
    XWPFParagraph paragraph = (XWPFParagraph)bodyElement;
    String text = paragraph.getText();
    if (text != null && text.contains(textToFind)) {
     //create XmlCursor at this paragraph
     XmlCursor cursor = paragraph.getCTP().newCursor();
     cursor.toEndToken(); //now we are at end of the paragraph
     //there always must be a next start token. Either a p or at least sectPr.
     while(cursor.toNextToken() != org.apache.xmlbeans.XmlCursor.TokenType.START);
     //now we can insert the CTAltChunk here
     String uri = CTAltChunk.type.getName().getNamespaceURI();
     cursor.beginElement("altChunk", uri);
     cursor.toParent();
     CTAltChunk cTAltChunk = (CTAltChunk)cursor.getObject();
     //set the altChunk's Id to reference the given MyXWPFHtmlDocument
     cTAltChunk.setId(myXWPFHtmlDocument.getId());

     //now remove the found IBodyElement
     document.removeBodyElement(pos);

     break; //break for each loop
    }
   }
   pos++;
  }
 }

 public static void main(String[] args) throws Exception {

  XWPFDocument document = new XWPFDocument(new FileInputStream("template.docx"));

  MyXWPFHtmlDocument myXWPFHtmlDocument = createHtmlDoc(document, "htmlDoc1");
  myXWPFHtmlDocument.setHtml(myXWPFHtmlDocument.getHtml().replace("<body></body>",
   "<body><p>Simple <b>HTML</b> <i>formatted</i> <u>text</u></p></body>"));

  replaceIBodyElementWithAltChunk(document, "${content}", myXWPFHtmlDocument);

  FileOutputStream out = new FileOutputStream("result.docx");
  document.write(out);
  out.close();
  document.close();

 }

 //a wrapper class for the htmlDoc /word/htmlDoc#.html in the *.docx ZIP archive
 //provides methods for manipulating the HTML
 //TODO: We should *not* use String methods for manipulating HTML!
 private static class MyXWPFHtmlDocument extends POIXMLDocumentPart {

  private String html;
  private String id;

  private MyXWPFHtmlDocument(PackagePart part, String id) throws Exception {
   super(part);
   this.html = "<!DOCTYPE html><html><head><style></style><title>HTML import</title></head><body></body></html>";
   this.id = id;
  }

  private String getId() {
   return id;
  }

  private String getHtml() {
   return html;
  }

  private void setHtml(String html) {
   this.html = html;
  }

  @Override
  protected void commit() throws IOException {
   PackagePart part = getPackagePart();
   OutputStream out = part.getOutputStream();
   Writer writer = new OutputStreamWriter(out, "UTF-8");
   writer.write(html);
   writer.close();
   out.close();
  }

 }

 //the XWPFRelation for /word/htmlDoc#.html
 private final static class XWPFHtmlRelation extends POIXMLRelation {
  private XWPFHtmlRelation() {
   super(
    "text/html", 
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk", 
    "/word/htmlDoc#.html");
  }
 }
}
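The TODO in MyXWPFHtmlDocument notes that the HTML should not be manipulated with String methods such as replace("<body></body>", ...). One possible way around that, sketched here under assumptions, is to build the whole document once from a title and a body fragment. HtmlSkeleton, escape and wrap are hypothetical names, not part of Apache POI, and the sketch assumes the body fragment is already well-formed HTML.

```java
// Hypothetical helper for illustration only; not part of Apache POI.
final class HtmlSkeleton {

    private HtmlSkeleton() {}

    // Escapes the characters that are significant in HTML text content.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds a complete HTML document around the given body fragment.
    // The title is escaped; the body fragment is inserted unchanged because
    // it is assumed to be well-formed HTML already.
    static String wrap(String title, String bodyFragment) {
        return "<!DOCTYPE html><html><head><style></style><title>"
            + escape(title)
            + "</title></head><body>"
            + bodyFragment
            + "</body></html>";
    }

    public static void main(String[] args) {
        System.out.println(wrap("HTML import",
            "<p>Simple <b>HTML</b> <i>formatted</i> <u>text</u></p>"));
    }
}
```

With such a helper, main could call myXWPFHtmlDocument.setHtml(HtmlSkeleton.wrap("HTML import", "<p>...</p>")) instead of patching the empty body of the default skeleton.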

result.docx:

Answer 1
2 Accepted 2018-12-20 16:10:56