How to replace text (tag) with HTML in docx using Apache POI?
We are going to have template docx files that contain tags such as ${content}. I need to replace these tags with HTML.
For this purpose I want to use the altChunk element in XWPFDocument. Following the answer in How to add an altChunk element to a XWPFDocument using Apache POI, I could place an altChunk at the end of the docx.
How can I replace my tag with it? Or could I use another library, maybe docx4j?
UPD: The template docx files with tags are created by end users in MS Word and look like:
If "${content}" is in an IBodyElement of its own, then the requirement can be solved by finding that IBodyElement, creating an XmlCursor, inserting the altChunk, and then removing the IBodyElement.
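The same idea can be sketched without POI, directly on the raw word/document.xml, using only the JDK's DOM API. This is a swapped-in technique for illustration, not the answer's XmlCursor code. Assumptions: AltChunkDomSketch and replaceParagraphWithAltChunk are hypothetical names; the relationship id passed in (e.g. htmlDoc1) must already be registered in word/_rels/document.xml.rels together with the HTML part, which this sketch does not do; and, unlike the answer's contains check, it only replaces a paragraph whose whole text equals the tag. All w:t texts of a paragraph are joined because Word may split a typed tag across several runs.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: DOM-based sketch of "replace the tag paragraph with w:altChunk".
public class AltChunkDomSketch {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Replaces the first w:p whose joined w:t text equals tag with <w:altChunk r:id="relId"/>.
    // Returns the serialized XML; if no paragraph matches, the content is unchanged.
    public static String replaceParagraphWithAltChunk(String documentXml, String tag, String relId)
            throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(documentXml)));

        NodeList paragraphs = doc.getElementsByTagNameNS(W, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element p = (Element) paragraphs.item(i);
            // Word may split "${content}" over several runs, so concatenate every w:t
            StringBuilder text = new StringBuilder();
            NodeList texts = p.getElementsByTagNameNS(W, "t");
            for (int j = 0; j < texts.getLength(); j++) {
                text.append(texts.item(j).getTextContent());
            }
            if (text.toString().trim().equals(tag)) {
                Element altChunk = doc.createElementNS(W, "w:altChunk");
                altChunk.setAttributeNS(R, "r:id", relId);
                // put the altChunk exactly where the paragraph was, then drop the paragraph
                p.getParentNode().replaceChild(altChunk, p);
                break; // the NodeList is live, so stop after modifying the tree
            }
        }

        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

The answer's XmlCursor version reaches the same result by inserting the altChunk after the paragraph and then calling removeBodyElement; replaceChild does both steps at once, but it bypasses POI's in-memory model, so it only fits a workflow that edits the XML part directly.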
The following code demonstrates this by extending the example in How to add an altChunk element to a XWPFDocument using Apache POI. It provides a method for replacing a found IBodyElement that contains a special text with an altChunk that references a MyXWPFHtmlDocument. It uses an XmlCursor to get the needed position in the text body; the usage of XmlCursor is commented in the code.
template.docx:
Code:
```java
import java.io.*;
import java.nio.charset.StandardCharsets;

import org.apache.poi.*;
import org.apache.poi.ooxml.*;
import org.apache.poi.openxml4j.opc.*;
import org.apache.poi.xwpf.usermodel.*;
import org.apache.xmlbeans.XmlCursor;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTAltChunk;

public class WordInsertHTMLaltChunkInDocument {

    // a method for creating the htmlDoc /word/htmlDoc#.html in the *.docx ZIP archive
    // String id will be htmlDoc#.
    private static MyXWPFHtmlDocument createHtmlDoc(XWPFDocument document, String id) throws Exception {
        OPCPackage oPCPackage = document.getPackage();
        PackagePartName partName = PackagingURIHelper.createPartName("/word/" + id + ".html");
        PackagePart part = oPCPackage.createPart(partName, "text/html");
        MyXWPFHtmlDocument myXWPFHtmlDocument = new MyXWPFHtmlDocument(part, id);
        document.addRelation(myXWPFHtmlDocument.getId(), new XWPFHtmlRelation(), myXWPFHtmlDocument);
        return myXWPFHtmlDocument;
    }

    // a method for replacing an IBodyElement containing a special text with a CTAltChunk which
    // references MyXWPFHtmlDocument; the whole paragraph is removed, so the tag should stand alone in it
    private static void replaceIBodyElementWithAltChunk(XWPFDocument document, String textToFind,
            MyXWPFHtmlDocument myXWPFHtmlDocument) throws Exception {
        int pos = 0;
        for (IBodyElement bodyElement : document.getBodyElements()) {
            if (bodyElement instanceof XWPFParagraph) {
                XWPFParagraph paragraph = (XWPFParagraph) bodyElement;
                String text = paragraph.getText();
                if (text != null && text.contains(textToFind)) {
                    // create XmlCursor at this paragraph
                    XmlCursor cursor = paragraph.getCTP().newCursor();
                    cursor.toEndToken(); // now we are at the end of the paragraph
                    // there must always be a next start token: either a p or at least sectPr
                    while (cursor.toNextToken() != XmlCursor.TokenType.START);
                    // now we can insert the CTAltChunk here
                    String uri = CTAltChunk.type.getName().getNamespaceURI();
                    cursor.beginElement("altChunk", uri);
                    cursor.toParent();
                    CTAltChunk cTAltChunk = (CTAltChunk) cursor.getObject();
                    cursor.dispose(); // release the cursor once it is no longer needed
                    // set the altChunk's Id to reference the given MyXWPFHtmlDocument
                    cTAltChunk.setId(myXWPFHtmlDocument.getId());
                    // now remove the found IBodyElement
                    document.removeBodyElement(pos);
                    break; // leave the for-each loop; the body element list was modified
                }
            }
            pos++;
        }
    }

    public static void main(String[] args) throws Exception {
        try (FileInputStream in = new FileInputStream("template.docx");
             XWPFDocument document = new XWPFDocument(in)) {
            MyXWPFHtmlDocument myXWPFHtmlDocument = createHtmlDoc(document, "htmlDoc1");
            myXWPFHtmlDocument.setHtml(myXWPFHtmlDocument.getHtml().replace("<body></body>",
                    "<body><p>Simple <b>HTML</b> <i>formatted</i> <u>text</u></p></body>"));
            replaceIBodyElementWithAltChunk(document, "${content}", myXWPFHtmlDocument);
            try (FileOutputStream out = new FileOutputStream("result.docx")) {
                document.write(out);
            }
        }
    }

    // a wrapper class for the htmlDoc /word/htmlDoc#.html in the *.docx ZIP archive
    // provides methods for manipulating the HTML
    // TODO: We should *not* use String methods for manipulating HTML!
    private static class MyXWPFHtmlDocument extends POIXMLDocumentPart {
        private String html;
        private String id;

        private MyXWPFHtmlDocument(PackagePart part, String id) throws Exception {
            super(part);
            this.html = "<!DOCTYPE html><html><head><style></style><title>HTML import</title></head><body></body></html>";
            this.id = id;
        }

        private String getId() {
            return id;
        }

        private String getHtml() {
            return html;
        }

        private void setHtml(String html) {
            this.html = html;
        }

        @Override
        protected void commit() throws IOException {
            PackagePart part = getPackagePart();
            // write the HTML as UTF-8 into the package part; closing the writer closes the part stream
            try (Writer writer = new OutputStreamWriter(part.getOutputStream(), StandardCharsets.UTF_8)) {
                writer.write(html);
            }
        }
    }

    // the XWPFRelation for /word/htmlDoc#.html
    private final static class XWPFHtmlRelation extends POIXMLRelation {
        private XWPFHtmlRelation() {
            super(
                "text/html",
                "http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk",
                "/word/htmlDoc#.html");
        }
    }
}
```
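The TODO in MyXWPFHtmlDocument warns against building the HTML with String methods. As a minimal sketch, assuming a hypothetical JDK-only helper class HtmlChunkBuilder (not part of POI), user-supplied text can at least be escaped before it goes into the altChunk's HTML, and the result passed to setHtml(...) instead of the replace("<body></body>", ...) call in main. The meta charset line is an assumption intended to match the UTF-8 that commit() writes.

```java
// Hypothetical helper for building the HTML handed to MyXWPFHtmlDocument.setHtml(...).
public class HtmlChunkBuilder {

    // Escapes the characters that are special in HTML text and attribute values.
    public static String escapeHtml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds a complete HTML document. bodyHtml is inserted verbatim, so it must be
    // trusted markup; plain text should go through escapeHtml first.
    public static String buildHtmlDocument(String title, String bodyHtml) {
        return "<!DOCTYPE html><html><head><meta charset=\"UTF-8\"><title>"
                + escapeHtml(title) + "</title></head><body>" + bodyHtml + "</body></html>";
    }
}
```

For example, buildHtmlDocument("HTML import", "<p>" + escapeHtml(userText) + "</p>") keeps a user's "<" or "&" from being read as markup. A real HTML library would still be the more robust choice for anything beyond simple fragments.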
result.docx:
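To check that result.docx actually contains what the code adds, a stdlib-only sketch can open the package as a ZIP archive and look for three pieces: the HTML part (for example word/htmlDoc1.html), an aFChunk relationship in word/_rels/document.xml.rels, and an altChunk element in word/document.xml. DocxAltChunkCheck is a hypothetical helper, the string checks are a rough heuristic rather than XML validation, and readAllBytes needs Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: inspects a .docx package for an HTML altChunk.
public class DocxAltChunkCheck {

    // Reads one ZIP entry as UTF-8 text, or returns null if the entry is absent.
    static String readEntry(ZipFile zip, String name) throws IOException {
        ZipEntry entry = zip.getEntry(name);
        if (entry == null) {
            return null;
        }
        try (InputStream in = zip.getInputStream(entry)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // True if the HTML part exists, document.xml.rels has an aFChunk relationship
    // and document.xml contains an altChunk element (plain substring checks).
    public static boolean hasHtmlAltChunk(String docxPath, String htmlPartName) throws IOException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            String html = readEntry(zip, htmlPartName);
            String rels = readEntry(zip, "word/_rels/document.xml.rels");
            String body = readEntry(zip, "word/document.xml");
            return html != null && rels != null && body != null
                    && rels.contains("relationships/aFChunk")
                    && body.contains("altChunk");
        }
    }
}
```

Usage: DocxAltChunkCheck.hasHtmlAltChunk("result.docx", "word/htmlDoc1.html"). Entry names inside the ZIP have no leading slash, unlike the part name /word/htmlDoc1.html used with PackagingURIHelper.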