How to edit MS Word documents using Java?

I have a few Word templates, and my requirement is to replace some of the words/placeholders in the document based on user input, using Java. I tried a lot of libraries, including 2-3 versions of docx4j, but nothing worked well; none of them did anything!

I know this question has been asked before, but I have tried all the options I know. So, which Java library can I use to "really" replace/edit these templates? My preference is for "easy to use / few lines of code" libraries.

I am using Java 8 and my MS Word templates are in MS Word 2007 format.

Update

This code was written using the code sample provided by SO member Joop Eggen:

public Main() throws URISyntaxException, IOException, ParserConfigurationException, SAXException
{
    URI docxUri = new URI("C:/Users/Yohan/Desktop/yohan.docx");
    Map<String, String> zipProperties = new HashMap<>();
    zipProperties.put("encoding", "UTF-8");

    FileSystem zipFS = FileSystems.newFileSystem(docxUri, zipProperties);

    Path documentXmlPath = zipFS.getPath("/word/document.xml");

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

    factory.setNamespaceAware(true);
    DocumentBuilder builder = factory.newDocumentBuilder();

    Document doc = builder.parse(Files.newInputStream(documentXmlPath));

    byte[] content = Files.readAllBytes(documentXmlPath);
    String xml = new String(content, StandardCharsets.UTF_8);
    //xml = xml.replace("#DATE#", "2014-09-24");
    xml = xml.replace("#NAME#", StringEscapeUtils.escapeXml("Sniper"));

    content = xml.getBytes(StandardCharsets.UTF_8);
    Files.write(documentXmlPath, content);
}

However, this returns the error below:

java.nio.file.ProviderNotFoundException: Provider "C" Not found

    at java.nio.file.FileSystems.newFileSystem(FileSystems.java:341)
    at java.nio.file.FileSystems.newFileSystem(FileSystems.java:276)

For docx (a zip with XML and other files), one may use a Java zip file system together with XML or text processing.

URI docxUri = ,,, // "jar:file:/C:/... .docx"
Map<String, String> zipProperties = new HashMap<>();
zipProperties.put("encoding", "UTF-8");
try (FileSystem zipFS = FileSystems.newFileSystem(docxUri, zipProperties)) {
    Path documentXmlPath = zipFS.getPath("/word/document.xml");

When using XML:

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

    factory.setNamespaceAware(true);
    DocumentBuilder builder = factory.newDocumentBuilder();

    Document doc = builder.parse(Files.newInputStream(documentXmlPath));
    //Element root = doc.getDocumentElement();

You can then use XPath to find the places, and write the XML back again.

It might even be that you do not need XML at all and can simply replace the placeholders:
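The XPath route could look roughly like the sketch below. It is not the answerer's code: the class name `XPathPlaceholderSketch` and the method `replaceInTextNodes` are made up for illustration, and it works on an XML string rather than on the zip entry so that it can be run on its own. It assumes the placeholder sits entirely inside a single `w:t` element; Word may split text across several runs, in which case this simple match will miss it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named here for illustration only.
public class XPathPlaceholderSketch {

    /** Replaces the placeholder in every element whose local name is "t" (w:t) and returns the XML. */
    public static String replaceInTextNodes(String xml, String placeholder, String value)
            throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // local-name() avoids having to register a NamespaceContext for the "w" prefix.
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList textNodes = (NodeList) xpath.evaluate(
                "//*[local-name()='t']", doc, XPathConstants.NODESET);

        for (int i = 0; i < textNodes.getLength(); i++) {
            Node t = textNodes.item(i);
            String text = t.getTextContent();
            if (text.contains(placeholder)) {
                // The DOM serializer escapes &, < and > when the document is written out.
                t.setTextContent(text.replace(placeholder, value));
            }
        }

        // Write the modified DOM back to a string.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">"
                + "<w:body><w:p><w:r><w:t>Hello #NAME#</w:t></w:r></w:p></w:body></w:document>";
        System.out.println(replaceInTextNodes(xml, "#NAME#", "Tom & Jerry"));
    }
}
```

In a real document the input would come from `Files.newInputStream(documentXmlPath)` and the result would be written back to the same path inside the zip file system.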

    byte[] content = Files.readAllBytes(documentXmlPath);
    String xml = new String(content, StandardCharsets.UTF_8);
    xml = xml.replace("#DATE#", "2014-09-24");
    xml = xml.replace("#NAME#", StringEscapeUtils.escapeXml("Sniper"));
    ...
    content = xml.getBytes(StandardCharsets.UTF_8);
    Files.delete(documentXmlPath);
    Files.write(documentXmlPath, content);

For fast development, rename a copy of the .docx to a name with the .zip file extension, and inspect the files.

Files.write should already apply StandardOpenOption.TRUNCATE_EXISTING, but I have added Files.delete as some error occurred. See comments.
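The same inspection can be done from Java without renaming anything. The sketch below uses only `java.util.zip`; the class name `DocxEntryLister` and its `listEntries` method are made up for this example.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper class, named here for illustration only.
public class DocxEntryLister {

    /** Returns the sorted names of all entries in the given .docx (zip) file. */
    public static List<String> listEntries(Path docx) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DocxEntryLister <file.docx>");
            return;
        }
        for (String name : listEntries(Paths.get(args[0]))) {
            System.out.println(name);
        }
    }
}
```

For a Word document the listing should include `word/document.xml`, which is the entry the snippets in this answer read and write.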

Try Apache POI. POI can work with doc and docx, but docx is better documented and therefore better supported.

UPD: You can use XDocReport, which uses POI. I also recommend using xlsx for templates because it is more suitable and better documented.

I spent a few days on this issue, until I found that what makes the difference is the try-with-resources on the FileSystem instance, which appears in Joop Eggen's snippet but not in the question's snippet:
try (FileSystem zipFS = FileSystems.newFileSystem(docxUri, zipProperties))
Without such a try-with-resources block, the FileSystem resource will not be closed (as explained in the Java tutorial), and the Word document will not be modified.
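Putting the pieces together, a minimal end-to-end sketch might look like the following. It is an assumption-laden illustration rather than anyone's original code: the class `DocxPlaceholderSketch`, its `replacePlaceholder` method, and the small `escapeXml` helper (a stand-in for `StringEscapeUtils.escapeXml`, so that no extra library is needed) are all invented here. Note also that the URI is built with the `jar:` scheme from a `Path`; a plain `C:/...` string is parsed as a URI whose scheme is `C`, which is what the `Provider "C" Not found` error in the question points to.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, named here for illustration only.
public class DocxPlaceholderSketch {

    /** Minimal XML escaping for text content (stand-in for a library escaper). */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Replaces the placeholder in /word/document.xml of the given .docx, in place. */
    public static void replacePlaceholder(Path docx, String placeholder, String value)
            throws IOException {
        // "jar:file:///..." selects the zip file system provider.
        URI docxUri = URI.create("jar:" + docx.toUri());
        Map<String, String> zipProperties = new HashMap<>();
        zipProperties.put("encoding", "UTF-8");

        // Closing the FileSystem is what flushes the change back into the .docx.
        try (FileSystem zipFS = FileSystems.newFileSystem(docxUri, zipProperties)) {
            Path documentXmlPath = zipFS.getPath("/word/document.xml");
            String xml = new String(Files.readAllBytes(documentXmlPath), StandardCharsets.UTF_8);
            xml = xml.replace(placeholder, escapeXml(value));
            Files.write(documentXmlPath, xml.getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: java DocxPlaceholderSketch <file.docx> <placeholder> <value>");
            return;
        }
        replacePlaceholder(Paths.get(args[0]), args[1], args[2]);
    }
}
```

Like the plain-text replacement shown earlier, this only finds a placeholder that Word has stored contiguously in document.xml, so it is worth working on a copy of the template and checking the result.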

Stepping back a bit, there are about 4 different approaches for editing words/placeholders:

  • MERGEFIELD or DOCPROPERTY fields (if you are having problems with this in docx4j, then you have probably not set up your input docx correctly)
  • content control databinding
  • variable replacement on the document surface (either at the DOM/SAX level, or using a library)
  • do stuff as XHTML, then import that

Before choosing one, you should decide whether you also need to be able to handle:

  • repeating data (e.g. adding table rows)
  • conditional content (e.g. entire paragraphs which will either be present or absent)
  • adding images

If you need these, then MERGEFIELD or DOCPROPERTY fields are probably out (though you can also use IF fields, if you can find a library which supports them). And adding images makes DOM/SAX manipulation, as advocated in one of the other answers, messier and more error-prone.

The other things to consider are:

  • your authors: how technical are they? What does that imply for the authoring UI?
  • the "user input" you mention for variable replacement: is this given, or is obtaining it part of the problem you are solving?

Please try this to edit or replace a word in the document:

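import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

public class UpdateDocument {

    public static void main(String[] args) throws IOException {

        UpdateDocument obj = new UpdateDocument();

        obj.updateDocument(
                  "c:\\test\\template.docx",
                  "c:\\test\\output.docx",
                  "Piyush");
    }

    private void updateDocument(String input, String output, String name)
        throws IOException {

        try (XWPFDocument doc = new XWPFDocument(
                Files.newInputStream(Paths.get(input)))
        ) {

            List<XWPFParagraph> xwpfParagraphList = doc.getParagraphs();
            // Iterate over the paragraph list and check each run for the replaceable text
            for (XWPFParagraph xwpfParagraph : xwpfParagraphList) {
                for (XWPFRun xwpfRun : xwpfParagraph.getRuns()) {
                    String docText = xwpfRun.getText(0);
                    // A run may carry no text at position 0; skip it to avoid a NullPointerException
                    if (docText == null) {
                        continue;
                    }
                    // Replace the placeholder and write the text back at the same position
                    docText = docText.replace("${name}", name);
                    xwpfRun.setText(docText, 0);
                }
            }

            // Save the modified document to the output file
            try (FileOutputStream out = new FileOutputStream(output)) {
                doc.write(out);
            }

        }

    }

}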
