Generate DOCX from an existing DOCX in C#

I have a new project where I need to generate a DOCX. My client has provided an existing DOCX in which I need to replace some placeholders with customer data from the database. As if that weren't challenging enough, certain parts are optional, depending on conditions in the customer data, so I need some logic to omit parts of the DOCX entirely.

After far too much research and a few POCs, I've come across a new approach. I saved the DOCX as a Word XML Document. This creates one big XML file with everything in it; even the images are encoded as base64. I then copied the content of the XML file into a T4 template. This lets me add dynamic content based on the customer data and generate a Word XML Document in my code as one large string.

But now I'm stuck at creating a DOCX again from the Word XML Document string. I've tried using the Open XML SDK but can't find any real documentation on how to do this. After some experimentation I ended up with the CreateDocx method below, but it fails to parse the XML (Data at the root level is invalid. Line 1, position 1).

As a second attempt, I tried a suggestion from another post, but this results in a different exception (The XML has invalid content and cannot be constructed as an element. (Parameter 'outerXml')).

Is there a way to do this, or should I abandon the T4 template and try another approach? Another problem with the T4 template is the size of some of the images: each one becomes a long base64 string that generates far too many lines. I guess I could replace the images with placeholders and swap them in just before I create the XML...
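The image-placeholder idea could look roughly like the sketch below. It is only an illustration of the string swap: the {{Image:Name}} token format, the ImagePlaceholders class, and the Inject method are hypothetical names, and it assumes the image bytes are already loaded in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ImagePlaceholders
{
    // Hypothetical helper: replaces tokens such as {{Image:Logo}} in the
    // generated XML string with the base64 encoding of the matching bytes.
    public static string Inject(string xml, IReadOnlyDictionary<string, byte[]> images)
    {
        var builder = new StringBuilder(xml);
        foreach (var image in images)
        {
            // Concatenation is used instead of interpolation so the
            // double braces do not need escaping.
            builder.Replace("{{Image:" + image.Key + "}}", Convert.ToBase64String(image.Value));
        }
        return builder.ToString();
    }
}
```

With this, the T4 template would only contain the short tokens, and the long base64 strings would be produced at run time.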

    public FileData CreateDocx(string title, string xml)
    {
        using (MemoryStream generatedDocument = new MemoryStream())
        {
            using (WordprocessingDocument package =
                WordprocessingDocument.Create(generatedDocument, WordprocessingDocumentType.Document))
            {
                var mainPart = package.AddMainDocumentPart();
                //First attempt
                //new Document(xml).Save(mainPart);

                var doc = new XmlDocument();
                doc.LoadXml(xml);
                new Document(doc.OuterXml).Save(mainPart);
            }

            return new FileData(title, generatedDocument.ToArray());
        }
    }
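One thing that may be worth checking, offered as a guess rather than a confirmed diagnosis: the Document(string) constructor expects the outer XML of a w:document element, whereas a file saved from Word as "Word XML Document" is, as far as I know, a Flat OPC package whose root element is pkg:package. The sketch below uses only System.Xml.Linq to report which root element a string has. WordXmlProbe, DescribeRoot, and the returned labels are hypothetical names.

```csharp
using System;
using System.Xml.Linq;

public static class WordXmlProbe
{
    // Namespace of the w:document element found in word/document.xml.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Namespace of the pkg:package root used by Flat OPC files.
    private static readonly XNamespace Pkg =
        "http://schemas.microsoft.com/office/2006/xmlPackage";

    // Hypothetical helper: classifies an XML string by its root element.
    public static string DescribeRoot(string xml)
    {
        // Trim a leading BOM or whitespace that a template engine may
        // emit before the XML declaration.
        string trimmed = xml.TrimStart('\uFEFF', ' ', '\t', '\r', '\n');
        XElement root = XDocument.Parse(trimmed).Root;

        if (root.Name == W + "document") return "main-document-part";
        if (root.Name == Pkg + "package") return "flat-opc-package";
        return "other: " + root.Name;
    }
}
```

If the result is flat-opc-package, the main document XML would sit inside the pkg:part whose pkg:name is /word/document.xml, so passing the whole string to Document would not be expected to work.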

Based on the feedback of Thomas Weller, I tried out DocX. This library makes it much easier to open, duplicate, and create DOCX files. After some research I completely changed my approach and ended up using the existing DOCX as a template.

First of all, I added placeholders to the paragraphs where I needed to inject data from the database. For this I used something like {{CustomerName}}. By using ReplaceText I was able to swap all the placeholders with the correct data.
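When there are many placeholders, the replacements can be driven from a dictionary instead of one call per field. The following is a minimal sketch under that assumption; PlaceholderMap and Apply are hypothetical names, and the replace action is passed in so the helper itself does not depend on DocX.

```csharp
using System;
using System.Collections.Generic;

public static class PlaceholderMap
{
    // Hypothetical helper: wraps each key in {{ }} and hands the pair to
    // the supplied replace action, for example a call to ReplaceText.
    public static void Apply(IReadOnlyDictionary<string, string> values, Action<string, string> replace)
    {
        foreach (var pair in values)
        {
            // Null values become empty strings so no token is left behind.
            replace("{{" + pair.Key + "}}", pair.Value ?? string.Empty);
        }
    }
}
```

With a DocX document object this could be called as PlaceholderMap.Apply(values, (token, text) => document.ReplaceText(token, text));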

After doing this I added sections. This can be done easily in Word by using this guide. Once the sections were added, I also added a placeholder to mark each one, since you can't name a section in Word. So I ended up with placeholders at the beginning of the sections, like {{SectionNationalCustomer}}. This allowed me to look up my section with a Linq query that searches all sections for one with a paragraph containing my placeholder.

Once I had collected the conditional sections, I was able to 'remove' them by looping over all their SectionParagraphs and removing each one. Removing a section entirely doesn't seem possible. When the section needed to stay visible, it was only a matter of replacing the placeholder with an empty string.

The final thing I needed was to find the correct table in the document. I tried the same approach as before by using a new section, but it seems that the Tables collection of the Section object is always empty, even if there is a table in it. So I needed another approach. Again I made use of a unique placeholder, like {{TableQuotation}}, in the first column of the table. Then I did the same as with the sections and wrote a Linq query to select the right table by looking for a paragraph with the right placeholder.

After all this I ended up with code that looked very similar to this:

    // Requires: using System.IO; using System.Linq; using Xceed.Words.NET;
    // (namespace assumed for the Xceed DocX package)
    using (var memoryStream = new MemoryStream())
    {
        // Load the template document and work on a copy
        using (var template = DocX.Load("MyTemplate.docx"))
        {
            var document = template.Copy();

            // Swap placeholder with data
            document.ReplaceText("{{CustomerName}}", myData.CustomerName);

            // Hide or show the section based on a condition
            var section = document.Sections.FirstOrDefault(s => s.SectionParagraphs.Any(p => p.Text.StartsWith("{{SectionNationalCustomer}}")));
            if (myData.Customer.Address.National == true)
            {
                // Remove the placeholder when the section stays visible
                document.ReplaceText("{{SectionNationalCustomer}}", "");
            }
            else if (section != null)
            {
                // Remove the contents of the section; ToList() takes a snapshot
                // so the loop does not enumerate a list that is being modified
                foreach (var paragraph in section.SectionParagraphs.ToList())
                {
                    document.RemoveParagraph(paragraph);
                }
            }

            // Find and edit the table (null if the placeholder is missing)
            var table = document.Tables.FirstOrDefault(s => s.Paragraphs.Any(p => p.Text.Contains("{{TableQuotation}}")));
            document.ReplaceText("{{TableQuotation}}", "");
            table?.RemoveRow(1);

            document.SaveAs(memoryStream);
        }

        return memoryStream.ToArray();
    }
