
Simplistic replacement of tokens in a Word Document using OpenXML SDK

I have a requirement where I would like users to type some string tokens into a Word document so that a C# application can replace them with values. So, say I have a document like the one in the image:

[Image: the sample Word document containing the tokens]

Now, using the SDK, I can read the document as follows:

    // Requires: using System.IO; using System.Text; using DocumentFormat.OpenXml.Packaging;
    private void InternalParseTags(WordprocessingDocument aDocumentToManipulate)
    {
        StringBuilder sbDocumentText = new StringBuilder();
        using (StreamReader sr = new StreamReader(aDocumentToManipulate.MainDocumentPart.GetStream()))
        {
            sbDocumentText.Append(sr.ReadToEnd());
        }
        // ... (rest of the method not shown)
    }

However, because this comes back as raw XML, I cannot easily search for the tags; the underlying XML looks like this:

<w:t>&lt;:</w:t></w:r><w:r w:rsidR="002E53FF" w:rsidRPr="000A794A"><w:t>Person.Meta.Age

(which is obviously not something I have control over), instead of what I was hoping for, namely:

<w:t>&lt;: Person.Meta.Age

or

<w:t><: Person.Meta.Age

So my question is: how do I work on the string itself, namely

<: Person.Meta.Age :>

while still preserving formatting etc., so that after replacing the tokens with values I get:

[Image: the same document after the tokens have been replaced with values]

Note: the value of the second token is in bold.

Do I need to iterate over the document elements, or use some other approach? All pointers greatly appreciated.

This is a bit of a thorny problem with OpenXML. The best solution I've come across is explained here: http://openxmldeveloper.org/blog/b/openxmldeveloper/archive/2011/06/13/open-xml-presentation-generation-using-a-template-presentation.aspx

Basically, Eric expands the content so that each character is in a run by itself, then looks for the run that starts a '<:' sequence, and then for the end sequence. Then he does the substitution and recombines all adjacent runs that have the same attributes.
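The expand/replace/condense idea can be sketched roughly as follows. This is not Eric's code: it is a simplified, SDK-free model in which a run is a hypothetical `SimpleRun` holding a formatting key (a stand-in for the run's w:rPr) plus its text, and the replacement value is assumed to take the formatting of the token's first character. A real implementation would work on w:r elements and compare their run properties.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for a w:r element: Format represents the run
// properties (w:rPr), Text the content of its w:t.
public sealed class SimpleRun
{
    public SimpleRun(string format, string text) { Format = format; Text = text; }
    public string Format { get; }
    public string Text { get; }
}

public static class RunTokenReplacer
{
    // Expand: one single-character run per character, each keeping its formatting.
    public static List<SimpleRun> Expand(IEnumerable<SimpleRun> runs)
    {
        var result = new List<SimpleRun>();
        foreach (var run in runs)
            foreach (char c in run.Text)
                result.Add(new SimpleRun(run.Format, c.ToString()));
        return result;
    }

    // Replace: find open...close sequences across the single-character runs and
    // substitute the value; the value takes the formatting of the token's first character.
    public static List<SimpleRun> Replace(List<SimpleRun> chars, IDictionary<string, string> values,
                                          string open = "<:", string close = ":>")
    {
        // Every run holds exactly one character, so string indexes equal list indexes.
        var sb = new StringBuilder();
        foreach (var r in chars) sb.Append(r.Text);
        string text = sb.ToString();

        var result = new List<SimpleRun>();
        int i = 0;
        while (i < chars.Count)
        {
            int start = text.IndexOf(open, i, StringComparison.Ordinal);
            int end = start < 0 ? -1 : text.IndexOf(close, start + open.Length, StringComparison.Ordinal);
            if (start < 0 || end < 0)
            {
                // No further complete token: copy the remainder unchanged.
                result.AddRange(chars.GetRange(i, chars.Count - i));
                break;
            }
            result.AddRange(chars.GetRange(i, start - i));
            string key = text.Substring(start + open.Length, end - start - open.Length).Trim();
            if (values.TryGetValue(key, out string value))
                result.Add(new SimpleRun(chars[start].Format, value));
            else
                result.AddRange(chars.GetRange(start, end + close.Length - start)); // unknown token: keep as-is
            i = end + close.Length;
        }
        return result;
    }

    // Condense: merge adjacent runs whose formatting is identical.
    public static List<SimpleRun> Condense(IEnumerable<SimpleRun> runs)
    {
        var result = new List<SimpleRun>();
        foreach (var run in runs)
        {
            int last = result.Count - 1;
            if (last >= 0 && result[last].Format == run.Format)
                result[last] = new SimpleRun(run.Format, result[last].Text + run.Text);
            else
                result.Add(run);
        }
        return result;
    }

    public static List<SimpleRun> Process(IEnumerable<SimpleRun> runs, IDictionary<string, string> values)
        => Condense(Replace(Expand(runs), values));
}
```

Because runs are merged only when their formatting keys match, a plain token split across several runs collapses back into one plain run, while a token typed in bold yields a bold value, as in the question's second screenshot.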

The example is for PowerPoint, which is generally much less content-intensive, so performance might be a factor in Word. I expect there are ways to narrow down the set of paragraphs (or whatever) you have to blow up.

For example, you can extract the text of each paragraph to see whether it includes any placeholders, and only do the expand/replace/condense operation on those paragraphs.
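A minimal sketch of that pre-filter, written against the raw document.xml with System.Xml.Linq rather than the SDK's typed classes (a choice made only to keep the example self-contained). The class and method names are hypothetical; the XDocument could, for instance, be loaded from the stream returned by MainDocumentPart.GetStream(), as in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class PlaceholderScan
{
    // WordprocessingML main namespace (the "w:" prefix).
    private static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Concatenates every w:t in the paragraph, however Word split it into runs.
    // Note: text in nested content such as text boxes is included too.
    public static string ParagraphText(XElement paragraph) =>
        string.Concat(paragraph.Descendants(W + "t").Select(t => t.Value));

    // Returns only the paragraphs whose combined text contains a complete open...close placeholder.
    public static List<XElement> ParagraphsWithPlaceholders(XDocument document,
                                                            string open = "<:", string close = ":>")
    {
        return document.Descendants(W + "p")
            .Where(p =>
            {
                string text = ParagraphText(p);
                int start = text.IndexOf(open, StringComparison.Ordinal);
                return start >= 0 &&
                       text.IndexOf(close, start + open.Length, StringComparison.Ordinal) >= 0;
            })
            .ToList();
    }
}
```

Only the paragraphs returned here would then need to be expanded into per-character runs; everything else is left untouched.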

Instead of doing the find/replace of tokens directly with OpenXML, you could use a third-party OpenXML-based templating toolkit, which is trivial to use and can soon pay for itself.

As Scanny pointed out, OpenXML is full of nasty details that one has to master one by one. The learning curve is long and steep. If you want to become an OpenXML guru, go for it and start climbing. If you want to have time for a decent social life, there are other alternatives: just pick a third-party toolkit that is based on OpenXML. I've evaluated Docentric Toolkit. It offers a template-based approach: you prepare a template, which is a file in Word format containing placeholders for data that gets merged in from the application at runtime. Templates support any formatting that MS Word supports, and you can use conditional content, tables, etc.

You can also create or change a document using a DOM approach. The final document can be .docx or .pdf.

Docentric is a licensed product, but you will soon recoup the cost through the time you save by using one of these tools.

If you will be running your application on a server, don't use Interop; see this link for more details: (http://support2.microsoft.com/kb/257757).

Here is some code I slapped together pretty quickly to account for tokens spread across runs in the XML. I don't know the library well, but I was able to get this to work. It could also use some performance improvements because of all the looping.

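    // Requires: using System; using System.Collections.Generic; using System.Linq;
    // using System.Text.RegularExpressions; using DocumentFormat.OpenXml;
    // using DocumentFormat.OpenXml.Wordprocessing;

    // Static so that the Regex initializer below may reference them
    // (an instance field initializer cannot reference other instance fields).
    private static readonly string prefixTokenString = "{";
    private static readonly string suffixTokenString = "}";

    // Regex.Escape keeps the delimiters literal even if they are regex metacharacters.
    private static readonly Regex possibleTokenRegex =
        new Regex(Regex.Escape(prefixTokenString) + "[a-zA-Z0-9_-]+" + Regex.Escape(suffixTokenString));

    /// <summary>
    /// Iterates through texts, concatenates them and looks for tokens to replace.
    /// </summary>
    /// <param name="texts">The w:t elements to scan, in document order.</param>
    /// <param name="tokenNameValuePairs">Token names (without delimiters) mapped to replacement values.</param>
    /// <returns>True if a token was replaced. Keep calling this until it returns false.</returns>
    private bool IterateTextsAndTokenReplace(IEnumerable<Text> texts, IDictionary<string, object> tokenNameValuePairs)
    {
        List<Text> tokenRuns = new List<Text>();
        string runAggregate = String.Empty;
        bool replacedAToken = false;

        // Take a snapshot first: the loop inserts and removes elements, which would
        // otherwise disturb a lazy Descendants<Text>() enumeration.
        foreach (Text run in texts.ToList())
        {
            if (run.Text.Contains(prefixTokenString) || runAggregate.Contains(prefixTokenString))
            {
                runAggregate += run.Text;
                tokenRuns.Add(run);

                if (run.Text.Contains(suffixTokenString))
                {
                    Match match = possibleTokenRegex.Match(runAggregate);
                    if (match.Success)
                    {
                        string possibleToken = match.Value;
                        string innerToken = possibleToken.Replace(prefixTokenString, String.Empty).Replace(suffixTokenString, String.Empty);
                        if (tokenNameValuePairs.ContainsKey(innerToken))
                        {
                            // Found a token: put the aggregated text, with the token replaced,
                            // into a new w:t next to the last fragment, then drop the fragments.
                            string replacementText = runAggregate.Replace(possibleToken, Convert.ToString(tokenNameValuePairs[innerToken]));
                            // Preserve leading/trailing spaces in the merged text.
                            Text newRun = new Text(replacementText) { Space = SpaceProcessingModeValues.Preserve };
                            run.InsertAfterSelf(newRun);
                            foreach (Text runToDelete in tokenRuns)
                            {
                                runToDelete.Remove();
                            }
                            replacedAToken = true;
                        }
                    }
                    runAggregate = String.Empty;
                    tokenRuns.Clear();
                }
            }
        }

        return replacedAToken;
    }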

And a sample of calling the function:

    // memoryStream holds the .docx; tokenNameValuePairs maps token names to values.
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(memoryStream, true))
    {
        bool replacedAToken = true;

        // Keep processing the document until no more tokens are replaced. Some tokens are
        // spread across runs and may need another pass to be caught.
        while (replacedAToken)
        {
            // Get all the text elements
            IEnumerable<Text> texts = wordDoc.MainDocumentPart.Document.Body.Descendants<Text>();
            replacedAToken = this.IterateTextsAndTokenReplace(texts, tokenNameValuePairs);
        }
        wordDoc.MainDocumentPart.Document.Save();

        foreach (FooterPart footerPart in wordDoc.MainDocumentPart.FooterParts)
        {
            Footer footer = footerPart.Footer;
            if (footer != null)
            {
                replacedAToken = true;
                while (replacedAToken)
                {
                    IEnumerable<Text> footerTexts = footer.Descendants<Text>();
                    replacedAToken = this.IterateTextsAndTokenReplace(footerTexts, tokenNameValuePairs);
                }
                footer.Save();
            }
        }

        foreach (HeaderPart headerPart in wordDoc.MainDocumentPart.HeaderParts)
        {
            Header header = headerPart.Header;
            if (header != null)
            {
                replacedAToken = true;
                while (replacedAToken)
                {
                    IEnumerable<Text> headerTexts = header.Descendants<Text>();
                    replacedAToken = this.IterateTextsAndTokenReplace(headerTexts, tokenNameValuePairs);
                }
                header.Save();
            }
        }
    }
