Simplistic replacement of tokens in a Word Document using OpenXML SDK

Question

I have a requirement where I would like users to type some string tokens into a Word document so that they can be replaced via a C# application with some values. So say I have a document as per the image

[Image: enter image description here]

Now using the SDK I can read the document as follows:

    private void InternalParseTags(WordprocessingDocument aDocumentToManipulate)
    {
        // Reads the main document part as one raw XML string.
        StringBuilder sbDocumentText = new StringBuilder();
        using (StreamReader sr = new StreamReader(aDocumentToManipulate.MainDocumentPart.GetStream()))
        {
            sbDocumentText.Append(sr.ReadToEnd());
        }
    }

However, as this comes back as raw XML, I cannot search for the tags easily, because the underlying XML looks like:

<w:t>&lt;:</w:t></w:r><w:r w:rsidR="002E53FF" w:rsidRPr="000A794A"><w:t>Person.Meta.Age

(and obviously is not something I would have control over) instead of what I was hoping for namely:

<w:t>&lt;: Person.Meta.Age

OR

<w:t><: Person.Meta.Age

So my question is how do I actually work on the string itself namely

<: Person.Meta.Age :>

and still preserve formatting etc. so that when I have replaced the tokens with values I have:

[Image: enter image description here]

Note: Bolding of the value of the second token value

Do I need to iterate document elements or use some other approach? All pointers greatly appreciated.

Answer 1

This is a bit of a thorny problem with OpenXML. The best solution I've come across is explained here: http://openxmldeveloper.org/blog/b/openxmldeveloper/archive/2011/06/13/open-xml-presentation-generation-using-a-template-presentation.aspx

Basically Eric expands the content such that each character is in a run by itself, then looks for the run that starts a '<:' sequence and then the end sequence. Then he does the substitution and recombines all runs that have the same attributes.

The example is for PowerPoint, which is generally much less content-intensive, so performance might be a factor in Word; I expect there are ways to narrow down the scope of paragraphs or whatever you have to blow up.

For example, you can extract the text of the paragraph to see if it includes any placeholders and only do the expand/replace/condense operation on those paragraphs.
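
A rough sketch of that expand/replace/condense idea follows. It is not Eric's code, and it is written against LINQ to XML (System.Xml.Linq) instead of the OpenXML SDK so that it stays self-contained. The class and method names are hypothetical. It assumes that each text run holds only an optional w:rPr and a single w:t, that the runs are direct children of the paragraph, and that losing run attributes such as w:rsidR is acceptable. The replaced value takes the formatting of the first character of the opening delimiter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of the OpenXML SDK.
public static class ParagraphTokenReplacer
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static void ReplaceTokens(XElement root, IDictionary<string, string> values,
                                     string open = "<:", string close = ":>")
    {
        foreach (XElement p in root.DescendantsAndSelf(W + "p").ToList())
        {
            // Cheap pre-check: only expand paragraphs that contain a placeholder.
            string text = string.Concat(p.Elements(W + "r").Elements(W + "t").Select(t => t.Value));
            if (!text.Contains(open)) continue;

            Explode(p);
            Substitute(p, values, open, close);
            Condense(p);
        }
    }

    // Builds a run holding the given text and a copy of the given run properties.
    private static XElement MakeRun(XElement rPr, string text) =>
        new XElement(W + "r",
            rPr == null ? null : new XElement(rPr),
            new XElement(W + "t", new XAttribute(XNamespace.Xml + "space", "preserve"), text));

    // Replaces every text run with one run per character.
    private static void Explode(XElement p)
    {
        foreach (XElement run in p.Elements(W + "r").ToList())
        {
            XElement t = run.Element(W + "t");
            if (t == null) continue; // leave runs without text alone
            XElement rPr = run.Element(W + "rPr");
            run.ReplaceWith(t.Value.Select(c => MakeRun(rPr, c.ToString())).ToList());
        }
    }

    // After Explode, character index i in the paragraph text maps to text run i.
    private static void Substitute(XElement p, IDictionary<string, string> values,
                                   string open, string close)
    {
        int searchFrom = 0;
        while (true)
        {
            List<XElement> runs = p.Elements(W + "r").Where(r => r.Element(W + "t") != null).ToList();
            string text = string.Concat(runs.Select(r => r.Element(W + "t").Value));
            int start = text.IndexOf(open, searchFrom, StringComparison.Ordinal);
            if (start < 0) return;
            int end = text.IndexOf(close, start + open.Length, StringComparison.Ordinal);
            if (end < 0) return;

            string name = text.Substring(start + open.Length, end - start - open.Length).Trim();
            if (!values.TryGetValue(name, out string value))
            {
                searchFrom = start + open.Length; // unknown token: leave it and move on
                continue;
            }

            // Insert the value one character per run so the index mapping still holds.
            XElement rPr = runs[start].Element(W + "rPr");
            runs[start].AddBeforeSelf(value.Select(c => MakeRun(rPr, c.ToString())).ToList());
            foreach (XElement r in runs.Skip(start).Take(end + close.Length - start)) r.Remove();
            searchFrom = start + value.Length;
        }
    }

    // Merges adjacent text runs whose run properties are identical.
    private static void Condense(XElement p)
    {
        XElement previous = null;
        foreach (XElement run in p.Elements(W + "r").ToList())
        {
            XElement t = run.Element(W + "t");
            if (t != null && previous != null && run.PreviousNode == previous
                && XNode.DeepEquals(previous.Element(W + "rPr"), run.Element(W + "rPr")))
            {
                previous.Element(W + "t").Value += t.Value;
                run.Remove();
            }
            else
            {
                previous = t != null ? run : null;
            }
        }
    }
}
```

Calling ParagraphTokenReplacer.ReplaceTokens on the root element of the document XML with a dictionary such as { "Person.Meta.Age" -> "42" } rewrites the paragraphs in place. Paragraphs without the opening delimiter are never expanded, which is the narrowing suggested above.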

Answer 2

Instead of doing find/replace of tokens directly with OpenXML, you could use a third-party OpenXML-based templating tool, which is trivial to use and soon pays for itself.

As Scanny pointed out, OpenXML is full of nasty details that one has to master one by one. The learning curve is long and steep. If you want to become an OpenXML guru, then go for it and start climbing. If you want to have time for a decent social life, there are alternatives: just pick a third-party toolkit that is based on OpenXML. I've evaluated Docentric Toolkit. It offers a template-based approach: you prepare a template, which is a file in Word format containing placeholders for data that gets merged in from the application at runtime. The templates support any formatting that MS Word supports, and you can use conditional content, tables, etc.

You can also create or change a document using a DOM approach. The final document can be .docx or .pdf.

Docentric is a licensed product, but the time you save by using a tool like this will soon compensate for the cost.

If you will be running your application on a server, don't use interop; see this link for more details: http://support2.microsoft.com/kb/257757

Answer 3

Here is some code I slapped together pretty quickly to account for tokens spread across runs in the xml. I don't know the library much, but was able to get this to work. This could use some performance enhancements too because of all the looping.

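    // Requires: using System; using System.Collections.Generic; using System.Linq;
    //           using System.Text.RegularExpressions;
    //           using DocumentFormat.OpenXml; using DocumentFormat.OpenXml.Wordprocessing;

    /// <summary>
    /// Iterates through texts, concatenates them and looks for tokens to replace.
    /// </summary>
    /// <param name="texts">The Text elements to scan, in document order.</param>
    /// <param name="tokenNameValuePairs">Token names (without delimiters) mapped to their replacement values.</param>
    /// <returns>T/F whether a token was replaced.  Should loop this call until it returns false.</returns>
    private bool IterateTextsAndTokenReplace(IEnumerable<Text> texts, IDictionary<string, object> tokenNameValuePairs)
    {
        List<Text> tokenRuns = new List<Text>();
        string runAggregate = String.Empty;
        bool replacedAToken = false;

        // Snapshot the sequence first: it is not safe to remove elements
        // while lazily enumerating Descendants<Text>().
        foreach (var run in texts.ToList())
        {
            if (run.Text.Contains(prefixTokenString) || runAggregate.Contains(prefixTokenString))
            {
                runAggregate += run.Text;
                tokenRuns.Add(run);

                if (run.Text.Contains(suffixTokenString))
                {
                    if (possibleTokenRegex.IsMatch(runAggregate))
                    {
                        string possibleToken = possibleTokenRegex.Match(runAggregate).Value;
                        string innerToken = possibleToken.Replace(prefixTokenString, String.Empty).Replace(suffixTokenString, String.Empty);
                        if (tokenNameValuePairs.ContainsKey(innerToken))
                        {
                            // Found a token: put the merged, replaced text next to the last
                            // Text element, then remove the elements the token was spread over.
                            string replacementText = runAggregate.Replace(prefixTokenString + innerToken + suffixTokenString, Convert.ToString(tokenNameValuePairs[innerToken]));
                            // Preserve keeps leading/trailing spaces in the merged text.
                            Text newRun = new Text(replacementText) { Space = SpaceProcessingModeValues.Preserve };
                            run.InsertAfterSelf(newRun);
                            foreach (Text runToDelete in tokenRuns)
                            {
                                runToDelete.Remove();
                            }
                            replacedAToken = true;
                        }
                    }
                    runAggregate = String.Empty;
                    tokenRuns.Clear();
                }
            }
        }

        return replacedAToken;
    }

    // Constants, so that the static regex initializer below can reference them.
    private const string prefixTokenString = "{";
    private const string suffixTokenString = "}";

    // The delimiters are escaped so they are always matched literally.
    private static readonly Regex possibleTokenRegex =
        new Regex(Regex.Escape(prefixTokenString) + "[a-zA-Z0-9_-]+" + Regex.Escape(suffixTokenString));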
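
To illustrate why the text of several elements is concatenated before matching, here is a small standalone sketch. TokenPatternDemo and FindToken are hypothetical names, plain strings stand in for the Text elements, and Regex.Escape is applied to the delimiters as a precaution.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class: plain strings stand in for the Text elements.
public static class TokenPatternDemo
{
    // Same pattern as above, with the delimiters escaped.
    public static readonly Regex TokenRegex =
        new Regex(Regex.Escape("{") + "[a-zA-Z0-9_-]+" + Regex.Escape("}"));

    // Returns the first token found in the concatenated texts, or null if there is none.
    public static string FindToken(params string[] runTexts)
    {
        Match match = TokenRegex.Match(string.Concat(runTexts));
        return match.Success ? match.Value : null;
    }
}
```

A token that Word has split into the pieces "Dear {Fir", "st" and "Name}," is not matched in any single piece, but FindToken returns "{FirstName}" once all three are passed together.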

And some samples of calling the function:

    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(memoryStream, true))
    {
        bool replacedAToken = true;

        // Keep looping over the document until no more tokens are replaced.  Some tokens
        // are spread across 'runs' and may need a second pass to be caught.
        while (replacedAToken)
        {
            // get all the text elements
            IEnumerable<Text> texts = wordDoc.MainDocumentPart.Document.Body.Descendants<Text>();
            replacedAToken = this.IterateTextsAndTokenReplace(texts, tokenNameValuePairs);
        }
        wordDoc.MainDocumentPart.Document.Save();

        // Footers live in their own parts, so they are processed separately.
        foreach (FooterPart footerPart in wordDoc.MainDocumentPart.FooterParts)
        {
            if (footerPart != null)
            {
                Footer footer = footerPart.Footer;

                if (footer != null)
                {
                    replacedAToken = true;

                    while (replacedAToken)
                    {
                        IEnumerable<Text> footerTexts = footer.Descendants<Text>();
                        replacedAToken = this.IterateTextsAndTokenReplace(footerTexts, tokenNameValuePairs);
                    }
                    footer.Save();
                }
            }
        }

        // Same for headers.
        foreach (HeaderPart headerPart in wordDoc.MainDocumentPart.HeaderParts)
        {
            if (headerPart != null)
            {
                Header header = headerPart.Header;

                if (header != null)
                {
                    replacedAToken = true;

                    while (replacedAToken)
                    {
                        IEnumerable<Text> headerTexts = header.Descendants<Text>();
                        replacedAToken = this.IterateTextsAndTokenReplace(headerTexts, tokenNameValuePairs);
                    }
                    header.Save();
                }
            }
        }
    }
