
Regex or XML Parser C#

I have some Word template (dot/dotx) files that contain XML tags along with plain text.
At run time, I need to replace the XML tags with their respective mail merge fields.

So I need to parse the document for these XML tags and replace them with merge fields. I was using Regex to find and replace them, but it was suggested that I use an XML parser instead (see "Regex for string enclosed in <*>, C#").

The sample document looks like:

Solicitor Letter

<Tfirm/>
<Tbuilding/>
<TstreetNumber/> <TstreetName/>

For the attention of: <TContact1/> <TEmail/>


Dear  <TContact1/>

RE: <Pbuilding/> <PstreetNumber/> <PstreetName/> <Pvillage/> <PTown/>

We were pleased to hear that contracts have now been exchanged in the sale of the 
above property on behalf of our mutual client/s.  We now have pleasure in enclosing a 
copy of our invoice for your kind attention upon completion.

....

One more note: the angle brackets are typed manually by the end user in the template.

I tried using XmlReader, but got an error because my documents have no root element of their own.

Please advise whether I should stick with Regex or whether there is a way to use an XML parser.

Thank you!

Unless you can get it structured as an XML document, the tools in the .NET libraries for reading XML are going to be entirely useless.

What you have is not XML. Having a tag or two that would qualify as XML does not an XML document make. The problem is that it simply does not follow any of the rules of XML.

Moral of the story is that you will have to come up with your own method to parse this. If you like to drink the RegEx kool-aid, that'll be the best solution for ya. Of course, there are plenty of ways to skin this cat.

It looks like you aren't actually using XML, just using a token that looks similar to XML as a placeholder for replacement.

If that's the case, you should be using Regex.
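As a rough illustration of that approach, here is a minimal sketch of token replacement with Regex. The class name TokenReplacer, the dictionary of values, and the rule for what counts as a token name are all assumptions made for the example. It only substitutes plain text; inserting real Word merge fields would still need a Word API on top of this.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class TokenReplacer
{
    // Matches self-closing placeholders such as <Tfirm/> or <PstreetName/>.
    // Assumption: a name starts with a letter, followed by letters, digits, or underscores.
    private static readonly Regex Token =
        new Regex(@"<([A-Za-z][A-Za-z0-9_]*)\s*/>", RegexOptions.Compiled);

    public static string Replace(string template, IDictionary<string, string> values)
    {
        return Token.Replace(template, m =>
        {
            string name = m.Groups[1].Value;
            // Unknown tokens are left untouched so typos in the template stay visible.
            return values.TryGetValue(name, out string value) ? value : m.Value;
        });
    }
}
```

Calling TokenReplacer.Replace with the template text and a dictionary keyed by names such as "Tfirm" returns the text with every known placeholder substituted.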

I would suggest neither. Microsoft has a free library in C# specifically for modifying open xml format documents without an installation of Microsoft Office.

OpenXML SDK

Doesn't seem like XML processing to me. It's not an XML doc. It looks like straight string replacement, and for that, you're better off with a regular expression.

An XML parser doesn't help you locate XML; it only helps you understand a given piece of XML. You will need some other mechanism, perhaps a Regex, to find the XML.
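To make that division of labour concrete, here is a small sketch in which Regex locates the tag-like tokens and the parser is only handed each token on its own. TagFinder and FindTagNames are hypothetical names, and the pattern assumes simple self-closing tags with no attributes.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class TagFinder
{
    public static List<string> FindTagNames(string text)
    {
        var names = new List<string>();
        // Regex does the locating: the surrounding letter text is never shown to the parser.
        foreach (Match m in Regex.Matches(text, @"<[A-Za-z][A-Za-z0-9_]*\s*/>"))
        {
            // Each match is a complete element by itself, so XElement.Parse accepts it.
            XElement element = XElement.Parse(m.Value);
            names.Add(element.Name.LocalName);
        }
        return names;
    }
}
```

For tokens this simple the parsing step adds little beyond the capture group, which is why most answers here settle on Regex alone.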

It seems that the authors of most replies didn't read the question carefully.

inutan is asking for something that will parse Word documents. If a Word document is saved in docx format, it is actually a ZIP package whose parts are XML files, so its content can be read with XmlReader or queried with XPath. However, I would not recommend doing that.

Normally, mail merge with Word doesn't require any programming or XML parsing, see http://helpdesk.ua.edu/training/word/merg07.html
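For reference, reading that XML directly might look like the sketch below. It assumes the main document part sits at the conventional path word/document.xml inside the package (strictly speaking it is located through the package relationships), and DocxReader is a hypothetical name. It does not apply to the older binary .dot format. Bear in mind that manually typed angle brackets are stored as escaped text, and Word may split a single token across several runs, which is a large part of why this route is awkward.

```csharp
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class DocxReader
{
    // Loads the main document part from a .docx/.dotx package supplied as a stream.
    public static XDocument LoadMainPart(Stream package)
    {
        using (var archive = new ZipArchive(package, ZipArchiveMode.Read, leaveOpen: true))
        {
            // Assumption: the main part uses the conventional entry name.
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found in package");

            using (Stream part = entry.Open())
            {
                return XDocument.Load(part);
            }
        }
    }
}
```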

However, if you still want to have XML-like fields in your Word templates and replace them with values, I would suggest using the Word automation objects.

Below is an example in VBA; for similar code in other languages, please refer to the MS Office development site http://msdn.microsoft.com/en-us/library/bb726434.aspx . For example, if you use .NET, you should use the Office interop assemblies, and the best option is to install Visual Studio Tools for Office http://msdn.microsoft.com/en-us/library/5s12ew2x.aspx

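    ' Replace every occurrence of the <TContact1/> placeholder in the document.
    With Selection.Find
        .Text = "<TContact1/>"
        .Replacement.Text = "TContact1"
        .Forward = True
        .Wrap = wdFindContinue          ' keep searching past the end of the document
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False         ' treat < and > as literal characters
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll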
