
What's the best way to parse XML in the middle of other text

How can I parse XML that appears in the middle of other text?

Example: given the text file below, how can I parse the XML parts in C#?

-> Begin of file

2010-01-01 tehgvdhjjsad  
2010-01-02 dsjhnxcucncu  
14:55 iahsdahksdjh  

<Answer>
<headline>
<a1>1</a1>
<a2>2</a2>
</headline>
</Answer>
2010-01-05 tehgvddsda  
2010-01-05 ddsada  
22:55 iahsdahksdjh2  

<Answer>
<headline>
<a1>11</a1>
<a2>22</a2>
</headline>
</Answer>
-> End of file

Several ways:

 1. Use string.IndexOf("<Answer>") to find where the XML starts, then use Substring to chop off the header information before it. Wrap the remainder in a root element: xmlString = "<Answers>" + substringXml + "</Answers>". You can then parse the result as well-formed XML; any non-XML lines left inside it become text nodes.
 2. Use an XmlTextReader or XmlReader configured for fragment conformance (ConformanceLevel.Fragment) and read through the text. Stop only on the Answer elements and process them.
 3. Add a root element around the text, load it into an XmlDocument, and use an XPath expression to select the Answer elements.
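
As an illustration, the second approach might look roughly like the sketch below. It assumes the non-XML lines contain no unescaped `<` or `&` characters (the reader would throw on those). The class and method names (`FragmentReaderSketch`, `ReadAnswers`) are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class FragmentReaderSketch
{
    // Hypothetical helper: returns every top-level <Answer> element found in mixed text.
    public static List<XElement> ReadAnswers(string text)
    {
        var settings = new XmlReaderSettings
        {
            // Fragment mode permits several top-level elements and top-level text nodes.
            ConformanceLevel = ConformanceLevel.Fragment
        };

        var answers = new List<XElement>();
        using (var reader = XmlReader.Create(new StringReader(text), settings))
        {
            reader.Read(); // move to the first node
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Answer")
                {
                    // ReadFrom consumes the whole element and leaves the reader
                    // on the node that follows it, so no extra Read() is needed here.
                    answers.Add((XElement)XNode.ReadFrom(reader));
                }
                else
                {
                    reader.Read(); // skip text and anything else that is not an Answer
                }
            }
        }
        return answers;
    }
}
```

Calling `FragmentReaderSketch.ReadAnswers(File.ReadAllText(path))` on the sample file would be expected to yield two elements, from which values can be read with, for example, `answer.Element("headline").Element("a1").Value`.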

Well, there isn't much that can help you with something like that. AFAIK there are two possibilities:

Option 1. If all the XML fragments have the same root node, i.e. "<Answer>", then you can simply loop through the occurrences of <Answer>, find the next occurrence of the closing </Answer> for each, extract the text from the opening tag through the closing tag, and pass it to a normal XML parser.
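
A minimal sketch of Option 1, assuming the opening tag is always exactly `<Answer>` (no attributes) and that Answer elements are never nested. `AnswerScanner` and `ExtractAnswers` are illustrative names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class AnswerScanner
{
    // Hypothetical helper: cuts out each <Answer>...</Answer> span and parses it on its own.
    public static List<XElement> ExtractAnswers(string text)
    {
        const string open = "<Answer>";
        const string close = "</Answer>";

        var result = new List<XElement>();
        int pos = 0;
        while (true)
        {
            int start = text.IndexOf(open, pos, StringComparison.Ordinal);
            if (start < 0) break; // no more fragments

            int end = text.IndexOf(close, start + open.Length, StringComparison.Ordinal);
            if (end < 0) break;   // unterminated fragment: stop rather than guess
            end += close.Length;  // include the closing tag in the slice

            // Each slice is a complete document on its own, so a normal parser accepts it.
            result.Add(XElement.Parse(text.Substring(start, end - start)));
            pos = end;            // continue searching after this fragment
        }
        return result;
    }
}
```

Unlike the fragment-reader approach, this does not care what the surrounding text contains, since only the slices between the tags ever reach the XML parser.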

Option 2. If it's an anything-goes kind of XML, then you could use this Regex-based HTML parser I wrote some time ago. It should handle that input without issue; however, you will have to handle the open/close elements yourself and decide what to do with them.
