简体   繁体   中英

How to parse XML in a string in .NET?

Hi Fellow StackOverflowers,

I am receiving a string in one of my .NET function. The string when viewed from the XML Visualizer looks like this:

- <root>
- <Table>
  <ID>ABC-123</ID>
  <CAT>Housekeeping</CAT>
  <DATE>21-JUN-2009</DATE>
  <REP_BY>John</REP_BY>
  <LOCATION>Head Office</LOCATION>
</Table>
- <Table>
  <ID>ABC-124</ID>
  <CAT>Environment</CAT>
  <DATE>23-JUN-2009</DATE>
  <REP_BY>Michelle</REP_BY>
  <LOCATION>Block C</LOCATION>
</Table>
- <Table>
  <ID>ABC-125</ID>
  <CAT>Staging</CAT>
  <DATE>21-JUN-2009</DATE>
  <REP_BY>George</REP_BY>
  <LOCATION>Head Office</LOCATION>
</Table>  
- <Table>
  <ID>ABC-123</ID>
  <CAT>Housekeeping</CAT>
  <DATE>21-JUN-2009</DATE>
  <REP_BY>John</REP_BY>
  <LOCATION space="preserve" xmlns="http://www.w3.org/XML/1998/namespace" /> 
</Table>  
</root>  

I need to parse this string so that I could write the data into a datatable whose columns are the xml tags for each data.

In the above text, I would then have a datatable that wil have 5 columns, named ID, CAT, DATE, REP_BY and LOCATION which will then contain 4 rows of data.

In the fourth tag, notice that the does not have any data, but rather it is marked space="preserve". This would mean that the data I am placing in my datatable would be blank for the LOCATION column of the fourth row.

How can I achieve this? Sample codes would be highly appreciated. Thanks.

Using the XmlReader class. This class is fast and does not use a lot of memory but reading the xml can be difficult.

using (StringReader strReader = new StringReader(yourXMLString))
{
    using (XmlReader reader = XmlReader.Create(strReader))
    {
        while (reader.Read())
        {
            if(reader.Name == "Table" && reader.NodeType == reader.NodeType == XmlNodeType.Element)
            {
                using(XmlReader tableReader = reader.ReadSubtree())
                {
                    ReadTableNode(tableReader);
                }
            }
        }
    }
}

private void ReadTableNode(XmlReader reader)
{
    while (reader.Read())
    {
        if(reader.Name == "ID" && reader.NodeType == reader.NodeType == XmlNodeType.Element)
            //do something
        else if(reader.Name == "CAT" && reader.NodeType == reader.NodeType == XmlNodeType.Element)
            //do something

       //and continue....
    }
}

To get an attribute of the current node you use:

string value = reader.GetAttribute(name_of_attribute);

To get the inner text of an element:

string innerText =  reader.ReadString();

Using the XmlDocument class. This class is slow but manipulating and reading the xml is very easy because the entire xml is loaded.

XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(yourXMLString);
//do something

Using the XDocument class. The advantage of using XDocument is elements can be accessed directly and simultaneously. This class also use the power of LINQ to query the xml document.

using(StringReader tr = new StringReader(yourXMLString))
{
    XDocument doc = XDocument.Load(tr);
    //do something
}

This is probably the simplest solution to get the XML into table form. Throwing the attributes out using regular expressions is not that smart (and safe), but I don't like the System.Xml API and LINQ to XML is no option in .NET 2.0.

using System;
using System.Data;
using System.IO;
using System.Text.RegularExpressions;

namespace GeneralTestApplication
{
    class Program
    {
        private static void Main()
        {
            String input = @"<root><Table> [...] </root>";

            input = Regex.Replace(input, @" [a-zA-Z]+=""[^""]*""", String.Empty);

            DataSet dataSet = new DataSet();

            dataSet.ReadXml(new StringReader(input));

            foreach (DataRow row in dataSet.Tables[0].Rows)
            {
                foreach (DataColumn column in dataSet.Tables[0].Columns)
                {
                    Console.Write(row[column] + " | ");
                }
                Console.WriteLine();
            }

            Console.ReadLine();
        }
    }
}

UPDATE

Or get rid of the attribute using System.Xml .

XmlDocument doc = new XmlDocument();

doc.Load(new StringReader(input));

foreach (XmlNode node in doc.SelectNodes("descendant-or-self::*"))
{
    node.Attributes.RemoveAll();
}

input = doc.OuterXml;

But this doesn't work because the XML namespace on the last LOCATION element remains and the DataSet.LoadXml() complains that there connot be two columns named LOCATION .

Don't use string parsing. Try using some xml library ( Linq has some objects that might help you). You will probably do that much more easily.

I believe that you can simply use the ADO.NET DataSet class's ReadXml method to read an XML document in that format, and it will create the DataTable , DataColumn , and DataRow objects for you. You'll need to write a little conversion method if you want to subsequently turn the DATE column's data type to DateTime . But other than that, you shouldn't have to screw around with XML at all.

Edit

I see from Daniel Bruckner's post that the LOCATION elements in the odd namespace pose a problem. Well, that's easy enough to fix:

    XmlDocument d = new XmlDocument();
    d.LoadXml(xml);

    XmlNamespaceManager ns = new XmlNamespaceManager(d.NameTable);
    ns.AddNamespace("n", "http://www.w3.org/XML/1998/namespace");
    foreach (XmlNode n in d.SelectNodes("/root/Table/n:LOCATION", ns))
    {
        XmlElement loc = d.CreateElement("LOCATION");
        n.ParentNode.AppendChild(loc);
        n.ParentNode.RemoveChild(n);
    }

    DataSet ds = new DataSet();
    using (StringReader sr = new StringReader(d.OuterXml))
    {
        ds.ReadXml(sr);
    }

I'm not a huge fan of xml myself, I need to use it as the datasource of a grid to visualize it. I get some output from our FileNet imaging server in xml format and I need to get pieces of it out to populate a database. Here's what I'm doing, HTH:

  Dim dsXML As DataSet
  Dim drXML As DataRow
  Dim rdr As System.IO.StringReader
  Dim docs() As String
  Dim SQL As String
  Dim xml As String
  Dim fnID As String

docs = _fnP8Dev.getDocumentsXML(_credToken, _docObjectStoreName, _docClass, "ReferenceNumber=" & fnID, "")
xml = docs(0)
If (InStr(xml, "<z:row") > 0) Then
 RaiseEvent msg("Inserting images for reference number " & fnID)
 rdr = New System.IO.StringReader(xml)
 dsXML = New DataSet
 dsXML.ReadXml(rdr)

 For Each drXML In dsXML.Tables(dsXML.Tables.Count - 1).Rows
   SQL = "Insert into fnImageP8 values ("
   SQL = SQL & "'" & drXML("Id") & "', "
   Try
    SQL = SQL & "'" & drXML("DocumentTitle") & "', "
   Catch ex As Exception
    SQL = SQL & "null, "
   End Try

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM