
How to parse XML in a string in .NET?

Hi fellow StackOverflowers,

I am receiving a string in one of my .NET functions. When viewed in the XML Visualizer, the string looks like this:

<root>
  <Table>
    <ID>ABC-123</ID>
    <CAT>Housekeeping</CAT>
    <DATE>21-JUN-2009</DATE>
    <REP_BY>John</REP_BY>
    <LOCATION>Head Office</LOCATION>
  </Table>
  <Table>
    <ID>ABC-124</ID>
    <CAT>Environment</CAT>
    <DATE>23-JUN-2009</DATE>
    <REP_BY>Michelle</REP_BY>
    <LOCATION>Block C</LOCATION>
  </Table>
  <Table>
    <ID>ABC-125</ID>
    <CAT>Staging</CAT>
    <DATE>21-JUN-2009</DATE>
    <REP_BY>George</REP_BY>
    <LOCATION>Head Office</LOCATION>
  </Table>
  <Table>
    <ID>ABC-123</ID>
    <CAT>Housekeeping</CAT>
    <DATE>21-JUN-2009</DATE>
    <REP_BY>John</REP_BY>
    <LOCATION space="preserve" xmlns="http://www.w3.org/XML/1998/namespace" />
  </Table>
</root>

I need to parse this string so that I can write the data into a DataTable whose columns are named after the XML tags.

For the text above, I would then have a DataTable with 5 columns, named ID, CAT, DATE, REP_BY and LOCATION, containing 4 rows of data.

In the fourth Table element, notice that LOCATION does not have any data; instead it is marked space="preserve". This means the LOCATION column of the fourth row in my DataTable should be blank.

How can I achieve this? Sample code would be highly appreciated. Thanks.

Using the XmlReader class. This class is fast and does not use a lot of memory, but reading the XML with it can be difficult.

using System.IO;
using System.Xml;

using (StringReader strReader = new StringReader(yourXMLString))
{
    using (XmlReader reader = XmlReader.Create(strReader))
    {
        while (reader.Read())
        {
            // Hand each <Table> element to ReadTableNode through a subtree reader
            if (reader.Name == "Table" && reader.NodeType == XmlNodeType.Element)
            {
                using (XmlReader tableReader = reader.ReadSubtree())
                {
                    ReadTableNode(tableReader);
                }
            }
        }
    }
}

private void ReadTableNode(XmlReader reader)
{
    while (reader.Read())
    {
        if (reader.Name == "ID" && reader.NodeType == XmlNodeType.Element)
        {
            // do something
        }
        else if (reader.Name == "CAT" && reader.NodeType == XmlNodeType.Element)
        {
            // do something
        }

        // and continue....
    }
}

To get an attribute of the current node, use:

string value = reader.GetAttribute(name_of_attribute);

To get the inner text of an element:

string innerText = reader.ReadString();
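Putting those pieces together, here is a minimal sketch that fills a DataTable using XmlReader. The names XmlReaderLoader and Load are hypothetical, and it assumes the root/Table layout from the question and that the raw string is well-formed XML (for example, the empty element written as `<LOCATION xml:space="preserve" />`). It creates one string column per child element's local name and stores "" for an empty element. Instead of ReadSubtree it walks the outer reader with ReadToFollowing and ReadElementContentAsString, which keeps the position handling in one loop.

```csharp
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Xml;

// Hypothetical helper; assumes the <root><Table>...</Table></root> layout from the question.
static class XmlReaderLoader
{
    public static DataTable Load(string xml)
    {
        DataTable table = new DataTable("Table");
        using (StringReader text = new StringReader(xml))
        using (XmlReader reader = XmlReader.Create(text))
        {
            // Jump to each <Table> start tag in turn.
            while (reader.ReadToFollowing("Table"))
            {
                var values = new List<KeyValuePair<string, string>>();
                int tableDepth = reader.Depth;

                if (!reader.IsEmptyElement)
                {
                    reader.Read(); // step inside <Table>
                    // Stop at the matching </Table>.
                    while (!(reader.NodeType == XmlNodeType.EndElement && reader.Depth == tableDepth))
                    {
                        if (reader.NodeType == XmlNodeType.Element)
                        {
                            // LocalName ignores the namespace, so a namespaced LOCATION still maps to "LOCATION".
                            string name = reader.LocalName;
                            if (!table.Columns.Contains(name))
                                table.Columns.Add(name, typeof(string));
                            // Returns "" for an empty element and moves past the element.
                            values.Add(new KeyValuePair<string, string>(name, reader.ReadElementContentAsString()));
                        }
                        else
                        {
                            reader.Read(); // skip whitespace and other non-element nodes
                        }
                    }
                }

                // Create the row only after all columns for this <Table> exist.
                DataRow row = table.NewRow();
                foreach (var pair in values)
                    row[pair.Key] = pair.Value;
                table.Rows.Add(row);
            }
        }
        return table;
    }
}
```

Usage: `DataTable t = XmlReaderLoader.Load(yourXMLString);` then bind `t` or copy its rows wherever they are needed.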

Using the XmlDocument class. This class is slower, but manipulating and reading the XML is very easy because the entire document is loaded into memory.

XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(yourXMLString);
//do something
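As one possible way to fill in the "do something" above, here is a sketch that walks every /root/Table node and copies its child elements into a DataTable. XmlDocumentLoader and Load are hypothetical names; it assumes the layout from the question and uses LocalName so that a LOCATION element in a different namespace still lands in the LOCATION column. An empty element's InnerText is "", which gives the blank value the question asks for.

```csharp
using System.Data;
using System.Xml;

// Hypothetical helper; assumes the <root><Table>...</Table></root> layout from the question.
static class XmlDocumentLoader
{
    public static DataTable Load(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        DataTable table = new DataTable("Table");
        foreach (XmlNode tableNode in doc.SelectNodes("/root/Table"))
        {
            // First pass: make sure every child element name has a column.
            foreach (XmlNode field in tableNode.ChildNodes)
            {
                if (field.NodeType == XmlNodeType.Element && !table.Columns.Contains(field.LocalName))
                    table.Columns.Add(field.LocalName, typeof(string));
            }

            // Second pass: copy the text; an empty element yields "".
            DataRow row = table.NewRow();
            foreach (XmlNode field in tableNode.ChildNodes)
            {
                if (field.NodeType == XmlNodeType.Element)
                    row[field.LocalName] = field.InnerText;
            }
            table.Rows.Add(row);
        }
        return table;
    }
}
```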

Using the XDocument class. The advantage of XDocument is that elements can be accessed directly, in any order, and it lets you query the XML document with LINQ.

using(StringReader tr = new StringReader(yourXMLString))
{
    XDocument doc = XDocument.Load(tr);
    //do something
}
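A LINQ to XML version of the same idea might look like the sketch below. XDocumentLoader and Load are hypothetical names; it assumes the string is well-formed XML in the question's layout. Column names are the distinct local names of the children of every Table element, and XElement.Value returns "" for an empty element such as the fourth LOCATION.

```csharp
using System.Data;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper; assumes the <root><Table>...</Table></root> layout from the question.
static class XDocumentLoader
{
    public static DataTable Load(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        DataTable table = new DataTable("Table");

        // One string column per distinct child element name across all <Table> elements.
        var columnNames = doc.Root.Elements("Table")
            .SelectMany(t => t.Elements())
            .Select(e => e.Name.LocalName)
            .Distinct();
        foreach (string name in columnNames)
            table.Columns.Add(name, typeof(string));

        foreach (XElement t in doc.Root.Elements("Table"))
        {
            DataRow row = table.NewRow();
            foreach (XElement field in t.Elements())
                row[field.Name.LocalName] = field.Value; // "" for an empty element
            table.Rows.Add(row);
        }
        return table;
    }
}
```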

This is probably the simplest way to get the XML into table form. Stripping the attributes out with a regular expression is not that smart (or safe), but I don't like the System.Xml API, and LINQ to XML is not an option in .NET 2.0.

using System;
using System.Data;
using System.IO;
using System.Text.RegularExpressions;

namespace GeneralTestApplication
{
    class Program
    {
        private static void Main()
        {
            String input = @"<root><Table> [...] </root>";

            input = Regex.Replace(input, @" [a-zA-Z]+=""[^""]*""", String.Empty);

            DataSet dataSet = new DataSet();

            dataSet.ReadXml(new StringReader(input));

            foreach (DataRow row in dataSet.Tables[0].Rows)
            {
                foreach (DataColumn column in dataSet.Tables[0].Columns)
                {
                    Console.Write(row[column] + " | ");
                }
                Console.WriteLine();
            }

            Console.ReadLine();
        }
    }
}

UPDATE

Alternatively, get rid of the attributes using System.Xml:

XmlDocument doc = new XmlDocument();

doc.Load(new StringReader(input));

foreach (XmlNode node in doc.SelectNodes("descendant-or-self::*"))
{
    node.Attributes.RemoveAll();
}

input = doc.OuterXml;

But this doesn't work, because the XML namespace on the last LOCATION element remains and DataSet.ReadXml() complains that there cannot be two columns named LOCATION.

Don't use string parsing. Try an XML library instead (LINQ to XML has some classes that might help you); you will probably find it much easier.

I believe you can simply use the ADO.NET DataSet class's ReadXml method to read an XML document in that format, and it will create the DataTable, DataColumn, and DataRow objects for you. You'll need to write a small conversion method if you subsequently want to change the DATE column's data type to DateTime. Other than that, you shouldn't have to deal with the XML at all.
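A sketch of such a conversion method, assuming every DATE value follows the dd-MMM-yyyy pattern shown in the question (e.g. 21-JUN-2009). DateColumnConverter and ConvertToDateTime are hypothetical names. Because DataColumn.DataType cannot be changed once a column holds data, it adds a new DateTime column, copies the parsed values, and then swaps it into the old column's name and position.

```csharp
using System;
using System.Data;
using System.Globalization;

// Hypothetical helper; assumes the column currently holds strings such as "21-JUN-2009".
static class DateColumnConverter
{
    public static void ConvertToDateTime(DataTable table, string columnName, string format)
    {
        DataColumn oldColumn = table.Columns[columnName];
        int ordinal = oldColumn.Ordinal;

        // DataType cannot be changed once the column has data, so add a new column instead.
        DataColumn newColumn = table.Columns.Add(columnName + "_converted", typeof(DateTime));
        foreach (DataRow row in table.Rows)
        {
            // Blank or missing dates are left as DBNull.
            if (row[oldColumn] is string text && text.Length > 0)
                row[newColumn] = DateTime.ParseExact(text, format, CultureInfo.InvariantCulture);
        }

        // Give the new column the old column's name and position.
        table.Columns.Remove(oldColumn);
        newColumn.ColumnName = columnName;
        newColumn.SetOrdinal(ordinal);
    }
}
```

Usage after `ds.ReadXml(...)`: `DateColumnConverter.ConvertToDateTime(ds.Tables[0], "DATE", "dd-MMM-yyyy");`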

Edit

I see from Daniel Bruckner's post that the LOCATION elements in the odd namespace pose a problem. Well, that's easy enough to fix:

    XmlDocument d = new XmlDocument();
    d.LoadXml(xml);

    XmlNamespaceManager ns = new XmlNamespaceManager(d.NameTable);
    ns.AddNamespace("n", "http://www.w3.org/XML/1998/namespace");
    foreach (XmlNode n in d.SelectNodes("/root/Table/n:LOCATION", ns))
    {
        XmlElement loc = d.CreateElement("LOCATION");
        n.ParentNode.AppendChild(loc);
        n.ParentNode.RemoveChild(n);
    }

    DataSet ds = new DataSet();
    using (StringReader sr = new StringReader(d.OuterXml))
    {
        ds.ReadXml(sr);
    }

I'm not a huge fan of XML myself; I need to use it as the data source of a grid to visualize it. I get some output from our FileNet imaging server in XML format, and I need to pull pieces of it out to populate a database. Here's what I'm doing, HTH:

Dim dsXML As DataSet
Dim drXML As DataRow
Dim rdr As System.IO.StringReader
Dim docs() As String
Dim SQL As String
Dim xml As String
Dim fnID As String

docs = _fnP8Dev.getDocumentsXML(_credToken, _docObjectStoreName, _docClass, "ReferenceNumber=" & fnID, "")
xml = docs(0)
If (InStr(xml, "<z:row") > 0) Then
    RaiseEvent msg("Inserting images for reference number " & fnID)
    rdr = New System.IO.StringReader(xml)
    dsXML = New DataSet
    dsXML.ReadXml(rdr)

    For Each drXML In dsXML.Tables(dsXML.Tables.Count - 1).Rows
        SQL = "Insert into fnImageP8 values ("
        SQL = SQL & "'" & drXML("Id") & "', "
        Try
            SQL = SQL & "'" & drXML("DocumentTitle") & "', "
        Catch ex As Exception
            SQL = SQL & "null, "
        End Try

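Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM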