Reading Xml with XmlReader in C#

I'm trying to read the following XML document as fast as I can and let additional classes manage the reading of each sub-block.

<ApplicationPool>
    <Accounts>
        <Account>
            <NameOfKin></NameOfKin>
            <StatementsAvailable>
                <Statement></Statement>
            </StatementsAvailable>
        </Account>
    </Accounts>
</ApplicationPool>

However, I'm trying to use the XmlReader object to read each Account and subsequently the "StatementsAvailable". Do you suggest using XmlReader.Read, checking each element and handling it?

I've thought of separating my classes to handle each node properly. So there's an AccountBase class that accepts an XmlReader instance and reads the NameOfKin and several other properties of the account. Then I wanted to iterate through the Statements and let another class fill itself out from each Statement (and subsequently add it to an IList).

Thus far I have the "per class" part done by calling XmlReader.ReadElementString(), but I can't work out how to tell the pointer to move to the StatementsAvailable element, iterate through the statements, and let another class read each of their properties.

Sounds easy!

My experience of XmlReader is that it's very easy to accidentally read too much. I know you've said you want to read it as quickly as possible, but have you tried using a DOM model instead? I've found that LINQ to XML makes XML work much, much easier.
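For comparison, here is a minimal LINQ to XML sketch that loads the whole question document with XDocument and projects each Account. It assumes the document fits comfortably in memory; AccountInfo and LinqToXmlSketch are hypothetical names for this illustration, and statements are reduced to their text content because the question does not show what a Statement contains.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical DTO for this sketch; the question's real classes are not shown.
public sealed class AccountInfo
{
    public string NameOfKin { get; set; }
    public List<string> Statements { get; set; } = new List<string>();
}

public static class LinqToXmlSketch
{
    public static List<AccountInfo> ParseAccounts(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Root
            .Elements("Accounts")
            .Elements("Account")
            .Select(a => new AccountInfo
            {
                // The (string) conversion yields null for a missing element instead of throwing
                NameOfKin = (string)a.Element("NameOfKin"),
                // Assumption: only the text of each <Statement> is of interest here
                Statements = a.Elements("StatementsAvailable")
                              .Elements("Statement")
                              .Select(s => s.Value)
                              .ToList()
            })
            .ToList();
    }
}
```

The nesting of the query mirrors the nesting of the document, so there is no reader position to keep track of.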

If your document is particularly huge, you can combine XmlReader and LINQ to XML by creating an XElement from an XmlReader for each of your "outer" elements in a streaming manner: this lets you do most of the conversion work in LINQ to XML, while still only needing a small portion of the document in memory at any one time. Here's some sample code (adapted slightly from this blog post):

static IEnumerable<XElement> SimpleStreamAxis(string inputUrl,
                                              string elementName)
{
  using (XmlReader reader = XmlReader.Create(inputUrl))
  {
    reader.MoveToContent();
    while (reader.Read())
    {
      if (reader.NodeType == XmlNodeType.Element)
      {
        if (reader.Name == elementName)
        {
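          // Caution: ReadFrom already advances the reader past this element,
          // so the loop's Read() can skip an immediately following sibling
          // (the "double read" discussed further down).
          XElement el = XNode.ReadFrom(reader) as XElement;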
          if (el != null)
          {
            yield return el;
          }
        }
      }
    }
  }
}

I've used this before to convert the StackOverflow user data (which is enormous) into another format, and it works very well.

EDIT from radarbob, reformatted by Jon (although it's not quite clear which "read too far" problem is being referred to...)

This should simplify the nesting and take care of the "a read too far" problem.

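// IgnoreWhitespace keeps indentation whitespace from ending the loop early:
// without it, reader.Name is "" on the whitespace node between elements.
var settings = new XmlReaderSettings { IgnoreWhitespace = true };
using (XmlReader reader = XmlReader.Create(inputUrl, settings))
{
    reader.ReadStartElement("theRootElement");

    while (reader.NodeType == XmlNodeType.Element && reader.Name == "TheNodeIWant")
    {
        // ReadFrom consumes the whole element and leaves the reader on the
        // next node, so no extra Read() is needed here.
        XElement el = (XElement) XNode.ReadFrom(reader);
        // ... process el here ...
    }

    reader.ReadEndElement();
}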

This takes care of the "a read too far" problem because it implements the classic while loop pattern:

initial read;
(while "we're not at the end") {
    do stuff;
    read;
}
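The same pattern can also be expressed with XmlReader's navigation helpers: ReadToFollowing/ReadToDescendant act as the initial read and ReadToNextSibling as the read at the bottom of the loop. This is a sketch under the assumption that each <Statement> carries a hypothetical id attribute; SiblingLoopSketch and ReadStatements are illustrative names. Note that the "do stuff" step must not move the reader itself, otherwise ReadToNextSibling can skip an element.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class SiblingLoopSketch
{
    // Collects the (hypothetical) id attribute of every <Statement>
    // under the first <StatementsAvailable> element.
    public static List<string> ReadStatements(string xml)
    {
        var result = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // initial read: jump to the container, then to its first <Statement>
            if (reader.ReadToFollowing("StatementsAvailable") &&
                reader.ReadToDescendant("Statement"))
            {
                do
                {
                    // "do stuff": GetAttribute does not move the reader, so the
                    // sibling read below starts from the current <Statement>
                    result.Add(reader.GetAttribute("id"));
                }
                // "read": advance to the next <Statement>; false at the end tag
                while (reader.ReadToNextSibling("Statement"));
            }
        }
        return result;
    }
}
```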

Three years later, perhaps because of the renewed emphasis on WebApi and XML data, I came across this question. Since, code-wise, I am inclined to follow Skeet out of an airplane without a parachute, and since his initial code was doubly corroborated by the MS XML team article and by an example in BOL, Streaming Transform of Large Xml Docs, I very quickly overlooked the other comments, most notably the one from 'pbz', who pointed out that if elements with the same name appear in succession, every other one is skipped because of the double read: XNode.ReadFrom already moves the reader to the next node, and the loop's Read() then advances it again. In fact, the BOL and MS blog articles both parsed source documents whose target elements were nested deeper than the second level, which masked this side effect.
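A minimal reproduction of the skipping pbz described, as a sketch: BuggyNames keeps the original loop shape (Read() in the while condition plus XNode.ReadFrom in the body), while FixedNames only calls Read() when it did not call ReadFrom. The input documents, the n attribute and all names here are hypothetical. With indented XML the extra Read() usually lands on a whitespace node instead of the next element, which is one way the bug stays hidden.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class DoubleReadDemo
{
    // Original loop shape: ReadFrom advances, then the while condition advances again.
    public static List<string> BuggyNames(string xml, string elementName)
    {
        var names = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    var el = (XElement)XNode.ReadFrom(reader);
                    names.Add((string)el.Attribute("n"));
                }
            }
        }
        return names;
    }

    // Fixed shape: advance with either ReadFrom or Read, never both.
    public static List<string> FixedNames(string xml, string elementName)
    {
        var names = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    // ReadFrom already leaves the reader on the following node
                    var el = (XElement)XNode.ReadFrom(reader);
                    names.Add((string)el.Attribute("n"));
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return names;
    }
}
```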

The other answers address this problem. I just wanted to offer a slightly simpler revision that seems to work well so far and takes into account that the XML might come from different sources, not just a URI, so the extension works on a user-managed XmlReader. The one assumption is that the reader is in its initial state, since otherwise the first Read() might advance past a desired node:

using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

// Extension methods must live in a static class; the class name is illustrative.
public static class XmlReaderExtensions
{
    public static IEnumerable<XElement> ElementsNamed(this XmlReader reader, string elementName)
    {
        reader.MoveToContent(); // will not advance reader if already on a content node; if successful, ReadState is Interactive
        reader.Read();          // this is needed, even with MoveToContent and ReadState.Interactive
        while (!reader.EOF && reader.ReadState == ReadState.Interactive)
        {
            // corrected for bug noted by Wes below...
            if (reader.NodeType == XmlNodeType.Element && reader.Name.Equals(elementName))
            {
                // this advances the reader...so it's either XNode.ReadFrom() or reader.Read(), but not both
                var matchedElement = XNode.ReadFrom(reader) as XElement;
                if (matchedElement != null)
                    yield return matchedElement;
            }
            else
                reader.Read();
        }
    }
}

We do this kind of XML parsing all the time. The key is defining where the parsing method will leave the reader on exit. If you always leave the reader on the next node following the element that was first read, then you can read the XML stream safely and predictably. So if the reader is currently positioned on the <Account> element, after parsing it will be positioned just past </Account>, which (ignoring whitespace) is the </Accounts> closing tag.

The parsing code looks something like this:

using System.Xml;

public class Account
{
    string _accountId;
    string _nameOfKin;
    Statements _statementsAvailable;

    public void ReadFromXml( XmlReader reader )
    {
        reader.MoveToContent();

        // Read node attributes
        _accountId = reader.GetAttribute( "accountId" );
        // ...

        if( reader.IsEmptyElement ) { reader.Read(); return; }

        reader.Read();
        while( ! reader.EOF )
        {
            if( reader.IsStartElement() )
            {
                switch( reader.Name )
                {
                    // Read element for a property of this class
                    case "NameOfKin":
                        _nameOfKin = reader.ReadElementContentAsString();
                        break;

                    // Starting sub-list
                    case "StatementsAvailable":
                        _statementsAvailable = new Statements();
                        _statementsAvailable.ReadFromXml( reader );
                        break;

                    default:
                        reader.Skip();
                        break;
                }
            }
            else
            {
                // End of <Account>: step past the end tag and stop
                reader.Read();
                break;
            }
        }
    }
}

The Statements class just reads in the <StatementsAvailable> node:

using System.Collections.Generic;
using System.Xml;

public class Statements
{
    List<Statement> _statements = new List<Statement>();

    public void ReadFromXml( XmlReader reader )
    {
        reader.MoveToContent();
        if( reader.IsEmptyElement ) { reader.Read(); return; }

        reader.Read();
        while( ! reader.EOF )
        {
            if( reader.IsStartElement() )
            {
                if( reader.Name == "Statement" )
                {
                    var statement = new Statement();
                    statement.ReadFromXml( reader );
                    _statements.Add( statement );               
                }
                else
                {
                    reader.Skip();
                }
            }
            else
            {
                reader.Read();
                break;
            }
        }
    }
}

The Statement class would look very much the same:

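using System.Xml;

public class Statement
{
    string _statementId;

    public void ReadFromXml( XmlReader reader )
    {
        reader.MoveToContent();

        // Read node attributes
        _statementId = reader.GetAttribute( "statementId" );
        // ...

        if( reader.IsEmptyElement ) { reader.Read(); return; }

        reader.Read();
        while( ! reader.EOF )
        {
            // ...same basic loop as in Account and Statements
            if( reader.IsStartElement() )
            {
                // handle the Statement's own child elements here
                reader.Skip();
            }
            else
            {
                reader.Read();
                break;
            }
        }
    }
}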
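To see the exit-position convention in isolation, here is a stripped-down, hypothetical reader for just the <StatementsAvailable> list that uses the same loop as the classes above. It assumes each <Statement> carries a statementId attribute and only collects those ids; StatementListReader is an illustrative name. The test checks both the ids and where the reader is left afterwards.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical reader following the "leave the reader on the node after my
// end tag" convention. Expects to be called while positioned on <StatementsAvailable>.
public static class StatementListReader
{
    public static List<string> Read(XmlReader reader)
    {
        var ids = new List<string>();
        reader.MoveToContent();
        if (reader.IsEmptyElement) { reader.Read(); return ids; }

        reader.Read(); // step inside <StatementsAvailable>
        while (!reader.EOF)
        {
            if (reader.IsStartElement())
            {
                if (reader.Name == "Statement")
                    ids.Add(reader.GetAttribute("statementId"));
                // Skip moves past the current element and all of its children
                reader.Skip();
            }
            else
            {
                reader.Read(); // consume </StatementsAvailable>
                break;
            }
        }
        return ids;
    }
}
```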

For sub-objects, ReadSubtree() gives you an XmlReader limited to the sub-object, but I really think that you are doing this the hard way. Unless you have very specific requirements for handling unusual or unpredictable XML, use XmlSerializer (perhaps coupled with sgen.exe if you really want).
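A sketch of the ReadSubtree() approach: the outer loop hands each <Account> to a separate method through a subtree reader, so that method cannot read past the end of its own account. ReadNameOfKin is a hypothetical stand-in for the question's "other class", and only NameOfKin is read to keep the example short.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class SubtreeSketch
{
    // Hypothetical per-account handler: it only ever sees one <Account> subtree.
    static string ReadNameOfKin(XmlReader accountReader)
    {
        accountReader.MoveToContent(); // moves from Initial state onto <Account>
        return accountReader.ReadToDescendant("NameOfKin")
            ? accountReader.ReadElementContentAsString()
            : null;
    }

    public static List<string> ReadAllNames(string xml)
    {
        var names = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.ReadToFollowing("Account"))
            {
                using (XmlReader sub = reader.ReadSubtree())
                {
                    names.Add(ReadNameOfKin(sub));
                }
                // Closing the subtree reader leaves the outer reader at the end of
                // this <Account>, so ReadToFollowing continues with the next one.
            }
        }
        return names;
    }
}
```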

XmlReader is... tricky. Contrast it with:

using System;
using System.Collections.Generic;
using System.Xml.Serialization;
public class ApplicationPool {
    private readonly List<Account> accounts = new List<Account>();
    public List<Account> Accounts {get{return accounts;}}
}
public class Account {
    public string NameOfKin {get;set;}
    private readonly List<Statement> statements = new List<Statement>();
    public List<Statement> StatementsAvailable {get{return statements;}}
}
public class Statement {}
static class Program {
    static void Main() {
        XmlSerializer ser = new XmlSerializer(typeof(ApplicationPool));
        ser.Serialize(Console.Out, new ApplicationPool {
            Accounts = { new Account { NameOfKin = "Fred",
                StatementsAvailable = { new Statement {}, new Statement {}}}}
        });
    }
}
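The example above only serializes; the reading side is a single Deserialize call. This sketch reuses the same class shapes, with one assumption not in the original: Statement gets a hypothetical [XmlText] Text property so there is something to read back. XmlSerializer fills the get-only lists by adding to the existing List instances.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class ApplicationPool
{
    private readonly List<Account> accounts = new List<Account>();
    public List<Account> Accounts { get { return accounts; } }
}
public class Account
{
    public string NameOfKin { get; set; }
    private readonly List<Statement> statements = new List<Statement>();
    public List<Statement> StatementsAvailable { get { return statements; } }
}
public class Statement
{
    // Hypothetical: maps the element's text content, e.g. <Statement>s1</Statement>
    [XmlText] public string Text { get; set; }
}

public static class DeserializeSketch
{
    public static ApplicationPool Load(string xml)
    {
        var ser = new XmlSerializer(typeof(ApplicationPool));
        using (var reader = new StringReader(xml))
        {
            return (ApplicationPool)ser.Deserialize(reader);
        }
    }
}
```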

The following example navigates through the stream to determine the current node type, and then uses XmlWriter to output the XmlReader content.

    StringBuilder output = new StringBuilder();

    String xmlString =
            @"<?xml version='1.0'?>
            <!-- This is a sample XML document -->
            <Items>
              <Item>test with a child element <more/> stuff</Item>
            </Items>";
    // Create an XmlReader
    using (XmlReader reader = XmlReader.Create(new StringReader(xmlString)))
    {
        XmlWriterSettings ws = new XmlWriterSettings();
        ws.Indent = true;
        using (XmlWriter writer = XmlWriter.Create(output, ws))
        {

            // Parse the file and display each of the nodes.
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
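                    case XmlNodeType.Element:
                        writer.WriteStartElement(reader.Name);
                        // An empty element such as <more/> has no matching
                        // EndElement node, so close it immediately.
                        if (reader.IsEmptyElement)
                            writer.WriteEndElement();
                        break;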
                    case XmlNodeType.Text:
                        writer.WriteString(reader.Value);
                        break;
                    case XmlNodeType.XmlDeclaration:
                    case XmlNodeType.ProcessingInstruction:
                        writer.WriteProcessingInstruction(reader.Name, reader.Value);
                        break;
                    case XmlNodeType.Comment:
                        writer.WriteComment(reader.Value);
                        break;
                    case XmlNodeType.EndElement:
                        writer.WriteFullEndElement();
                        break;
                }
            }

        }
    }
    OutputTextBlock.Text = output.ToString();

The following example uses the XmlReader methods to read the content of elements and attributes.

StringBuilder output = new StringBuilder();

String xmlString =
    @"<bookstore>
        <book genre='autobiography' publicationdate='1981-03-22' ISBN='1-861003-11-0'>
            <title>The Autobiography of Benjamin Franklin</title>
            <author>
                <first-name>Benjamin</first-name>
                <last-name>Franklin</last-name>
            </author>
            <price>8.99</price>
        </book>
    </bookstore>";

// Create an XmlReader
using (XmlReader reader = XmlReader.Create(new StringReader(xmlString)))
{
    reader.ReadToFollowing("book");
    reader.MoveToFirstAttribute();
    string genre = reader.Value;
    output.AppendLine("The genre value: " + genre);

    reader.ReadToFollowing("title");
    output.AppendLine("Content of the title element: " + reader.ReadElementContentAsString());
}

OutputTextBlock.Text = output.ToString();

// XmlDataDocument is obsolete; XmlDocument offers the same Load/GetElementsByTagName API
XmlDocument xmldoc = new XmlDocument();
XmlNodeList xmlnode;
int i = 0;
string str = null;
using (FileStream fs = new FileStream("product.xml", FileMode.Open, FileAccess.Read))
{
    xmldoc.Load(fs);
}
xmlnode = xmldoc.GetElementsByTagName("Product");

You can loop through xmlnode and get the data. (See also: C# XML Reader.)
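A sketch of that loop, under a few assumptions: the XML is loaded from a string with LoadXml instead of product.xml, each <Product> is assumed to have a hypothetical <Name> child, and NodeListSketch/ProductNames are illustrative names.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class NodeListSketch
{
    public static List<string> ProductNames(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        var names = new List<string>();
        // XmlNodeList is non-generic, so foreach casts each item to XmlNode
        foreach (XmlNode product in doc.GetElementsByTagName("Product"))
        {
            // Hypothetical child element; SelectSingleNode returns null if it is missing
            XmlNode name = product.SelectSingleNode("Name");
            names.Add(name?.InnerText);
        }
        return names;
    }
}
```

Keep in mind that XmlDocument loads the whole file into memory, which is exactly what the streaming XmlReader answers above try to avoid for very large documents.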

I'm not experienced, but I think XmlReader is unnecessary here: it is very hard to use, whereas XElement is very easy to use.
If you need better performance, you would have to change the file format and use the StreamReader and StreamWriter classes.
