
Parsing XML in C# for specific content

I am trying to parse an XML response from a website in C#. The response comes in a format similar to the following:

<Company>
    <Owner>Bob</Owner>
    <Contact>
        <address> -1 Infinite Loop </address>
        <phone>
            <LandLine>(000) 555-5555</LandLine>
            <Fax> (000) 555-5556 </Fax>
        </phone>
        <email> foo@bar.com </email>
    </Contact>
</Company>

The only information I want is the LandLine and Fax numbers. However, my current approach seems to be of really poor quality. Essentially it is a bunch of nested while loops that check the element name and then read the content once the right element is found. I am using something like the listing below:

XmlReader xml = XmlReader.Create(websiteResultStream, xmlSettings);

while(xml.Read()){
    if(xml.NodeType == XmlNodeType.Element){
        if(xml.Name.ToString() == "Phone"){
            while(xml.Read()) {
                if(xml.NodeType == XmlNodeType.Element) {
                     if(xml.Name.ToString() == "LandLine"){
                          xml.MoveToContent();
                          xml.ReadContentAsString();
                     }
                     if(xml.Name.ToString() == "Fax"){
                          xml.MoveToContent();
                          xml.ReadContentAsString();
                     }
                }
            }
        }
    }
}

I am new to XML/C#, but the above method just screams bad code! I want to ensure that the code is robust if the structure changes (i.e. there are additional phone number types like "mobile"), hence the additional while loops.

Note: the above C# code is not exact, and lacks some checks etc, but it demonstrates my current abysmal disgusting approach

What is the best/cleanest way to simply extract the content from those two Elements if they are present?

Use LINQ to XML:

var doc = XDocument.Parse(@"<Company>
    <Owner>Bob</Owner>
    <Contact>
        <address> -1 Infinite Loop </address>
        <phone>
            <LandLine>(000) 555-5555</LandLine>
            <Fax> (000) 555-5556 </Fax>
        </phone>
        <email> foo@bar.com </email>
    </Contact>
</Company>");

var phone = doc.Root.Element("Contact").Element("phone");

Console.WriteLine((string)phone.Element("LandLine"));
Console.WriteLine((string)phone.Element("Fax"));

Output:

(000) 555-5555
 (000) 555-5556
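
Since the question asks for the values only "if they are present", a more defensive variant may be worth having. The following is only a sketch: the class and method names (PhoneNumbers, Find) are made up for illustration, and it assumes the numbers always sit directly under an element named phone, wherever that element appears in the document.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class PhoneNumbers
{
    // Hypothetical helper: returns the trimmed text of the first <name>
    // element found directly under any <phone> element, or null if none exists.
    public static string Find(XDocument doc, string name)
    {
        // Descendants() searches the whole tree, so this keeps working if
        // <phone> moves or gains extra children such as <Mobile>.
        XElement match = doc.Descendants("phone").Elements(name).FirstOrDefault();

        // The null-conditional operator avoids a NullReferenceException
        // when the element is missing.
        return match?.Value.Trim();
    }
}
```

With the sample document loaded into doc, PhoneNumbers.Find(doc, "Fax") returns "(000) 555-5556" without the surrounding spaces, and PhoneNumbers.Find(doc, "Mobile") returns null instead of throwing.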

The most lightweight approach for read-only access to specific nodes in an XML document is to use an XPathDocument together with an XPath expression:

XPathDocument xdoc = new XPathDocument(@"C:\sample\document.xml");
XPathNavigator node = xdoc.CreateNavigator()
    .SelectSingleNode("/Company/Contact/phone/LandLine");
if (node != null)
{
    string landline = node.Value;
}
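
The question reads from a response stream rather than a file, and wants both numbers, so here is a rough sketch of the same idea adapted to that case. The names PhoneXPath and ValueOrNull are hypothetical. A TextReader is used so the example is self-contained; XPathDocument also has a constructor that takes a Stream, which should fit websiteResultStream from the question.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

static class PhoneXPath
{
    // Hypothetical helper: loads the XML from a reader, evaluates the XPath
    // expression, and returns the trimmed text of the first match (or null).
    public static string ValueOrNull(TextReader input, string xpath)
    {
        XPathDocument xdoc = new XPathDocument(input);
        XPathNavigator node = xdoc.CreateNavigator().SelectSingleNode(xpath);

        // SelectSingleNode returns null when nothing matches the expression.
        return node == null ? null : node.Value.Trim();
    }
}
```

For example, PhoneXPath.ValueOrNull(new StringReader(xml), "/Company/Contact/phone/Fax") gives the fax number if the element is there and null otherwise. If you need several values, it is cheaper to build the XPathDocument once and call SelectSingleNode on the same navigator for each path.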

I don't think you're too far off. There are more convenient methods (lots of different approaches). Assuming you want to take the same basic approach as you do here (and it is an efficient if verbose one), I'd do:

bool inPhone = false;
string landLine = null;
string fax = null;

using(XmlReader xml = XmlReader.Create(websiteResultStream, xmlSettings))
while(xml.Read())
{
  switch(xml.NodeType)
  {
    case XmlNodeType.Element:
      switch(xml.LocalName)
      {
        case "phone":
          inPhone = true;
          break;
        case "LandLine":
          if(inPhone)
          {
            landLine = xml.ReadElementContentAsString();
            if(fax != null)
            {
              DoWhatWeWantToDoWithTheseValues(landLine, fax);
              return;
            }
          }
          break;
        case "Fax":
          if(inPhone)
          {
            fax = xml.ReadElementContentAsString();
            if(landLine != null)
            {
              DoWhatWeWantToDoWithTheseValues(landLine, fax);
              return;
            }
          }
          break;
      }
      break;
    case XmlNodeType.EndElement:
      if(xml.LocalName == "phone")
        inPhone = false;
      break;
  }
}

Note that this tracks whether the reader is "inside" a phone element, whereas your version would also pick up a LandLine that appears inside some later element, which you seem to be trying to avoid.

Note also that the using block cleans up the XmlReader, and that we return as soon as we have all the information we want.
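
If new number types such as "mobile" may show up later, one option is to collect every child of the phone element instead of naming each one. The sketch below is just one way to do that with XmlReader. PhoneReader and ReadNumbers are hypothetical names, and it assumes each child of phone contains only text, because ReadElementContentAsString throws if it meets a nested element.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class PhoneReader
{
    // Hypothetical helper: returns the trimmed text of every child element
    // of the first <phone> element, keyed by element name.
    public static Dictionary<string, string> ReadNumbers(TextReader input)
    {
        var numbers = new Dictionary<string, string>();
        using (XmlReader reader = XmlReader.Create(input))
        {
            // Jump to the first <phone> element; give up if there is none.
            if (!reader.ReadToFollowing("phone"))
                return numbers;

            int phoneDepth = reader.Depth;
            reader.Read(); // step past the <phone> start tag

            // Stay in the loop only while we are deeper than <phone> itself.
            while (!reader.EOF && reader.Depth > phoneDepth)
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    string name = reader.LocalName;
                    // This call already moves past the end tag, so we must
                    // not call Read() again or we could skip the next element.
                    numbers[name] = reader.ReadElementContentAsString().Trim();
                }
                else
                {
                    reader.Read(); // whitespace, comments, etc.
                }
            }
        }
        return numbers;
    }
}
```

On the sample document this yields two entries, "LandLine" and "Fax", and a Mobile element added later would simply appear as a third key. Not calling Read() after ReadElementContentAsString matters mostly when the XML has no whitespace between elements, or when the reader settings ignore whitespace.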

The best way to do that is to use XPath. For background, see this article: http://support.microsoft.com/kb/308333

and this article on how to do it: http://www.codeproject.com/KB/cpp/myXPath.aspx
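
As a rough illustration of what XPath buys you here, a wildcard step can select every child of phone in one expression, so new number types need no code changes. This is only a sketch: PhoneXPathAll and ListNumbers are hypothetical names, and the path assumes the Company/Contact/phone layout from the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

static class PhoneXPathAll
{
    // Hypothetical helper: returns one "Name: value" string per child
    // element of <phone>, in document order.
    public static List<string> ListNumbers(string xml)
    {
        var result = new List<string>();
        XPathDocument xdoc = new XPathDocument(new StringReader(xml));

        // The trailing * matches every element directly under <phone>.
        XPathNodeIterator it =
            xdoc.CreateNavigator().Select("/Company/Contact/phone/*");

        while (it.MoveNext())
        {
            result.Add(it.Current.LocalName + ": " + it.Current.Value.Trim());
        }
        return result;
    }
}
```

For the sample document this produces "LandLine: (000) 555-5555" followed by "Fax: (000) 555-5556". If there is no phone element at all, the list is simply empty.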

