
XML parsing when node may or may not exist

Please consider the following sample XML. I have a class, say Dummy, containing two fields, X and Y. By parsing the XML I would like to build a list of objects of that class. X and Y take the values of the X and Y tags respectively, but only of those inside a B parent node. In the XML, the X and Y nodes may or may not exist inside a B node.

 <DOC>
  <A>1</A>
  <B>
   <C>1</C>
   <D>1</D>
   <E>1</E>
   <X>Hello</X>
   <F>1</F>
   <G>1</G>
   <Y>Hi</Y>
  </B>
  <B>
   <C>1</C>
   <D>1</D>
   <E>1</E>
   <F>1</F>
   <G>1</G>
  </B>
 <H>
  <X>1</X> //ignore
  <Y>1</Y> //ignore
 </H>
<DOC>

For the above XML, I would like to get a list containing two elements. The first will have X = "Hello" and Y = "Hi", and the second will have X = "" and Y = "".

My C# parsing code looks something like this:

List<Dummy> dummyList = new List<Dummy>();
Dummy d = null;

while (xmlReader.Read())
{
    if (xmlReader.IsStartElement())
    {
        switch (xmlReader.Name)
        {
            case "B":
                d = new Dummy();
                while (xmlReader.Name != "X")
                    xmlReader.Read();   // can go into infinite loop if there is no X node
                xmlReader.Read();
                d.X = xmlReader.Value;

                while (xmlReader.Name != "Y")
                    xmlReader.Read();   // can go into infinite loop if there is no Y node
                xmlReader.Read();
                d.Y = xmlReader.Value;

                dummyList.Add(d);
                d = null;
                break;
        }
    }
}

The above code works fine for the first B node but fails on the second B node. Please let me know your thoughts.

What do you mean by "it fails for the second B node"?

As I see it, the second B node has no X or Y. That means that when execution reaches case "B", the inner while loop keeps reading until it finds an X that is not there, so it reads on past the end of that B node to the end of the document and nothing useful happens. You have to read ONLY up to the end of the B node (that protects you from the infinite loop), and if there is no X or Y you have to set them to empty strings yourself.
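One way to sketch "read only up to the end of the B node" is XmlReader.ReadSubtree(), which hands back a reader that stops at the matching </B>. This is only an illustration, not the asker's actual code: the Dummy class with empty-string defaults and the BoundedParser.Parse helper are assumed names, and the input is assumed to be well-formed XML held in a string.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Assumed shape of the asker's class; the defaults give "" when a node is missing.
public class Dummy
{
    public string X = "";
    public string Y = "";
}

public static class BoundedParser
{
    // Hypothetical helper: returns one Dummy per <B> element in the document.
    public static List<Dummy> Parse(string xml)
    {
        var result = new List<Dummy>();
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element || reader.Name != "B")
                    continue;

                var d = new Dummy();
                // The subtree reader ends at the matching </B>, so a missing
                // X or Y can never pull the loop into the next element.
                using (var b = reader.ReadSubtree())
                {
                    while (b.Read())
                    {
                        if (b.NodeType != XmlNodeType.Element) continue;
                        if (b.Name == "X") d.X = b.ReadString();
                        else if (b.Name == "Y") d.Y = b.ReadString();
                    }
                }
                result.Add(d);
            }
        }
        return result;
    }
}
```

Because the outer loop only reacts to B elements, the X and Y inside H are never looked at. Note that this sketch would also pick up an X or Y nested deeper inside a B, which the sample XML does not contain.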

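// dummyList and d are declared as in the question.
List<Dummy> dummyList = new List<Dummy>();
Dummy d = null;
bool waitingForXy = false;   // true only while the reader is inside a <B> element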
while (xmlReader.Read())
{
    if (xmlReader.IsStartElement())
    {
        switch (xmlReader.Name)
        {
            case "B":
                d = new Dummy();
                waitingForXy = true;
                break;
            case "X":
                if (waitingForXy)
                {
                    d.X = xmlReader.ReadString();
                }
                break;
            case "Y":
                if (waitingForXy)
                {
                    d.Y = xmlReader.ReadString();
                }
                break;
        }
    }
    else if (xmlReader.NodeType == XmlNodeType.EndElement)
    {
        switch (xmlReader.Name)
        {
            case "B":
                waitingForXy = false;
                dummyList.Add(d);
                break;
        }
    }
}

This creates a Dummy instance at every <B> start element and then watches for <X> and <Y> until the </B> end element. If they don't occur, d.X and d.Y will remain null.

If you are using VS2013 SP2 or greater, you can use Edit->Paste Special->Paste XML As Classes to paste strongly typed classes, generated from your original XML, into your code. Note that you will have to close your final </DOC> tag and remove the comments for this to work!

You can then use the following code to extract any X and Y values from any Bs in the XML into a collection of tuples:

string xml = ""; // TODO: Get XML as string.
var myXml = (DOC)new XmlSerializer(typeof(DOC)).Deserialize(new StringReader(xml));
var results = myXml.B.Select(x => Tuple.Create(x.X, x.Y));
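For anyone who wants to try this without the Visual Studio command, here is a rough, hand-trimmed stand-in for the pasted classes. It is an assumption about their general shape, not the exact generated output: only B, X and Y are declared, and SerializerSketch.ExtractXY is a made-up helper name. XmlSerializer skips elements that have no matching member, so A, C, D and the rest of the document are ignored.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.Serialization;

// Hypothetical, trimmed-down versions of the generated classes.
[XmlRoot("DOC")]
public class DOC
{
    // Repeated <B> children map onto an array.
    [XmlElement("B")]
    public DOCB[] B { get; set; }
}

public class DOCB
{
    public string X { get; set; }   // stays null when <X> is absent
    public string Y { get; set; }   // stays null when <Y> is absent
}

public static class SerializerSketch
{
    // Made-up helper: one (X, Y) tuple per <B>, with "" for missing nodes.
    public static Tuple<string, string>[] ExtractXY(string xml)
    {
        var serializer = new XmlSerializer(typeof(DOC));
        using (var reader = new StringReader(xml))
        {
            var doc = (DOC)serializer.Deserialize(reader);
            return (doc.B ?? new DOCB[0])
                .Select(b => Tuple.Create(b.X ?? "", b.Y ?? ""))
                .ToArray();
        }
    }
}
```

The ?? "" part is what turns a missing X or Y into an empty string, as the question asked for. Without it the tuple simply holds null.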
