
Extracting inner value on CDATA with Linq to XML using a filter

I am using this code to retrieve the values I want from XML:

IEnumerable<ForewordReview> reviews = null;
try
{
    reviews = from item in xmlDoc.Descendants("node")
              select new ForewordReview()
              {
                  PubDate = item.Element("created").ToString(),
                  Isbn = item.Element("isbn").ToString(),
                  Summary = item.Element("summary").ToString()
              };
} // ...

Incidentally, a client is now sending us XML in which almost every element's content is wrapped in CDATA, and I need to extract those values:

<review>
    <node>
        <created>
            <![CDATA[2012-01-23 12:40:57]]>
        </created>
        <isbn>
            <![CDATA[123456789]]>
        </isbn>
        <summary>
            <![CDATA[Teh Kittehs like to play in teh mud]]>
        </summary>
    </node>
</review>

I have seen a couple of solutions for extracting values from inside a CDATA section, one of which is to use a where clause in the LINQ query:

where element.NodeType == System.Xml.XmlNodeType.CDATA

I sort of see what's going on here, but I am not sure this works with how I am using LINQ (specifically, building an object from selected items).

Do I need to apply this filter to each item in the select statement individually? Otherwise, I don't really understand how this will work with the code I am using.

As always, I appreciate the help.

Cast each XElement to a string instead:

reviews = from item in xmlDoc.Descendants("node")
          select new 
          {
              PubDate = (string)item.Element("created"),
              Isbn = (string)item.Element("isbn"),
              Summary = (string)item.Element("summary")
          };
// Output:
// {
//      PubDate = 2012-01-23 12:40:57,
//      Isbn = 123456789,
//      Summary = Teh Kittehs like to play in teh mud
// }
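
For anyone who wants to try the cast end to end, here is a minimal, self-contained sketch. The ForewordReview class below is a hypothetical stand-in for the asker's type (its string properties are an assumption), and the XML is a compacted copy of the sample from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical stand-in for the asker's class; the string properties are assumed.
public class ForewordReview
{
    public string PubDate { get; set; }
    public string Isbn { get; set; }
    public string Summary { get; set; }
}

public static class CDataCastDemo
{
    // Compacted copy of the sample document from the question.
    public const string SampleXml =
        "<review><node>" +
        "<created><![CDATA[2012-01-23 12:40:57]]></created>" +
        "<isbn><![CDATA[123456789]]></isbn>" +
        "<summary><![CDATA[Teh Kittehs like to play in teh mud]]></summary>" +
        "</node></review>";

    public static List<ForewordReview> ReadReviews(string xml)
    {
        XDocument xmlDoc = XDocument.Parse(xml);

        return (from item in xmlDoc.Descendants("node")
                select new ForewordReview
                {
                    // The explicit string conversion yields the element's text
                    // content, whether it was written as CDATA or as plain text.
                    PubDate = (string)item.Element("created"),
                    Isbn = (string)item.Element("isbn"),
                    Summary = (string)item.Element("summary")
                }).ToList();
    }
}
```

Calling CDataCastDemo.ReadReviews(CDataCastDemo.SampleXml) should give one ForewordReview whose properties hold only the text, with no tags and no CDATA markers.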

This also works with other data types, such as int, float, DateTime, etc.:

reviews = from item in xmlDoc.Descendants("node")
          select new 
          {
              PubDate = (DateTime)item.Element("created")
          };
// Output:
// {
//      PubDate = 1/23/2012 12:40:57
// }

It also works with XAttributes.
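
A short sketch of the attribute case, plus one related detail: casting a missing element or attribute to string or to a nullable type returns null, whereas calling ToString() or .Value on it would throw. The "id" attribute and the "rating" element are invented for illustration and are not part of the question's data.

```csharp
using System;
using System.Xml.Linq;

public static class CastVariantsDemo
{
    public static (int Id, long Isbn, string MissingText, int? MissingNumber) Read()
    {
        // Hypothetical input: "id" is an invented attribute, and there is
        // deliberately no <rating> child element.
        XElement node = XElement.Parse(
            "<node id=\"42\"><isbn><![CDATA[123456789]]></isbn></node>");

        int id = (int)node.Attribute("id");        // XAttribute supports the same casts
        long isbn = (long)node.Element("isbn");     // numeric text inside CDATA

        // Element("rating") returns null; these two casts pass the null through
        // instead of throwing.
        string missingText = (string)node.Element("rating");
        int? missingNumber = (int?)node.Element("rating");

        return (id, isbn, missingText, missingNumber);
    }
}
```

Casting a missing element to a non-nullable type such as int does throw, so the nullable forms are the safer choice when a tag is optional.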

Remember that there is no difference between the meaning of:

<a>
 <b>Hello</b>
 <c>&amp; hello again</c>
</a>

and of

<a>
 <b><![CDATA[Hello]]></b>
 <c><![CDATA[& hello again]]></c>
</a>
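
This equivalence is easy to check in LINQ to XML. The sketch below parses both documents and reads the text of c from each; the class and method names are made up for the example.

```csharp
using System;
using System.Xml.Linq;

public static class CDataEquivalenceDemo
{
    // The two documents shown above, written as C# strings.
    public const string Escaped =
        "<a><b>Hello</b><c>&amp; hello again</c></a>";
    public const string WithCData =
        "<a><b><![CDATA[Hello]]></b><c><![CDATA[& hello again]]></c></a>";

    // Returns the text content of the named child of the root element.
    public static string ReadChild(string xml, string name)
    {
        return (string)XElement.Parse(xml).Element(name);
    }
}
```

ReadChild(Escaped, "c") and ReadChild(WithCData, "c") both return the string "& hello again"; the parser has already resolved the entity reference and the CDATA section by the time you read the value.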

Since you're calling ToString(), you get the entire element back (opening and closing tags, entity references, and so on, all still in XML form), so you must be prepared to deal with it as XML. If you aren't, the problem isn't with the code you show here; it's with the code that was okay with PubDate being "<created>2012-01-23 12:40:57</created>" and now isn't okay with it being the exactly equivalent "<created><![CDATA[2012-01-23 12:40:57]]></created>".

Either change that code to really parse the XML (for which the framework offers lots of things to help), or change it to take the date on its own and use Element("created").Value to retrieve it.
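
To make the second option concrete, here is a rough sketch contrasting ToString() with .Value and then parsing the date from the bare text. The "yyyy-MM-dd HH:mm:ss" format is an assumption based on the sample data, and the class and method names are invented for the example.

```csharp
using System;
using System.Globalization;
using System.Xml.Linq;

public static class ValueVersusToStringDemo
{
    public const string NodeXml =
        "<node><created><![CDATA[2012-01-23 12:40:57]]></created></node>";

    // ToString() serializes the whole element, markup included.
    public static string CreatedMarkup(XElement node)
    {
        return node.Element("created").ToString();
    }

    // .Value returns only the text content, so it can be parsed directly.
    public static DateTime CreatedDate(XElement node)
    {
        string text = node.Element("created").Value;

        // Assumed format, taken from the sample; adjust if the client's differs.
        return DateTime.ParseExact(
            text, "yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture);
    }
}
```

With XElement.Parse(NodeXml) as input, CreatedMarkup returns a string that still contains the created tags and the CDATA markers, while CreatedDate returns the DateTime for 23 January 2012, 12:40:57.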
